In Pandas, reshaping data refers to the process of converting a DataFrame from one format to another for better data visualization and analysis.
Pandas provides multiple methods like pivot(), pivot_table(), stack(), unstack() and melt() to reshape data. We can choose the method based on our analysis requirement.
Reshape Data Using pivot()
In Pandas, the pivot() function reshapes data based on column values.
It takes simple column-wise data as input, and groups the entries into a two-dimensional table.
Let's look at an example.
import pandas as pd
# create a DataFrame
data = {'Date': ['2023-08-01', '2023-08-01', '2023-08-02', '2023-08-02'],
'Category': ['A', 'B', 'A', 'B'],
'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)
print("Original Dataframe:\n", df)
# pivot the DataFrame
pivot_df = df.pivot(index='Date', columns='Category', values='Value')
print("Reshaped DataFrame:\n", pivot_df)
Output
Original Dataframe:
Date Category Value
0 2023-08-01 A 10
1 2023-08-01 B 20
2 2023-08-02 A 30
3 2023-08-02 B 40
Reshaped DataFrame:
Category A B
Date
2023-08-01 10 20
2023-08-02 30 40
In this example, we have passed the parameters index, columns and values to the pivot function. Here,
indexspecifies the column to be used as the index for the pivoted DataFramecolumnsspecifies the column whose unique values will become the new column headersvaluesspecifies the column containing the values to be placed in the new columns
So as we can see in the output, the DataFrame has been pivoted, with the unique values from the Category column (A and B) becoming separate columns.
And the corresponding values from the Value column are then placed in the respective cells.
To learn more, visit Pandas Pivot.
Reshape Data Using pivot_table()
The pivot_table() function in Pandas is a way for reshaping and summarizing data in a DataFrame.
It allows us to create a pivot table that aggregates and summarizes data based on the specified index, columns, and aggregation functions.
Let's look at an example.
import pandas as pd
# create a DataFrame
data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
'Value': [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)
print("Original Dataframe:\n", df)
# create a pivot table
pivot_table_df = df.pivot_table(index='Category', values='Value', aggfunc='mean')
print("Reshaped Dataframe:\n", pivot_table_df)
Output
Original Dataframe: Category Value 0 A 10 1 B 20 2 A 30 3 B 40 4 A 50 5 B 60 Reshaped Dataframe: Value Category A 30.0 B 40.0
In the above example, we have used pivot_table() to create a pivot table from the original df DataFrame.
Inside the pivot_table(), we have passed parameters as
index = Category- specifying theCategorycolumn as the index of the pivot table. This means that each unique value inCategorywill become a row index in the pivot table
values = Value- specifying theValuecolumn as the source of values that will be used for aggregation in the pivot table
aggfunc = mean- we are using the mean aggregation function to calculate the average value of theValuecolumn for each category.
Hence in the output, the pivot table has the categories A and B as row indices and displays the mean values of their corresponding Value column entries.
This allows us to quickly compare the average values for different categories using a more compact and organized format.
To learn more, visit Pandas Pivot Table.
Reshape Data Using stack() and unstack()
In Pandas, we can also use the stack() and unstack() to reshape data.
stack()is used to pivot a level of the column labels, transforming them into innermost row index levels.unstack()is used to pivot a level of the row index, transforming it into an outermost column level
Let's look at an example.
import pandas as pd
# create a DataFrame
data = {'Date': ['2023-08-01', '2023-08-02'],
'Category_A': [10, 20],
'Category_B': [30, 40]}
df = pd.DataFrame(data)
# set 'Date' column as the index
df.set_index('Date', inplace=True)
# stack the columns into rows
stacked_df = df.stack()
print("Stack:\n", stacked_df)
print()
# unstack the rows back to columns
unstacked_df = stacked_df.unstack()
print("Unstack: \n", unstacked_df)
Output
Stack:
Date
2023-08-01 Category_A 10
Category_B 30
2023-08-02 Category_A 20
Category_B 40
dtype: int64
Unstack:
Category_A Category_B
Date
2023-08-01 10 30
2023-08-02 20 40
Here,
stack()is applied to thedfDataFrame, which pivots the column labels (Category_AandCategory_B) into a new level of row index.unstack()is applied to thestacked_df, which reverses the operation and pivots the innermost level of row index back to columns.
Use of melt() to Reshape DataFrame
The melt() function in Pandas transforms a DataFrame from a wide format to a long format.
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob'],
'Math': [90, 85],
'History': [75, 92]}
df = pd.DataFrame(data)
# melt the DataFrame
melted_df = pd.melt(df, id_vars='Name', var_name='Subject', value_name='Score')
print(melted_df)
Output
Name Subject Score
0 Alice Math 90
1 Bob Math 85
2 Alice History 75
3 Bob History 92
In this example, we have used the melt() function to transform the DataFrame df from a wide format to a long format.
Inside melt(), we have passed different parameters,
id_varsspecifies the column that we want to keep unchangedvar_namespecifies the name for the new column that will hold the variable names (MathandHistory).value_namespecifies the name for the new column that will hold the values (the scores).