[concept]Data Manipulation (WCTC)
Pivot Tables & Reshaping
# theory
pivot_table
If you've used pivot tables in Excel, this is the same idea but way more powerful.
pd.pivot_table(
df,
values="sales", # what to aggregate
index="region", # rows
columns="product", # columns
aggfunc="sum" # how to aggregate
)
Multiple aggregations:
pd.pivot_table(
df,
values="sales",
index="region",
aggfunc=["sum", "mean", "count"]
)
groupby + unstack
This is another way to get pivot-style output. Sometimes it's cleaner.
# Group then unstack
df.groupby(["region", "product"])["sales"].sum().unstack()
The result looks exactly like a pivot table. I actually prefer this approach sometimes because it's more explicit about what's happening.
melt (wide to long)
Sometimes data comes in wide format and you need it long. melt fixes that.
# Wide format:
# name Jan Feb Mar
# Alice 100 120 130
# Convert to long:
pd.melt(df, id_vars=["name"], var_name="month", value_name="sales")
# Result:
# name month sales
# Alice Jan 100
# Alice Feb 120
# Alice Mar 130
This is super useful when you need to plot data or do time series analysis.
stack / unstack
These are for reshaping multi-index data.
# unstack; move inner index to columns
grouped = df.groupby(["region", "year"])["sales"].sum()
grouped.unstack() # years become columns
# stack; move columns to index (opposite of unstack)
pivoted.stack()
fill_value for missing combos
When pivoting, some combinations might not exist in your data. You get NaN.
pd.pivot_table(
df,
values="sales",
index="region",
columns="product",
fill_value=0 # fill missing with 0 instead of NaN
)
margins (totals)
pd.pivot_table(
df,
values="sales",
index="region",
columns="product",
aggfunc="sum",
margins=True, # add row/column totals
margins_name="Total"
)
Pivot tables with margins are useful for reports. You get row and column totals in one call.
# examples [3]
Summarize sales by region and product
Convert wide data to long format; happens a lot with imported spreadsheets
Add row and column totals to your pivot table
# challenges [2]