[concept]Grouping & Combining
Merge & Join
# theory
pd.merge()
Think of merge like a SQL JOIN. You combine two tables based on matching values.
pd.merge(left_df, right_df, on="common_column")
join types
| Type | Keeps |
|---|---|
| inner | Only matching rows (default) |
| left | All from left + matches from right |
| right | All from right + matches from left |
| outer | All rows from both |
pd.merge(orders, customers, on="customer_id", how="left")
different column names
When the join columns have different names:
pd.merge(orders, customers,
left_on="cust_id",
right_on="customer_id")
duplicate column names
When both DataFrames have columns with the same name (that aren't join keys):
pd.merge(df1, df2, on="id", suffixes=("_left", "_right"))
# Columns become: value_left, value_right
patterns
# Add customer details to orders
orders_with_details = pd.merge(orders, customers, on="customer_id")
# Find orders without customers (left join, then filter NaN)
all_orders = pd.merge(orders, customers, on="customer_id", how="left")
orphan_orders = all_orders[all_orders["customer_name"].isna()]
# Find all combinations (cross join)
pd.merge(df1, df2, how="cross")
See also
The SQL equivalent is INNER JOIN ... ON (and LEFT JOIN for how="left"). The matching mental model with table-flavored examples is on damato-sql at /learn/joining-tables/inner-join.
# examples [3]
Only keep rows that match in both tables
Keep all rows from left table
Handle duplicate column names
# challenges [2]
# project
# project-challenge
thread: Survey Insights Report · reward: 50 xp
# brief
Your report needs regional context. Create a country_info table with regions (North America, Europe, Asia, Oceania) and merge it with the survey data to enable regional salary analysis.
# task