python-mastery

# theory

pd.merge()

Think of merge like a SQL JOIN: it combines two DataFrames by matching values in one or more key columns.

pd.merge(left_df, right_df, on="common_column")
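
To make the call above concrete, here is a minimal sketch. The two tables and their rows are made up for illustration; the only thing that matters is that both share a customer_id column.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [101, 102, 103],
})
customers = pd.DataFrame({
    "customer_id": [101, 102],
    "name": ["Alice", "Bob"],
})

# With no "how" argument, merge performs an inner join:
# only customer_id values present in both tables are kept
merged = pd.merge(orders, customers, on="customer_id")
print(merged)
```

Order 3 has no matching customer, so it is absent from the result.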

join types

Type	Keeps
inner	Only matching rows (default)
left	All from left + matches from right
right	All from right + matches from left
outer	All rows from both (NaN where there is no match)

pd.merge(orders, customers, on="customer_id", how="left")
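
One way to see the difference between the join types is to run the same merge four times and compare row counts. This is a sketch on small sample tables (the rows are illustrative): customer 103 appears only in orders and customer 104 appears only in customers.

```python
import pandas as pd

# Illustrative sample data: 103 has no customer record, 104 has no orders
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 103],
})
customers = pd.DataFrame({
    "customer_id": [101, 102, 104],
    "name": ["Alice", "Bob", "Carol"],
})

# Run the same merge with each join type and record how many rows come back
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(orders, customers, on="customer_id", how=how)
    row_counts[how] = len(result)
    print(how, len(result))
```

Here inner returns 3 rows, left and right return 4 each, and outer returns 5. Unmatched rows get NaN in the columns that come from the other table.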

different column names

When the join columns have different names:

pd.merge(orders, customers,
         left_on="cust_id",
         right_on="customer_id")
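
A point worth knowing when using left_on and right_on: the result keeps both key columns, since pandas cannot tell which name you prefer. The sketch below uses made-up tables to show this, and drops the redundant column afterwards (the name tidy is just a placeholder).

```python
import pandas as pd

# Hypothetical tables whose key columns are named differently
orders = pd.DataFrame({"order_id": [1, 2], "cust_id": [101, 102]})
customers = pd.DataFrame({"customer_id": [101, 102], "name": ["Alice", "Bob"]})

merged = pd.merge(orders, customers,
                  left_on="cust_id",
                  right_on="customer_id")

# Both cust_id and customer_id are present; drop one if it is redundant
tidy = merged.drop(columns="customer_id")
print(tidy)
```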

duplicate column names

When both DataFrames have columns with the same name (that aren't join keys):

pd.merge(df1, df2, on="id", suffixes=("_left", "_right"))
# A shared column named value becomes value_left and value_right (the default suffixes are _x and _y)
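
As a small illustration of the suffixes argument, the sketch below merges two made-up DataFrames that both carry a value column, once with the defaults and once with explicit suffixes.

```python
import pandas as pd

# Hypothetical frames that share a non-key column called "value"
df1 = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
df2 = pd.DataFrame({"id": [1, 2], "value": [100, 200]})

# Without suffixes, pandas falls back to _x and _y
default = pd.merge(df1, df2, on="id")
print(list(default.columns))

# Explicit suffixes make the origin of each column clearer
named = pd.merge(df1, df2, on="id", suffixes=("_left", "_right"))
print(list(named.columns))
```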

patterns

# Add customer details to orders
orders_with_details = pd.merge(orders, customers, on="customer_id")

# Find orders without customers (left join, then filter NaN)
all_orders = pd.merge(orders, customers, on="customer_id", how="left")
orphan_orders = all_orders[all_orders["customer_name"].isna()]

# Find all combinations (cross join)
pd.merge(df1, df2, how="cross")
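
Filtering on NaN works for the orphan pattern as long as the checked column is never missing for a real customer. An alternative sketch uses merge's indicator=True option, which adds a _merge column marking each row as left_only, right_only, or both. The tables here are invented for illustration, and how="cross" assumes pandas 1.2 or newer.

```python
import pandas as pd

# Illustrative data: order 4 points at customer 103, who does not exist
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 103],
})
customers = pd.DataFrame({
    "customer_id": [101, 102, 104],
    "customer_name": ["Alice", "Bob", "Carol"],
})

# indicator=True records where each row came from in a "_merge" column
flagged = pd.merge(orders, customers, on="customer_id",
                   how="left", indicator=True)
orphan_orders = flagged[flagged["_merge"] == "left_only"]
print(orphan_orders)

# Cross join: every row of one table paired with every row of the other
sizes = pd.DataFrame({"size": ["S", "M"]})
colors = pd.DataFrame({"color": ["red", "blue", "green"]})
combos = pd.merge(sizes, colors, how="cross")
print(len(combos))
```

The cross join takes no on argument, and its result has len(sizes) * len(colors) rows, so it grows quickly on large tables.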

import pandas as pd
import io

# Create two related DataFrames
orders_csv = """order_id,product,customer_id,amount
1,Laptop,101,999
2,Mouse,102,25
3,Keyboard,101,75
4,Monitor,103,350"""

customers_csv = """customer_id,name,city
101,Alice,NYC
102,Bob,LA
104,Carol,Chicago"""

orders = pd.read_csv(io.StringIO(orders_csv))
customers = pd.read_csv(io.StringIO(customers_csv))

print("Orders:")
print(orders)
print("\nCustomers:")
print(customers)


# Merge students with a grades_info DataFrame containing grade descriptions. Use the 'grade' column as the key.
# Your code here:

# challenge 02/02

Perform a left join between sales and a category_info table. Show all sales even if category info is missing.

import pandas as pd
import io

# Create two related DataFrames
orders_csv = """order_id,product,customer_id,amount
1,Laptop,101,999
2,Mouse,102,25
3,Keyboard,101,75
4,Monitor,103,350"""

customers_csv = """customer_id,name,city
101,Alice,NYC
102,Bob,LA
104,Carol,Chicago"""

orders = pd.read_csv(io.StringIO(orders_csv))
customers = pd.read_csv(io.StringIO(customers_csv))

print("Orders:")
print(orders)
print("\nCustomers:")
print(customers)


# Perform a left join between sales and a category_info table. Show all sales even if category info is missing.
# Your code here:

# project

# project-challenge

thread: Survey Insights Report · reward: 50 xp

# brief

Your report needs regional context. Create a country_info table with regions (North America, Europe, Asia, Oceania) and merge it with the survey data to enable regional salary analysis.

# task

Enrich Survey with Region Data

# your code

import pandas as pd
import io

survey_csv = """RespondentID,Country,Age,YearsExperience,LanguageUsed,Salary,RemoteWork,Education,JobTitle
1001,USA,28,5,Python,95000,Yes,Bachelor's,Data Scientist
1002,India,24,2,Python,28000,Yes,Master's,Data Analyst
1003,USA,35,12,Python,145000,No,Master's,Senior Data Engineer
1004,Canada,29,6,R,82000,Yes,PhD,Research Scientist
1005,UK,31,8,Python,78000,Hybrid,Master's,Machine Learning Engineer
1006,Germany,27,4,SQL,65000,No,Bachelor's,Data Analyst
1007,USA,42,18,Python,175000,Yes,PhD,Principal Data Scientist
1008,India,26,3,Python,32000,Yes,Bachelor's,Data Analyst
1009,USA,33,9,Java,125000,No,Master's,Data Engineer
1010,Canada,38,14,Python,115000,Hybrid,Bachelor's,Senior Data Scientist
1011,UK,25,2,R,45000,Yes,Master's,Junior Data Scientist
1012,India,30,7,Python,48000,Yes,Master's,Data Scientist
1013,USA,29,5,Python,98000,Hybrid,Bachelor's,Data Scientist
1014,Australia,34,10,Python,105000,Yes,Master's,Machine Learning Engineer
1015,Germany,28,4,SQL,58000,No,Bachelor's,Business Analyst"""

survey = pd.read_csv(io.StringIO(survey_csv))

# Create a country_info DataFrame with Country and Region columns
# Then merge it with the survey data
