python-mastery

# theory

grouping by multiple columns

df.groupby(["region", "category"])["sales"].sum()

This returns a Series with a two-level (hierarchical) index, with one entry for each (region, category) combination that actually occurs in the data.

multi-aggregations on multiple columns

df.groupby("category").agg({
    "price": "mean",
    "quantity": "sum",
    "date": "count"
})

named aggregations

df.groupby("category").agg(
    avg_price=("price", "mean"),
    total_qty=("quantity", "sum"),
    num_orders=("date", "count")
)

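Each keyword becomes a column name in the result, and each value is a (column, aggregation) tuple, so the output columns are descriptive instead of repeating the source column names.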
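The tuple form is shorthand for `pd.NamedAgg(column=..., aggfunc=...)`, which some readers find more explicit. A minimal sketch on a small hypothetical orders table (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical order data, for illustration only
df = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "price": [10.0, 20.0, 5.0, 7.0, 9.0],
    "quantity": [1, 3, 2, 2, 4],
})

# pd.NamedAgg is equivalent to the ("column", "func") tuple form
summary = df.groupby("category").agg(
    avg_price=pd.NamedAgg(column="price", aggfunc="mean"),
    total_qty=pd.NamedAgg(column="quantity", aggfunc="sum"),
)
print(summary)
```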

multi-index results

# After multi-column groupby
result = df.groupby(["region", "category"])["sales"].sum()

# Reset to flat DataFrame
flat = result.reset_index()

# Or unstack for pivot-style view
pivoted = result.unstack()
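Because the grouped result only holds combinations that occur in the data, `unstack()` leaves NaN in the missing cells; passing `fill_value=0` fills them instead. A minimal sketch, assuming a small hypothetical sales table in which West has no "toys" rows:

```python
import pandas as pd

# Hypothetical sales data: there is no (West, toys) row
df = pd.DataFrame({
    "region": ["East", "East", "West", "East"],
    "category": ["books", "toys", "books", "books"],
    "sales": [100, 50, 80, 20],
})

result = df.groupby(["region", "category"])["sales"].sum()

# Flat DataFrame: region, category and sales become ordinary columns
flat = result.reset_index()
print(flat)

# Regions as rows, categories as columns; the missing cell becomes 0
pivoted = result.unstack(fill_value=0)
print(pivoted)
```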

size vs count

df.groupby("category").size()   # Rows per group, NaN values included (returns a Series)
df.groupby("category").count()  # Non-NaN values per column per group (returns a DataFrame)
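The difference only shows up when a column contains missing values. A minimal sketch with hypothetical data where one price is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price in group A
df = pd.DataFrame({
    "category": ["A", "A", "B"],
    "price": [10.0, np.nan, 5.0],
    "quantity": [1, 2, 3],
})

sizes = df.groupby("category").size()    # Series: rows per group
counts = df.groupby("category").count()  # DataFrame: non-NaN values per column

print(sizes)   # A has 2 rows, B has 1
print(counts)  # price for A is 1 because one of its values is NaN
```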

# examples

# example 01 · multi-column GroupBy

Group by two columns at once
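A minimal sketch, assuming a small hypothetical sales table that uses the column names from the theory section (the values are made up):

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "category": ["A", "B", "A", "A", "A"],
    "sales": [200, 150, 300, 100, 50],
})

# One row per (region, category) pair that appears in the data
totals = df.groupby(["region", "category"])["sales"].sum()
print(totals)

# Select a single group with a tuple key
print(totals.loc[("North", "A")])
```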


# example 02 · named aggregations

Give your aggregated columns meaningful names
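A minimal sketch, assuming a hypothetical orders table with one missing date. Note that `"count"` skips missing values, so `num_orders` counts non-missing dates rather than rows:

```python
import pandas as pd

# Hypothetical orders table; the last date is missing
df = pd.DataFrame({
    "category": ["A", "A", "B", "B"],
    "price": [10.0, 30.0, 8.0, 12.0],
    "quantity": [2, 1, 5, 5],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", None],
})

summary = df.groupby("category").agg(
    avg_price=("price", "mean"),
    total_qty=("quantity", "sum"),
    num_orders=("date", "count"),  # counts non-missing dates only
)
print(summary)
```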


# example 03 · dictionary aggregation

Different aggregations for different columns
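A minimal sketch with hypothetical data. Each dictionary key is a column and each value is the aggregation to apply to it; if any value is a list of functions, the result gets two-level column labels such as `("price", "max")`:

```python
import pandas as pd

# Hypothetical orders data
df = pd.DataFrame({
    "category": ["A", "A", "B", "B"],
    "price": [10.0, 30.0, 8.0, 12.0],
    "quantity": [2, 1, 5, 5],
})

# Each key is a column; each value is the aggregation(s) to apply to it
stats = df.groupby("category").agg({
    "price": ["min", "max"],  # a list gives one output column per function
    "quantity": "sum",
})
print(stats)
```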


# challenges

# challenge 01/02

Group students by both 'grade' and 'subject', count students in each group, and print the result.


# challenge 02/02

Calculate both the mean and max score by subject using named aggregations.


# project

# project-challenge

thread: Survey Insights Report

# brief

You need to understand programming language trends across regions. Group the survey data by Country and LanguageUsed to count how many respondents use each language in each country.

# task

Language Popularity by Country

# your code


import pandas as pd
import io

survey_csv = """RespondentID,Country,Age,YearsExperience,LanguageUsed,Salary,RemoteWork,Education,JobTitle
1001,USA,28,5,Python,95000,Yes,Bachelor's,Data Scientist
1002,India,24,2,Python,28000,Yes,Master's,Data Analyst
1003,USA,35,12,Python,145000,No,Master's,Senior Data Engineer
1004,Canada,29,6,R,82000,Yes,PhD,Research Scientist
1005,UK,31,8,Python,78000,Hybrid,Master's,Machine Learning Engineer
1006,Germany,27,4,SQL,65000,No,Bachelor's,Data Analyst
1007,USA,42,18,Python,175000,Yes,PhD,Principal Data Scientist
1008,India,26,3,Python,32000,Yes,Bachelor's,Data Analyst
1009,USA,33,9,Java,125000,No,Master's,Data Engineer
1010,Canada,38,14,Python,115000,Hybrid,Bachelor's,Senior Data Scientist
1011,UK,25,2,R,45000,Yes,Master's,Junior Data Scientist
1012,India,30,7,Python,48000,Yes,Master's,Data Scientist
1013,USA,29,5,Python,98000,Hybrid,Bachelor's,Data Scientist
1014,Australia,34,10,Python,105000,Yes,Master's,Machine Learning Engineer
1015,Germany,28,4,SQL,58000,No,Bachelor's,Business Analyst"""

survey = pd.read_csv(io.StringIO(survey_csv))

# Group by Country and LanguageUsed, count respondents in each group

