pyodide: loading…

[concept]Grouping & Combining

GroupBy Basics

# theory

split-apply-combine

GroupBy works in three steps:

  1. Split the data into groups based on a key
  2. Apply a function to each group
  3. Combine the results
df.groupby("category")["price"].mean()
#   Split by category
#   Apply mean() to price column
#   Combine into result Series

groupby basics

# Single aggregation
df.groupby("category")["amount"].sum()
df.groupby("category")["amount"].mean()
df.groupby("category")["amount"].count()
df.groupby("category")["amount"].min()
df.groupby("category")["amount"].max()

# Multiple columns
df.groupby("category")[["amount", "quantity"]].sum()

agg() for multi-aggregations

df.groupby("category")["amount"].agg(["sum", "mean", "count"])

This gives you sum, mean, and count for each category in one table.

reset_index after groupby

By default, the group key becomes the index. To get it back as a column:

result = df.groupby("category")["amount"].sum().reset_index()
# Now 'category' is a regular column again

See also

This is the pandas equivalent of SQL's GROUP BY. Same idea, different syntax. Cross-reference on damato-sql at /learn/data-analysis/group-by.

# examples [3]

# example 01 · basic GroupBy

Group and aggregate data

1
2
3
4
5
6
7
8
9
🐍
Loading PythonSetting up pandas & numpy...
# example 02 · multiple aggregations

Get several stats at once

1
2
3
4
🐍
Loading PythonSetting up pandas & numpy...
# example 03 · reset index

Turn grouped result back into a regular DataFrame

1
2
3
4
5
🐍
Loading PythonSetting up pandas & numpy...

# challenges [2]

# challenge 01/02todo
Calculate the total revenue (price * quantity) for sales, then group by category and sum. Print the result.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
🐍
Loading PythonSetting up pandas & numpy...
# challenge 02/02todo
Group students by grade and find the max score in each grade. Print the result.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
🐍
Loading PythonSetting up pandas & numpy...

# project

# project-challenge

thread: Survey Insights Report · reward: 50 xp

# brief

Your recruiter report needs a salary benchmark section. Calculate the average salary for each country to help set competitive compensation packages for international hiring.

# task

Average Salary by Country

# your code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
🐍
Loading PythonSetting up pandas & numpy...