[practice] Functions & Apply

Custom Aggregations

# theory

custom funcs in agg()

groupby().agg() accepts any function that takes a group's values as a Series and returns a single value:

def range_func(x):
    return x.max() - x.min()

df.groupby("category")["value"].agg(range_func)
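To make the snippet above concrete, here is a minimal runnable sketch. The DataFrame is hypothetical sample data invented for illustration; only range_func and the column names come from the lesson.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b"],
    "value": [10, 14, 19, 5, 8],
})

def range_func(x):
    """Spread of one group; x is the Series of that group's values."""
    return x.max() - x.min()

# One result per category: a -> 19 - 10, b -> 8 - 5
ranges = df.groupby("category")["value"].agg(range_func)
print(ranges)
```

The result is a Series indexed by category, so ranges["a"] looks up the spread for a single group.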

multiple custom aggregations

df.groupby("category")["value"].agg([
    "sum",
    "mean",
    ("range", lambda x: x.max() - x.min()),  # (output column name, function)
    ("cv", lambda x: x.std() / x.mean()),    # coefficient of variation
])

named aggregations w/ custom funcs

# each keyword is output_column=(source_column, function)
df.groupby("category").agg(
    total=("value", "sum"),
    average=("value", "mean"),
    spread=("value", lambda x: x.max() - x.min()),
)
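As a runnable sketch of named aggregation with custom functions, the example below builds a small summary table. The data and the helper name coef_var are assumptions made for illustration; note that Series.std() uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b"],
    "value": [10.0, 20.0, 30.0, 4.0, 6.0],
})

def coef_var(x):
    """Coefficient of variation: sample std (ddof=1) divided by mean."""
    return x.std() / x.mean()

# Each keyword becomes an output column: name=(source column, function).
summary = df.groupby("category").agg(
    total=("value", "sum"),
    average=("value", "mean"),
    spread=("value", lambda x: x.max() - x.min()),
    cv=("value", coef_var),
)
print(summary)
```

Using a named function instead of a lambda for longer logic keeps the aggregation call readable and lets the function carry a docstring.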

transform() vs agg()

  • agg(): returns one row per group
  • transform(): returns one value per input row, aligned to the original index

# agg - one value per group
df.groupby("category")["value"].agg("mean")

# transform - value for each row (group's mean)
df["group_mean"] = df.groupby("category")["value"].transform("mean")
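The difference in output shape can be checked directly. This is a minimal sketch on hypothetical data; the variable names per_group and per_row are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [1.0, 3.0, 2.0, 4.0, 6.0],
})

grouped = df.groupby("category")["value"]

per_group = grouped.agg("mean")      # indexed by category: one row per group
per_row = grouped.transform("mean")  # indexed like df: one row per input row

print(len(per_group), len(per_row))  # number of groups vs. number of rows

# Because per_row shares df's index, it can be assigned as a new column.
df["group_mean"] = per_row
print(df)
```

This alignment with the original index is what makes transform() suitable for adding group-level statistics back onto individual rows.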

See also

The SQL equivalents of transform("rank") and "percentage of group total" are window functions: RANK() OVER (PARTITION BY group ORDER BY value) and value * 100.0 / SUM(value) OVER (PARTITION BY group). Cross-reference on damato-sql at /learn/window-functions/ranking-functions.
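The ranking half of that correspondence can be sketched in pandas as follows. The data and column names are hypothetical, and method="min" is an assumption chosen to mimic how SQL RANK() treats ties (tied rows share a rank and the next rank is skipped); pandas defaults to averaging tied ranks instead.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y"],
    "value": [30, 10, 30, 5, 7],
})

# Roughly: RANK() OVER (PARTITION BY group ORDER BY value)
# Extra keyword arguments are forwarded to the groupby rank method.
df["rank_in_group"] = (
    df.groupby("group")["value"].transform("rank", method="min")
)
print(df)
```

Ranks come back as floats; append .astype(int) if integer ranks are preferred and the column has no missing values.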

# examples [3]

# example 01 · custom agg functions

Define your own aggregation logic

# example 02 · using transform()

Calculate group stats for each row
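A minimal sketch of this idea, assuming hypothetical data and column names: each row receives its group's mean and maximum, plus its own distance from the group mean.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [10.0, 20.0, 3.0, 6.0, 9.0],
})

grouped = df.groupby("category")["value"]

# transform() broadcasts each group statistic back to every row of the group.
df["group_mean"] = grouped.transform("mean")
df["group_max"] = grouped.transform("max")

# Row-level comparison against the group statistic.
df["diff_from_mean"] = df["value"] - df["group_mean"]
print(df)
```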

# example 03 · percentage of group total

Calculate relative contribution within groups
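One way to sketch this, assuming hypothetical data and column names: divide each value by its group's total, obtained with transform("sum") so the totals line up row by row.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [25.0, 75.0, 10.0, 30.0, 60.0],
})

# Group total repeated on every row of the group.
group_total = df.groupby("category")["value"].transform("sum")

# Each row's share of its own group's total, in percent.
df["pct_of_group"] = df["value"] * 100.0 / group_total
print(df)
```

Within each category the pct_of_group values add up to 100, which is a quick sanity check on the result.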


# challenges [2]

# challenge 01/02
Write a custom function that returns max - min, then use it to find the score range by grade.
# challenge 02/02
Use transform() to add a column showing each student's rank within their subject (by score, highest=1).

# project

# project-challenge

thread: Sales Performance Dashboard · reward: 50 xp

# brief

Finance needs to understand revenue variability by category. Create a custom aggregation function that calculates the coefficient of variation (std/mean) for each category's revenue.

# task

Calculate Sales Variance
