
[concept] Functions & Apply

Vectorized Operations

# theory

why vectorization matters

Slow (loop):

result = []
for i in range(len(df)):
    result.append(df.iloc[i]["a"] + df.iloc[i]["b"])
df["sum"] = result

Fast (vectorized):

df["sum"] = df["a"] + df["b"]

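Vectorized operations are often 10-100x faster than row-by-row loops on large frames, because the arithmetic runs over whole columns in optimized, compiled code (NumPy under the hood) instead of running Python code once per row.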
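
To see the gap on your own machine, the two approaches can be timed side by side. This is a minimal sketch: the 5,000-row frame is made-up data, and the measured times (and therefore the speedup) will vary with hardware, data size, and pandas version.

```python
import time

import pandas as pd

# Hypothetical data: two integer columns, 5,000 rows
df = pd.DataFrame({"a": range(5_000), "b": range(5_000, 10_000)})

# Row-by-row loop: every iteration pays Python and indexing overhead
start = time.perf_counter()
loop_result = []
for i in range(len(df)):
    loop_result.append(df.iloc[i]["a"] + df.iloc[i]["b"])
loop_seconds = time.perf_counter() - start

# Vectorized: one operation over whole columns
start = time.perf_counter()
vec_result = df["a"] + df["b"]
vec_seconds = time.perf_counter() - start

# Both approaches produce the same values
assert list(vec_result) == loop_result

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vec_seconds:.4f} s")
```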

vectorized math

df["doubled"] = df["value"] * 2
df["squared"] = df["value"] ** 2
df["total"] = df["price"] * df["qty"]
df["pct"] = df["value"] / df["value"].sum() * 100

string ops (.str)

df["lower"] = df["name"].str.lower()
df["first_letter"] = df["name"].str[0]
df["has_a"] = df["name"].str.contains("a")
df["parts"] = df["text"].str.split(",")
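
A runnable version of the .str snippets, using a small made-up frame. The extra columns `has_a_any_case` and `n_parts` are additions for illustration: they show that `str.contains` is case-sensitive by default and that `.str.len()` also works on the lists produced by `str.split`.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "name": ["Alice", "bob", "Carla"],
    "text": ["x,y", "p,q,r", "z"],
})

df["lower"] = df["name"].str.lower()
df["first_letter"] = df["name"].str[0]

# contains() is case-sensitive by default, so "Alice" does not match "a"
df["has_a"] = df["name"].str.contains("a")
# case=False ignores case
df["has_a_any_case"] = df["name"].str.contains("a", case=False)

# split() yields a list per row; .str.len() counts the items in each list
df["parts"] = df["text"].str.split(",")
df["n_parts"] = df["parts"].str.len()

print(df[["name", "lower", "first_letter", "has_a", "has_a_any_case", "n_parts"]])
```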

conditionals (np.where, np.select)

import numpy as np
import pandas as pd

# np.where: if-else vectorized
df["label"] = np.where(df["value"] > 50, "High", "Low")

# np.select: multiple conditions, checked in order (first match wins)
conditions = [
    df["score"] >= 90,
    df["score"] >= 80,
    df["score"] >= 70
]
choices = ["A", "B", "C"]
df["grade"] = np.select(conditions, choices, default="F")

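# pd.cut: binning into (0, 25], (25, 50], (50, 75], (75, 100]
# Values outside the bins, including exactly 0, become NaN
df["bucket"] = pd.cut(df["value"], bins=[0, 25, 50, 75, 100])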
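
With np.select, the order of the conditions matters: each row takes the choice of the first condition that is True. The sketch below uses made-up scores and a deliberately reversed list (the `grade_wrong_order` column is illustrative only) to show what happens when the loosest threshold comes first.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for illustration
df = pd.DataFrame({"score": [95, 85, 72, 40]})

conditions = [
    df["score"] >= 90,
    df["score"] >= 80,
    df["score"] >= 70,
]
choices = ["A", "B", "C"]

# Strictest threshold first: each row gets the first matching grade
df["grade"] = np.select(conditions, choices, default="F")

# Reversed order: ">= 70" matches first and swallows the higher grades
df["grade_wrong_order"] = np.select(conditions[::-1], choices[::-1], default="F")

print(df)
```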

vs apply()

Operation              Use
Math on columns        Vectorized
String methods         .str accessor
Simple conditions      np.where / np.select
Complex row logic      apply()
Need multiple columns  apply(axis=1) or vectorized
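
For the "need multiple columns" row, both routes give the same result when the logic is plain arithmetic. A small sketch with made-up order data; the column names `total_apply` and `total_vec` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({"price": [10.0, 20.0, 5.0], "qty": [3, 1, 10]})

# Row-wise apply: flexible, but calls a Python function once per row
df["total_apply"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Vectorized equivalent: one operation on whole columns
df["total_vec"] = df["price"] * df["qty"]

print(df)
```

When the per-row function is this simple, the vectorized form is usually the better choice; apply(axis=1) earns its place when the row logic cannot be expressed with column operations.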

# examples [3]

# example 01 · vectorized math

Column-wise calculations without loops
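
The original editor contents for this example are not available here. As a stand-in, this is a minimal sketch with made-up sales data that computes a per-row total and each row's share of the overall total.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "price": [10.0, 25.0, 4.0],
    "qty": [2, 4, 25],
})

# Element-wise product of two columns
df["total"] = df["price"] * df["qty"]

# Each row's share of the grand total, in percent
df["pct"] = df["total"] / df["total"].sum() * 100

print(df)
```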

# example 02 · np.where for conditionals

Vectorized if-else
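
The original editor contents for this example are not available here. A minimal stand-in with made-up values; note that the comparison is strict, so a value of exactly 50 is labeled "Low".

```python
import numpy as np
import pandas as pd

# Hypothetical values, including the boundary case 50
df = pd.DataFrame({"value": [12, 50, 51, 88]})

# np.where(condition, value_if_true, value_if_false), applied to every row
df["label"] = np.where(df["value"] > 50, "High", "Low")

print(df)
```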

# example 03 · pd.cut for binning

Group continuous values into bins
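
The original editor contents for this example are not available here. This stand-in sketch uses made-up values to show how pd.cut treats bin edges: by default each interval excludes its left edge, `include_lowest=True` pulls in the lowest edge, and `labels=` replaces interval notation with names (the Q1 to Q4 labels are chosen here for illustration).

```python
import pandas as pd

# Hypothetical values, including both outer bin edges
df = pd.DataFrame({"value": [0, 10, 25, 26, 75, 100]})
bins = [0, 25, 50, 75, 100]

# Default intervals are (0, 25], (25, 50], ... so 0 falls outside and becomes NaN
df["bucket"] = pd.cut(df["value"], bins=bins)

# include_lowest=True makes the first interval include its left edge
df["quartile"] = pd.cut(
    df["value"],
    bins=bins,
    labels=["Q1", "Q2", "Q3", "Q4"],
    include_lowest=True,
)

print(df)
```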


# challenges [2]

# challenge 01/02 · todo
Use np.where to create 'status' column: 'High' if score >= 85, else 'Normal'. Print results.
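
One possible approach, sketched on made-up data because the challenge's own dataset is not shown here (the `name` column and the scores are assumptions). The `>=` comparison means a score of exactly 85 counts as 'High'.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the challenge's real dataset may differ
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cy"],
    "score": [85, 84, 97],
})

df["status"] = np.where(df["score"] >= 85, "High", "Normal")

print(df)
```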
# challenge 02/02 · todo
Use pd.cut to bin ages into 'Teen' (0-19), 'Young' (20-21), 'Adult' (22+). Print the distribution.
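
One possible approach, sketched on made-up ages because the challenge's own dataset is not shown here. It assumes whole-number ages and uses `right=False` so that each bin includes its left edge, which maps the ranges in the prompt to [0, 20), [20, 22) and [22, inf).

```python
import numpy as np
import pandas as pd

# Hypothetical ages; the challenge's real dataset may differ
df = pd.DataFrame({"age": [15, 19, 20, 21, 22, 40]})

# right=False: bins are [0, 20), [20, 22), [22, inf)
df["group"] = pd.cut(
    df["age"],
    bins=[0, 20, 22, np.inf],
    labels=["Teen", "Young", "Adult"],
    right=False,
)

# Distribution of rows per group
print(df["group"].value_counts())
```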

# project

# project-challenge

thread: Sales Performance Dashboard · reward: 50 xp

# brief

Management wants to tier sales reps based on revenue performance. Use np.where to assign Star/Solid/Developing tiers based on total revenue thresholds.

# task

Assign Performance Tiers
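
A possible starting point, not the project's reference solution. The brief does not state the thresholds, dataset, or column names, so `STAR_MIN`, `SOLID_MIN`, the `rep` and `revenue` columns, and the sample rows below are all assumptions to be replaced with the project's real values. Three tiers need two nested np.where calls, with the higher threshold checked first.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; the project's dataset and column names may differ
sales = pd.DataFrame({
    "rep": ["Ana", "Ana", "Raj", "Raj", "Lee"],
    "revenue": [60_000, 55_000, 40_000, 30_000, 20_000],
})

# Hypothetical thresholds; substitute the ones management specifies
STAR_MIN = 100_000
SOLID_MIN = 50_000

# Total revenue per rep
totals = sales.groupby("rep", as_index=False)["revenue"].sum()
totals = totals.rename(columns={"revenue": "total_revenue"})

# Nested np.where: the inner call handles rows that miss the Star threshold
totals["tier"] = np.where(
    totals["total_revenue"] >= STAR_MIN,
    "Star",
    np.where(totals["total_revenue"] >= SOLID_MIN, "Solid", "Developing"),
)

print(totals)
```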

# your code