
[practice] NumPy Foundations

Vectorization vs loops

# theory

the slow way

You can iterate over a NumPy array with a Python for loop. It works. It's also typically one to two orders of magnitude slower than the vectorized version, because every element takes a trip through the Python interpreter.

import numpy as np

a = np.arange(10)  # sample input

# Slow: the Python loop body runs once per element
result = []
for x in a:
    result.append(x ** 2 + 1)
result = np.array(result)

# Fast: one vectorized expression; the per-element loops run in compiled C
result = a ** 2 + 1

The vectorized form is also shorter, easier to read, and leaves no room for indexing mistakes because there is no index to manage.
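
To check the speed claim on your own machine, here is a minimal timing sketch. The array size, the repeat count, and the helper names with_loop and vectorized are assumptions chosen for illustration; the measured ratio will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

# Assumed sample input: 100,000 float64 values
a = np.arange(100_000, dtype=np.float64)


def with_loop(arr):
    """Python loop: the body runs once per element."""
    result = []
    for x in arr:
        result.append(x ** 2 + 1)
    return np.array(result)


def vectorized(arr):
    """Vectorized: the per-element work happens in compiled code."""
    return arr ** 2 + 1


# Timing only means something if both versions agree
assert np.array_equal(with_loop(a), vectorized(a))

loop_time = timeit.timeit(lambda: with_loop(a), number=5)
vec_time = timeit.timeit(lambda: vectorized(a), number=5)

print(f"loop:       {loop_time:.4f}s")
print(f"vectorized: {vec_time:.4f}s")
print(f"speedup:    {loop_time / vec_time:.0f}x")
```

Small arrays narrow the gap, since fixed call overhead dominates; the difference grows with array size.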

vectorized conditionals

np.where(cond, x, y) is the vectorized if/else: it takes x where cond is True and y everywhere else. Branchless, no Python loop.

# Slow:
out = []
for x in a:
    out.append(0 if x < 0 else x)
out = np.array(out)

# Fast:
out = np.where(a < 0, 0, a)

For more than two branches, use np.select. Conditions are checked in order and the first match wins:

np.select(
    [a < 0, a < 100, a >= 100],
    ["below_zero", "small", "big"],
    default="unknown",
)
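
A small sketch of how the order of the conditions matters in np.select. The sample array a is an assumption; the conditions and labels are the ones from the snippet above. A value of -5 satisfies both a < 0 and a < 100, and it gets the label of whichever condition comes first.

```python
import numpy as np

a = np.array([-5, 3, 99, 100, 250])  # assumed sample data

labels = np.select(
    [a < 0, a < 100, a >= 100],
    ["below_zero", "small", "big"],
    default="unknown",
)
print(labels.tolist())
# ['below_zero', 'small', 'small', 'big', 'big']

# Same data, broader condition listed first: a < 100 now claims -5,
# so the a < 0 branch never gets a chance to match
reordered = np.select(
    [a < 100, a < 0, a >= 100],
    ["small", "below_zero", "big"],
    default="unknown",
)
print(reordered.tolist())
# ['small', 'small', 'small', 'big', 'big']
```

Listing the narrowest condition first keeps each element in the bucket you intended.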

See also

The SQL equivalent of np.where is a CASE WHEN ... THEN ... ELSE ... END expression. Same branchless mental model, different syntax. The full SQL CASE lesson lives on damato-sql at /learn/data-analysis/case-expressions.

# examples [3]

# example 01 · np.where for binary branching

Replace if/else inside a loop with one call.
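
The original code for this example is not preserved here. The following is a stand-in sketch of the same idea, with an assumed scores array and an assumed pass mark of 60.

```python
import numpy as np

scores = np.array([45, 72, 60, 38, 91])  # hypothetical data

# Loop version: one if/else per element
looped = np.array(["pass" if s >= 60 else "fail" for s in scores])

# Vectorized version: one call, same result
status = np.where(scores >= 60, "pass", "fail")

assert np.array_equal(looped, status)
print(status.tolist())
# ['fail', 'pass', 'pass', 'fail', 'pass']
```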

# example 02 · np.select for multi-branch tiering

Three or more buckets. Conditions list, choices list, default fallback.
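
The original code for this example is not preserved here. As a stand-in sketch, the snippet below maps order quantities to discount rates; the qty array, the thresholds, and the rates are all hypothetical. The default argument serves as the final catch-all tier.

```python
import numpy as np

qty = np.array([3, 10, 49, 50, 199, 200])  # hypothetical order quantities

discount = np.select(
    [qty < 10, qty < 50, qty < 200],  # checked in order, first match wins
    [0.0, 0.05, 0.10],                # one choice per condition
    default=0.15,                     # everything that matched nothing above
)
print(discount.tolist())
# [0.0, 0.05, 0.05, 0.1, 0.1, 0.15]
```

Because the conditions are ordered from smallest threshold to largest, each one only needs an upper bound.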

# example 03 · vectorization gets harder with stateful loops

Some sequences (running totals, cumulative ops) need a vectorized primitive, not just elementwise math.
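
The original code for this example is not preserved here. A stand-in sketch with a hypothetical deposits array: the loop carries state from one step to the next, so plain elementwise math cannot express it, but np.cumsum and np.maximum.accumulate can.

```python
import numpy as np

deposits = np.array([100, -30, 50, -80, 20])  # hypothetical data

# Loop version: each total depends on the previous one
running = []
total = 0
for d in deposits:
    total += d
    running.append(total)
running = np.array(running)

# Vectorized primitive for the same running total
balance = np.cumsum(deposits)
assert np.array_equal(running, balance)
print(balance.tolist())
# [100, 70, 120, 40, 60]

# Running maximum: highest balance seen so far at each step
peak = np.maximum.accumulate(balance)
print(peak.tolist())
# [100, 100, 120, 120, 120]
```

Not every stateful loop has a ready-made primitive; when none fits, the loop may have to stay.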


# challenges [2]

# challenge 01/02 · todo
Given temps = np.array([-5, 12, -3, 20, 0, 8, -10, 15]), replace every negative value with 0 using np.where (no loops). Print 'positive_only: [...]' showing the resulting array.
# challenge 02/02 · todo
Given amounts = np.array([3, 18, 55, 130, 240, 800]), categorize each purchase amount as 'small' (<20), 'medium' (<100), 'large' (<500), or 'enterprise' (>=500). Print 'tiers:' followed by the resulting array on the next line.