
[practice] NumPy Foundations

NumPy ↔ pandas crossover

# theory

pandas is built on NumPy

pandas exists to add labels (index, columns) and conveniences (groupby, merge) on top of NumPy. For the default numeric dtypes, the math underneath is still NumPy. Dropping down to it skips pandas overhead such as index alignment, which is often faster, and sidesteps pandas-specific quirks.

import pandas as pd
import numpy as np

s = pd.Series([1.0, 2.0, 3.0])
s.to_numpy()        # array([1., 2., 3.])
s.values            # older API; same result here, but to_numpy() is preferred
np.asarray(s)       # works on Series or DataFrame
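
One concrete "quirk" worth seeing is index alignment: pandas matches values by label before doing arithmetic, while NumPy works purely by position. The sketch below uses two small made-up Series (the names a and b are illustrative only) whose labels are in opposite order.

```python
import pandas as pd

# Two hypothetical Series with the same labels in opposite order
a = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
b = pd.Series([10.0, 20.0, 30.0], index=[2, 1, 0])

# pandas aligns on labels first: label 0 pairs 1.0 with 30.0
by_label = a + b

# NumPy ignores labels: position 0 pairs 1.0 with 10.0
by_position = a.to_numpy() + b.to_numpy()

print(by_label.loc[0])        # 31.0
print(by_position.tolist())   # [11.0, 22.0, 33.0]
```

Neither result is wrong. Label alignment is usually what you want after a join; positional math is what you get once you drop to arrays, so make sure the rows are already in the order you expect.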

mixed workflow

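# pandas for loading and joining
# (products is assumed to be an already-loaded DataFrame with a "sku" column)
df = pd.read_csv("sales.csv")
df = df.merge(products, on="sku")

# NumPy for the actual math
arr = df[["price", "quantity"]].to_numpy()
row_revenue = arr[:, 0] * arr[:, 1]   # price * quantity, one value per row
revenue = row_revenue.sum()           # grand total

# Back to pandas to label the result
df["revenue"] = row_revenue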

The crossover isn't a one-way trip. You move down to NumPy for speed, then back up to pandas for everything else.

# examples [3]

# example 01 · converting in both directions

Series → array with to_numpy(); array → Series by constructor.
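
The original interactive snippet is not available here, so the following is a minimal sketch of the round trip under assumed data: the labels "a", "b", "c" and the Series names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labeled data
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"], name="x")

# Series -> array: the labels are dropped, only the values remain
arr = s.to_numpy()
print(type(arr).__name__, arr.dtype, arr.shape)

# array -> Series: pass the original index back in to restore the labels
doubled = pd.Series(arr * 2, index=s.index, name="x_doubled")
print(doubled)
```

The array carries no index, so keeping a reference to s.index is what lets the result line up with the original rows again.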

# example 02 · where dropping to NumPy actually helps

Heavy elementwise math on one column. Convert once, do the math, write back.
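
As a rough sketch of that pattern (the column names, the random data, and the formula are all invented for illustration), the code below converts one column once, chains several elementwise operations on the plain array, and writes the result back. Whether this is measurably faster depends on data size and pandas version, so time it on your own data before relying on it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100,000 positive floats in one column
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(1.0, 100.0, size=100_000)})

# Convert once
x = df["x"].to_numpy()

# Do all the elementwise math on the plain ndarray
y = np.sqrt(x) * np.log(x) + x ** 2

# Write back: pandas reattaches the row labels
df["y"] = y

# Sanity check: the all-pandas version gives the same numbers
same = np.allclose(df["y"], np.sqrt(df["x"]) * np.log(df["x"]) + df["x"] ** 2)
print(same)
```

The key habit is converting once rather than calling to_numpy() inside every step.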

# example 03 · mixing the two for a real calc

Pandas for the join and the labels, NumPy for the math.
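
A small end-to-end sketch of that split, using made-up in-memory tables (sales and products are hypothetical stand-ins for data you would normally load from files):

```python
import numpy as np
import pandas as pd

# Hypothetical tables
sales = pd.DataFrame({"sku": ["A", "B", "A", "C"], "quantity": [2, 1, 3, 5]})
products = pd.DataFrame({"sku": ["A", "B", "C"], "price": [9.5, 20.0, 4.0]})

# pandas: join on the shared key
df = sales.merge(products, on="sku")

# NumPy: one vectorized multiply over the two numeric columns
arr = df[["price", "quantity"]].to_numpy()
df["revenue"] = arr[:, 0] * arr[:, 1]

# pandas again: labeled aggregation
per_sku = df.groupby("sku")["revenue"].sum()
total = df["revenue"].sum()

print(per_sku)
print("total:", total)
```

Selecting both columns into one array upcasts the integer quantities to float, which is harmless here but worth remembering when the columns have very different dtypes.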


# challenges [2]

# challenge 01/02 · todo
Given df = pd.DataFrame({'x': [1, 2, 3, 4, 5]}), convert the 'x' column to a NumPy array, square every value, and print 'sum of squares: 55'.
# challenge 02/02 · todo
Given df = pd.DataFrame({'price': [10, 20, 30], 'qty': [4, 5, 6]}), compute revenue = price * qty using a single NumPy operation on df[['price', 'qty']].to_numpy(), assign it back as a new column 'revenue', and print the DataFrame followed by 'total: 320' on a new line.