
[concept] NumPy Foundations

Arrays and dtypes

# theory

why NumPy when pandas exists

Most pandas DataFrame and Series columns store their data in NumPy arrays, and the numeric work is largely done by NumPy. So when pandas gets slow on a hot loop, the usual fix is "drop to NumPy."

NumPy arrays have:

  • One dtype per array. A whole array is int64 or float64 or bool. No mixed columns. That's what makes them fast.
  • A fixed shape. You declare a shape; NumPy lays the bytes out contiguously. Reshape later if needed.
  • No labels. Just positions. Labels are pandas's job.
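The three properties above can be inspected directly on any array. A minimal sketch, assuming a small float array named `grid` (a made-up name, used only for illustration):

```python
import numpy as np

# Hypothetical sample array, chosen only for illustration.
grid = np.zeros((2, 3), dtype=np.float64)

print(grid.dtype)                   # float64: one dtype for every element
print(grid.shape)                   # (2, 3)
print(grid.itemsize, grid.nbytes)   # 8 bytes per element, 48 bytes total
print(grid.flags["C_CONTIGUOUS"])   # True: rows are stored back to back

flat = grid.reshape(6)              # same six elements, new shape
print(flat.shape)                   # (6,)
```

There is no index or column name anywhere on `grid`; elements are reached by position only.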

creating arrays

import numpy as np

a = np.array([1, 2, 3, 4])        # 1D, dtype inferred (int64 on most platforms)
b = np.array([[1, 2], [3, 4]])    # 2D, shape (2, 2)
c = np.zeros((3, 4))              # 3x4 of float zeros
d = np.arange(0, 10, 2)           # [0, 2, 4, 6, 8]
e = np.linspace(0, 1, 5)          # [0., 0.25, 0.5, 0.75, 1.]

dtype matters

Mixing types triggers automatic promotion:

np.array([1, 2, 3]).dtype          # int64 on most platforms
np.array([1, 2, 3.0]).dtype        # float64 (one float promoted everything)
np.array([1, "a"]).dtype           # <U21 (string), all numbers got stringified

This is the failure mode you'll hit again and again. A single bad value in a CSV turns the whole column into strings, and from then on your math either raises or quietly does string operations instead of arithmetic.
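One way to see the CSV case and recover from it is sketched below. The CSV text, the column name `price`, and the bad value `oops` are all made up for illustration; `pd.to_numeric(..., errors="coerce")` is one possible repair, not the only one.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content with one bad value in an otherwise numeric column.
csv_text = "price\n10\n20\noops\n40\n"

df = pd.read_csv(io.StringIO(csv_text))
# The bad value stops pandas from inferring a numeric dtype for the column.
print(pd.api.types.is_numeric_dtype(df["price"]))   # False

# Convert what can be converted; the bad value becomes NaN.
cleaned = pd.to_numeric(df["price"], errors="coerce")
# Ask explicitly for a plain float64 NumPy array.
values = cleaned.to_numpy(dtype="float64", na_value=np.nan)

print(values.dtype)         # float64
print(np.nansum(values))    # 70.0, the NaN is skipped
```

Coercing hides the bad row behind a NaN, so it is worth counting the NaNs afterwards rather than assuming the column was clean.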

shape & indexing

m = np.array([[1, 2, 3], [4, 5, 6]])
m.shape         # (2, 3)
m.size          # 6 total elements
m[0]            # first row: [1, 2, 3]
m[0, 1]         # element at row 0, col 1: 2
m[:, 1]         # all rows, second column: [2, 5]
m[1, :]         # second row, all columns: [4, 5, 6]

Basic slicing (start:stop:step) returns a view into the same memory, not a copy. Modifying the slice modifies the original. Useful, easy to forget. Indexing with an integer list or a boolean mask returns a copy instead.
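Whether two arrays share memory can be checked with `np.shares_memory`. A small sketch, reusing the shape of `m` from above and contrasting a basic slice with integer-array indexing, which returns a copy (the names `col` and `picked` are illustrative):

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

col = m[:, 1]            # basic slice: a view of the second column
picked = m[[0, 1], 1]    # integer-array index: same values, but a copy

print(np.shares_memory(m, col))      # True
print(np.shares_memory(m, picked))   # False

col[0] = 99              # writes through to m
print(m[0, 1])           # 99

picked[1] = -1           # changes only the copy
print(m[1, 1])           # 5
```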

# examples [3]

# example 01 · array creation utilities

zeros, ones, arange, linspace cover most starter shapes.
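A minimal sketch of the four utilities side by side; the variable names are illustrative and the original editor contents are not reproduced here:

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 of 0.0, float64 by default
ones = np.ones(4, dtype=int)      # [1, 1, 1, 1] with an integer dtype
steps = np.arange(0, 10, 2)       # [0, 2, 4, 6, 8], stop is excluded
points = np.linspace(0, 1, 5)     # [0., 0.25, 0.5, 0.75, 1.], stop is included

# Print the shape and dtype of each result.
for name, arr in [("zeros", zeros), ("ones", ones),
                  ("steps", steps), ("points", points)]:
    print(name, arr.shape, arr.dtype)
```

Note the difference in endpoints: `arange` takes a step and excludes the stop value, while `linspace` takes a count and includes it.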

# example 02 · dtype promotion landmines

One stray string promotes the whole array. This is the source of most 'why is my math returning weird stuff' bugs.
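The sketch below shows the landmine on a tiny array and one way to fail early. `require_numeric` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

clean = np.array([1, 2, 3])
dirty = np.array([1, 2, "3x"])      # one stray string

print(clean.dtype.kind)             # i: integer
print(dirty.dtype.kind)             # U: every element is now a string
print(dirty[0] + dirty[1])          # 12: string concatenation, not 3


def require_numeric(arr):
    """Hypothetical guard: raise if arr does not have a numeric dtype."""
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(f"expected numeric dtype, got {arr.dtype}")
    return arr


require_numeric(clean)              # passes silently
try:
    require_numeric(dirty)
except TypeError as err:
    print(err)                      # reports the string dtype
```

Checking the dtype once, right after loading, is cheaper than tracing a wrong result back through several steps of arithmetic.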

# example 03 · slices are views, not copies

Editing a slice mutates the original array. Use .copy() when you want a real new array.
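A short sketch of both behaviours, using a made-up six-element array. The `.base` attribute is used here only to show which array owns the memory.

```python
import numpy as np

original = np.arange(6)          # [0, 1, 2, 3, 4, 5]

view = original[1:4]             # shares memory with original
view[:] = 0                      # also overwrites original[1:4]
print(original)                  # [0 0 0 0 4 5]

safe = original[4:].copy()       # independent array with its own memory
safe[:] = -1
print(original)                  # still [0 0 0 0 4 5]

print(view.base is original)     # True: the view borrows original's data
print(safe.base is None)         # True: the copy owns its data
```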


# challenges [2]

# challenge 01/02
Build a 1D NumPy array of the integers from 5 through 14 inclusive. Print its dtype and its sum on separate lines in the format 'dtype: ...' and 'sum: ...'.

# challenge 02/02
Build a 3x4 array of float zeros, set every value in the middle row (row index 1) to 7, and print the array. Then on the next line print 'row_sum: 28' where 28 is the sum of that row.