python-mastery

# theory

why NumPy when pandas exists

By default, a pandas Series (and each numeric column of a DataFrame) is a thin layer over a NumPy array. The math is in NumPy. So when pandas gets slow in a hot loop, the fix is often "drop to NumPy."
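As a rough sketch of what "dropping to NumPy" can look like, the snippet below pulls the arrays out of a DataFrame with `to_numpy()` and replaces a row-by-row loop with one vectorized expression. The data and the column names `price` and `qty` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names are illustrative only.
df = pd.DataFrame({"price": [2.5, 4.0, 1.25], "qty": [4, 1, 8]})

# Slow pattern: a Python-level loop over rows.
loop_total = 0.0
for _, row in df.iterrows():
    loop_total += row["price"] * row["qty"]

# Faster pattern: grab the NumPy arrays and do the math in one expression.
price = df["price"].to_numpy()
qty = df["qty"].to_numpy()
vector_total = float((price * qty).sum())

print(loop_total, vector_total)
```

On three rows the difference is invisible; the vectorized form tends to pay off as the row count grows, because the multiplication and the sum run in compiled code rather than in the Python interpreter.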

NumPy arrays have:

One dtype per array. A whole array is int64 or float64 or bool. No mixed columns. That's what makes them fast.
A fixed shape. You declare a shape; NumPy lays the bytes out contiguously. Reshape later if needed.
No labels. Just positions. Labels are pandas's job.
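The three properties above can be inspected directly on any array. The following is a small sketch, with an arbitrary 12-element array chosen for illustration:

```python
import numpy as np

a = np.arange(12, dtype=np.int64)    # 12 elements, one dtype for all of them
print(a.dtype, a.shape)              # int64 (12,)
print(a.itemsize, a.nbytes)          # 8 bytes per element, 96 bytes in total
print(a.flags["C_CONTIGUOUS"])       # True: stored as one contiguous block

b = a.reshape(3, 4)                  # same bytes, new shape
print(b.shape)                       # (3, 4)
print(np.shares_memory(a, b))        # True: this reshape did not copy
```

There is no index or column name anywhere in the output: elements are addressed by position only.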

creating arrays

import numpy as np

a = np.array([1, 2, 3, 4])        # 1D, dtype inferred (int64 on most platforms)
b = np.array([[1, 2], [3, 4]])    # 2D, shape (2, 2)
c = np.zeros((3, 4))              # 3x4 of float zeros
d = np.arange(0, 10, 2)           # [0, 2, 4, 6, 8]
e = np.linspace(0, 1, 5)          # [0., 0.25, 0.5, 0.75, 1.]

dtype matters

Mixing types triggers automatic promotion:

np.array([1, 2, 3]).dtype          # int64 (on most platforms)
np.array([1, 2, 3.0]).dtype        # float64 (one float promoted everything)
np.array([1, "a"]).dtype           # <U21 (string), all numbers got stringified

This is the failure mode you'll hit again and again. A single bad value in a CSV makes the whole column a string. Arithmetic then fails, and sorts and comparisons silently switch to text order.

shape & indexing

m = np.array([[1, 2, 3], [4, 5, 6]])
m.shape         # (2, 3)
m.size          # 6 total elements
m[0]            # first row: [1, 2, 3]
m[0, 1]         # element at row 0, col 1: 2
m[:, 1]         # all rows, second column: [2, 5]
m[1, :]         # second row, all columns: [4, 5, 6]

Basic slicing returns a view into the same memory, not a copy. Modifying the slice modifies the original. (Indexing with a list of positions or a boolean mask returns a copy instead.) Useful, easy to forget.
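One way to check whether an indexing result is a view or a copy is `np.shares_memory`. The sketch below reuses `m` from above and contrasts a basic slice with integer-array ("fancy") indexing, which returns a copy:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

col = m[:, 1]                 # basic slice: a view onto m
picked = m[[0, 1], [1, 1]]    # integer-array indexing: a copy of the same two values

print(np.shares_memory(m, col))      # True
print(np.shares_memory(m, picked))   # False

col[0] = 99                   # writes through to m
picked[1] = -1                # does not touch m
print(m)
```

After these two writes, only the `99` shows up in `m`; the `-1` lives in the independent `picked` array.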

# examples [3]

# example 01 · array creation utilities

zeros, ones, arange, linspace cover most starter shapes.
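A minimal sketch of the four utilities side by side. The shapes and ranges are arbitrary choices for illustration:

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 of 0.0, float64 by default
ones = np.ones(4, dtype=np.int64)   # four 1s, dtype requested explicitly
steps = np.arange(0, 10, 2)         # 0, 2, 4, 6, 8 (stop is exclusive)
grid = np.linspace(0, 1, 5)         # 0, 0.25, 0.5, 0.75, 1 (stop is inclusive)

for name, arr in [("zeros", zeros), ("ones", ones), ("steps", steps), ("grid", grid)]:
    print(name, arr.shape, arr.dtype)
```

The main thing to keep straight is the endpoint: `arange` takes a step and stops before `stop`, while `linspace` takes a count and includes `stop`.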


# example 02 · dtype promotion landmines

One stray string promotes the whole array. This is the source of most 'why is my math returning weird stuff' bugs.
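The sketch below shows one way the landmine can play out, using made-up values: a single string turns every element into text, sorting quietly changes meaning, arithmetic fails, and an explicit cast recovers the numbers when every element is a numeric string.

```python
import numpy as np

values = np.array([9, 10, 100])
as_text = np.array([9, 10, "100"])   # one stray string, e.g. from a CSV

print(values.dtype.kind)    # 'i': integers
print(as_text.dtype.kind)   # 'U': every element is now a string

# Sorting still runs, but it compares text, not numbers.
print(np.sort(values).tolist())    # [9, 10, 100]
print(np.sort(as_text).tolist())   # ['10', '100', '9']

# Arithmetic on the string array fails outright.
try:
    as_text + 1
    arithmetic_ok = True
except TypeError:
    arithmetic_ok = False
print(arithmetic_ok)               # False

# If every element is a numeric string, an explicit cast recovers the numbers.
recovered = as_text.astype(np.int64)
print(recovered.sum())             # 119
```

Checking `dtype` right after loading data is a cheap guard: a numeric column that reports a string dtype is a sign that something non-numeric slipped in.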


# example 03 · slices are views, not copies

Editing a slice mutates the original array. Use .copy() when you want a real new array.
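A minimal sketch of both behaviours, using an arbitrary six-element array:

```python
import numpy as np

original = np.arange(1, 7)        # [1 2 3 4 5 6]

view = original[:3]               # a view onto the first three elements
view[:] = 0                       # writes through to original
print(original)                   # [0 0 0 4 5 6]

safe = original[3:].copy()        # an independent array
safe[:] = -1                      # original is untouched
print(original)                   # [0 0 0 4 5 6]
print(safe)                       # [-1 -1 -1]
```

The `.copy()` costs extra memory, so it is worth reserving for cases where the original really must stay unchanged.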


# challenges [2]

# challenge 01/02

Build a 1D NumPy array of the integers from 5 through 14 inclusive. Print its dtype and its sum on separate lines in the format 'dtype: ...' and 'sum: ...'.


import numpy as np

# Build a small 2D array and explore its shape and dtype
sales = np.array([
    [120, 95, 80],
    [200, 180, 160],
    [310, 290, 270],
])

print("shape:", sales.shape)
print("dtype:", sales.dtype)
print("total elements:", sales.size)

# First row (single product across three regions)
print("product 0 across regions:", sales[0])
# Middle region column
print("region 1 across products:", sales[:, 1])


# Build a 1D NumPy array of the integers from 5 through 14 inclusive. Print its dtype and its sum on separate lines in the format 'dtype: ...' and 'sum: ...'.
# Your code here:


# challenge 02/02

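Build a 3x4 array of float zeros, set every value in the middle row (row index 1) to 7, and print the array. Then on the next line print 'row_sum: 28' where 28 is the sum of that row. Because the array is float, the raw sum prints as 28.0, so convert it with int() to match the format.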


import numpy as np

# Build a small 2D array and explore its shape and dtype
sales = np.array([
    [120, 95, 80],
    [200, 180, 160],
    [310, 290, 270],
])

print("shape:", sales.shape)
print("dtype:", sales.dtype)
print("total elements:", sales.size)

# First row (single product across three regions)
print("product 0 across regions:", sales[0])
# Middle region column
print("region 1 across products:", sales[:, 1])


# Build a 3x4 array of float zeros, set every value in the middle row (row index 1) to 7, and print the array. Then on the next line print 'row_sum: 28' where 28 is the sum of that row.
# Your code here:


Arrays and dtypes
