
[concept] Pandas Fundamentals

Reading & Writing Files

# theory

reading csv

The most common operation is loading data from a CSV file:

import pandas as pd
df = pd.read_csv("data.csv")

Useful parameters (listed together for reference; a real call passes only the ones it needs):

pd.read_csv("data.csv",
    sep=",",              # Delimiter (default comma)
    header=0,             # Row number for headers (0 = first row)
    names=["a", "b", "c"], # Custom column names
    index_col="id",       # Use a column as index
    usecols=["a", "b"],   # Only read specific columns
    nrows=1000,           # Only read first N rows
    skiprows=5,           # Skip first N rows
    na_values=["", "NA", "NULL"],  # Treat these as missing
    dtype={"zip": str}    # Force column types
)
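
As a small illustration of why dtype matters, the sketch below reads the same data twice, once with type inference and once forcing the zip column to text. The sample data is made up for this example.

```python
import io

import pandas as pd

# Hypothetical sample data: ZIP codes that start with a zero.
csv_text = "id,zip\n1,02139\n2,94103\n"

# Default behaviour: pandas infers integers and drops the leading zero.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the text exactly as written.
forced = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})

print(inferred["zip"].tolist())
print(forced["zip"].tolist())
```

The first list holds integers (2139 and 94103), while the second keeps the strings "02139" and "94103".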

reading from a string

import io
csv_string = """name,age
Alice,25
Bob,30"""
df = pd.read_csv(io.StringIO(csv_string))

writing csv

df.to_csv("output.csv")  # Includes the row index as the first column
df.to_csv("output.csv", index=False)  # Without row index
df.to_csv("output.csv", columns=["name", "age"])  # Specific columns
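
To see what these options change without touching the disk, one approach is to call to_csv with no path, in which case it returns the CSV text as a string. The DataFrame below is a made-up example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)
names_only = df.to_csv(index=False, columns=["name"])

print(with_index)
print(without_index)
print(names_only)
```

The first output starts with an unnamed index column (its header line is ",name,age"), which is why index=False is usually the right choice when the index is just a row counter.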

other formats

# Excel (needs an optional engine such as openpyxl)
df = pd.read_excel("data.xlsx")
df.to_excel("output.xlsx", index=False)

# JSON
df = pd.read_json("data.json")
df.to_json("output.json")

# Parquet (fast, compressed; needs pyarrow or fastparquet)
df = pd.read_parquet("data.parquet")
df.to_parquet("output.parquet")
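
A minimal round trip through JSON, sketched in memory so it runs without extra packages or files. The data is hypothetical, and orient="records" is one of several layouts to_json supports.

```python
import io
import json

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# With no path argument, to_json returns a string.
# orient="records" produces a list with one object per row.
records_json = df.to_json(orient="records")
print(records_json)

# Read the JSON text back into a DataFrame.
restored = pd.read_json(io.StringIO(records_json), orient="records")
print(restored)
```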

# examples [3]

# example 01 · reading with options

Common read_csv parameters
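
The original code for this example is not preserved here. The following is a stand-in sketch that combines a few of the parameters from the theory section on a made-up, semicolon-separated export with two junk lines at the top.

```python
import io

import pandas as pd

# Hypothetical export: two preamble lines, then semicolon-separated data.
raw = """exported by some tool
generated for a demo
id;name;age
1;Alice;25
2;Bob;30
3;Carol;35
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",         # fields are separated by semicolons
    skiprows=2,      # skip the two preamble lines
    index_col="id",  # use the id column as the index
    nrows=2,         # read only the first two data rows
)
print(df)
```

After skipping the preamble, the third line of the text becomes the header row, and only Alice and Bob are loaded.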

# example 02 · reading specific columns

Only load the columns you need
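
The original code for this example is not preserved here. As a stand-in, this sketch uses usecols on a made-up four-column CSV so that only two columns are parsed.

```python
import io

import pandas as pd

# Hypothetical data with more columns than we need.
raw = """name,age,city,email
Alice,25,Boston,alice@example.com
Bob,30,Denver,bob@example.com
"""

# usecols limits parsing to the listed columns.
df = pd.read_csv(io.StringIO(raw), usecols=["name", "age"])
print(df.columns.tolist())
print(df)
```

On wide files this can reduce both load time and memory use, since the skipped columns are never turned into DataFrame columns.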

# example 03 · handling missing values

Specify what values should be treated as NA
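
The original code for this example is not preserved here. The stand-in sketch below assumes a made-up file in which missing scores were written as "missing" or "-", and passes those markers through na_values.

```python
import io

import pandas as pd

# Hypothetical data: missing scores are marked with "missing" or "-".
raw = """name,score
Alice,90
Bob,missing
Carol,-
Dan,85
"""

# Treat the custom markers as NaN (in addition to pandas' default markers).
df = pd.read_csv(io.StringIO(raw), na_values=["missing", "-"])

print(df)
print(df.isna().sum())  # count of missing values per column
```

Without na_values the score column would be read as text, because "missing" and "-" cannot be parsed as numbers. With it, the column is numeric and the two marked rows show up as NaN.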


# challenges [2]

# challenge 01/02 · todo
Read this CSV string into a DataFrame and print the first 2 rows: 'item,qty\nApple,10\nBanana,25\nOrange,15'
# challenge 02/02 · todo
Read the sales DataFrame info and print how many non-null values are in each column.

# project

# project-challenge

thread: SF Permits Analysis · reward: 50 xp

# brief

Before building reports, you need to check the data quality. Some permits are missing their Issued Date. Find and report any missing values in the dataset.

# task

Check Data Quality

# your code