[concept] Data Cleaning

Missing Data

# theory

detecting missing values

In pandas, missing values are represented as NaN (Not a Number) or None; missing datetimes appear as NaT. isna() reports all of these as missing.

df.isna()        # DataFrame of True/False
df.isna().sum()  # Count of NaN per column
df.isna().any()  # True if column has any NaN

handling missing data

Option 1: Drop rows with missing values

df.dropna()                    # Drop any row with NaN
df.dropna(subset=["name"])     # Drop only if 'name' is NaN
df.dropna(how="all")           # Drop a row only if ALL of its values are NaN
df.dropna(thresh=3)            # Keep rows with at least 3 non-NaN values

Option 2: Fill missing values

df.fillna(0)                   # Fill all NaN with 0
df.fillna({"age": 0, "city": "Unknown"})  # Different values per column
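df["age"] = df["age"].fillna(df["age"].mean())  # Fill with the column mean (assign the result back)
df.ffill()                     # Forward fill (use previous value)
df.bfill()                     # Backward fill (use next value)
# fillna(method="ffill") / fillna(method="bfill") are deprecated since pandas 2.1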

checking

# Total missing values
print(df.isna().sum().sum())

# Percentage missing per column
print(df.isna().mean() * 100)

# Rows with any missing values
rows_with_na = df[df.isna().any(axis=1)]
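
The three checks above can be tried on a small table. This is a minimal sketch, assuming a made-up DataFrame; the column names `score` and `grade` and their values are illustrative only and do not come from the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only to illustrate the checks.
df = pd.DataFrame({
    "score": [88.0, np.nan, 92.0, np.nan],
    "grade": ["B", "A", None, "C"],
})

# isna() gives True/False per cell; summing twice collapses it to one number.
total_missing = int(df.isna().sum().sum())

# True counts as 1 and False as 0, so the column mean is the missing fraction.
pct_missing = df.isna().mean() * 100

# any(axis=1) checks across columns, flagging rows with at least one NaN.
rows_with_na = df[df.isna().any(axis=1)]

print(total_missing)
print(pct_missing)
print(rows_with_na)
```

The percentage works because the mean of a boolean column is the share of True values in it.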

# examples [3]

# example 01 · detecting missing values

Find where NaN values exist
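
The original editor contents for this example are not available here. As a stand-in, here is a minimal sketch of the detection methods from the theory section; the DataFrame, its column names, and its values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; both None and np.nan count as missing.
df = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [23, np.nan, 31, np.nan],
    "city": ["Lima", "Oslo", "Rome", "Kyiv"],
})

mask = df.isna()               # same shape as df, True where a value is missing
per_column = df.isna().sum()   # number of missing values in each column
has_missing = df.isna().any()  # True for columns with at least one missing value

print(mask)
print(per_column)
print(has_missing)
```

Here `city` has no gaps, so its count is 0 and its `any()` result is False.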

# example 02 · dropping missing values

Remove rows with NaN
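
The original editor contents for this example are not available here. The sketch below compares the dropna() variants from the theory section on one table; the data and column names are assumptions chosen so that each variant keeps a different number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows hold 3, 2, 2, 1, and 0 non-missing values.
df = pd.DataFrame({
    "name": ["Ana", None, "Cy", None, None],
    "age": [23, 35, np.nan, np.nan, np.nan],
    "city": ["Lima", "Oslo", "Rome", "Kyiv", None],
})

complete_rows = df.dropna()                  # keep only rows with no missing values
name_required = df.dropna(subset=["name"])   # keep rows where 'name' is present
not_all_empty = df.dropna(how="all")         # drop only rows that are entirely missing
at_least_two = df.dropna(thresh=2)           # keep rows with 2 or more non-missing values

print(len(complete_rows), len(name_required), len(not_all_empty), len(at_least_two))
print(len(df))  # dropna() returns a new DataFrame; df itself is unchanged
```

Because dropna() returns a new DataFrame, assign the result to a name if you want to keep it.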

# example 03 · filling missing values

Replace NaN with meaningful values
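
The original editor contents for this example are not available here. The sketch below shows the filling strategies from the theory section, assuming a made-up table with an `age` and a `city` column. It uses `ffill()` and `bfill()` directly rather than `fillna(method=...)`, which newer pandas versions deprecate.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only to illustrate the fill strategies.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, np.nan],
    "city": ["Lima", None, "Rome", "Kyiv"],
})

# A different fill value for each column.
filled_per_column = df.fillna({"age": 0, "city": "Unknown"})

# Fill 'age' with the mean of its non-missing values (mean() skips NaN).
mean_filled = df.copy()
mean_filled["age"] = mean_filled["age"].fillna(mean_filled["age"].mean())

forward = df.ffill()   # copy the previous value down into each gap
backward = df.bfill()  # copy the next value up; a trailing gap stays NaN

print(filled_per_column)
print(mean_filled)
print(forward)
print(backward)
```

Note that backward fill cannot fill the last `age` value, since no later value exists to copy from.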


# challenges [2]

# challenge 01/02
Count the total number of missing values in the students DataFrame and print it.
# challenge 02/02
Create a DataFrame with some NaN values, then fill all NaN with the string 'MISSING' and print it.

# project

# project-challenge

thread: Survey Insights Report · reward: 50 xp

# brief

You're a tech recruiter analyzing developer survey responses. Before building your insights report, you need to verify data quality by checking for any missing values in the dataset.

# task

Check for Missing Survey Data
