
[practice] Data Cleaning

String Cleaning

# theory

the .str accessor

Pandas exposes vectorized string methods on a Series through the .str accessor. Each method is applied to every value and returns a new Series:

df["name"].str.lower()
df["name"].str.upper()
df["name"].str.title()
df["name"].str.strip()
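
As a small illustration of the accessor, the sketch below uses a hypothetical `df` with a `name` column (the data is invented for this example). It assumes only that pandas is installed, and shows that each `.str` call returns a new Series while missing values stay missing and the original column is untouched.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({"name": ["  alice SMITH ", "BOB jones", None]})

stripped = df["name"].str.strip()   # whitespace removed from both ends
titled = stripped.str.title()       # first letter of each word capitalized

print(titled.tolist())
print(df["name"].tolist())          # the original column is not modified
```

To keep the result, assign it back, for example `df["name"] = titled`.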

common operations

# Remove whitespace
df["text"].str.strip()       # Both ends
df["text"].str.lstrip()      # Left only
df["text"].str.rstrip()      # Right only

# Case conversion
df["text"].str.lower()
df["text"].str.upper()
df["text"].str.title()       # Capitalize Words

# Replace patterns
df["text"].str.replace("old", "new")            # Literal text (regex=False is the default in pandas 2.0+)
df["text"].str.replace(r"\d+", "", regex=True)  # Remove runs of digits

# Check content
df["text"].str.contains("word")       # Boolean Series; the pattern is a regex by default
df["text"].str.startswith("prefix")
df["text"].str.endswith("suffix")
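
The operations above can be tried on a toy column. This is a minimal sketch with made-up data. The `case=False` and `na=False` arguments to `str.contains` are additions not used in the listing above: the first ignores case, and the second reports missing values as `False` so the result can be used directly as a filter.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({"text": [" Order 66 shipped ", "order pending", None]})

clean = df["text"].str.strip()

# regex=True treats the pattern as a regular expression
no_digits = clean.str.replace(r"\d+", "", regex=True)

# case=False ignores case; na=False maps missing values to False
has_order = clean.str.contains("order", case=False, na=False)

print(no_digits.tolist())
print(has_order.tolist())
print(df[has_order])   # rows whose text mentions "order"
```

Note that removing the digits leaves the surrounding spaces in place, so a further replace may be needed if double spaces matter.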

extracting

# Split and get parts
df["name"].str.split(" ").str[0]  # First word
df["name"].str.split(" ").str[-1]  # Last word

# Extract with regex
df["text"].str.extract(r"(\d+)")  # First run of digits per row; returns a DataFrame by default

# Get length
df["text"].str.len()
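
A short sketch of the extraction calls on invented data follows. The `expand=False` argument is an addition to the listing above: with a single capture group it makes `str.extract` return a Series rather than a one-column DataFrame. Rows with no match come back as missing values.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "name": ["Ada Lovelace", "Grace Brewster Hopper"],
    "text": ["room 12, floor 3", "no number here"],
})

first = df["name"].str.split(" ").str[0]    # first word of each name
last = df["name"].str.split(" ").str[-1]    # last word of each name

# Only the first match per row is returned
number = df["text"].str.extract(r"(\d+)", expand=False)

lengths = df["text"].str.len()              # character count per row

print(first.tolist(), last.tolist())
print(number.tolist())
print(lengths.tolist())
```

The extracted values are still strings; converting them with `pd.to_numeric` is a separate step.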

chaining

Chain multiple operations:

df["clean_name"] = (df["name"]
    .str.strip()
    .str.lower()
    .str.replace(" ", "_"))
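
A runnable version of the chain, using invented names, might look like the sketch below. `regex=False` is passed explicitly here as an assumption about intent: the space should be treated as literal text regardless of the pandas version.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({"name": ["  Ada Lovelace ", "GRACE HOPPER"]})

df["clean_name"] = (
    df["name"]
    .str.strip()                         # trim the ends first
    .str.lower()                         # then normalize case
    .str.replace(" ", "_", regex=False)  # then join words with underscores
)

print(df["clean_name"].tolist())
```

The order of the steps matters: stripping first keeps leading and trailing spaces from turning into underscores.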

# examples [3]

# example 01 · basic string cleaning

Strip whitespace and standardize case

# example 02 · replace and contains

Find and replace text patterns

# example 03 · splitting strings

Break apart strings into columns
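
The original code for this example is not shown here. As one possible sketch, `str.split` with `expand=True` returns a DataFrame with one column per piece, which can then be assigned to new columns. The data and the column names `first_name` and `last_name` are hypothetical.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({"name": ["Ada Lovelace", "Grace Hopper"]})

# n=1 splits at the first space only; expand=True returns a DataFrame
parts = df["name"].str.split(" ", n=1, expand=True)

df["first_name"] = parts[0]
df["last_name"] = parts[1]

print(df)
```

Limiting the split with `n=1` keeps everything after the first space together, so a name with more than two words still yields exactly two columns.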


# challenges [2]

# challenge 01/02
Clean the students 'name' column: strip whitespace and convert to title case. Print the names.
# challenge 02/02
Find all students whose name contains the letter 'a' (case insensitive) and print them.

# project

# project-challenge

thread: Survey Insights Report · reward: 50 xp

# brief

Job titles in the survey have inconsistent formatting. Clean the JobTitle column by stripping whitespace and converting to title case for consistent reporting in your recruiter dashboard.

# task

Standardize Job Titles

# your code