[concept]Grouping & Combining
Fixed-Width Files
# theory
fixed-width files
Unlike CSV files, which separate fields with commas, fixed-width files define each field by its character position. Every field occupies a fixed number of characters, padded with spaces where the value is shorter:
NAMEAGECITY
Alice 25NYC
Bob   30LA
Carol 28CHI
In each data row, the name occupies characters 0-5, the age characters 6-7, and the city characters 8-11 (counting from 0, both ends inclusive).
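To make the position idea concrete, here is a minimal sketch that slices each row by hand in plain Python, without pandas. The sample rows are assumed stand-ins padded to a 6 + 2 + 4 layout, and the variable names are illustrative only.

```python
# Hypothetical sample rows, padded to a 6 + 2 + 4 character layout.
lines = [
    "Alice 25NYC ",
    "Bob   30LA  ",
    "Carol 28CHI ",
]

records = []
for line in lines:
    name = line[0:6].strip()   # characters 0-5
    age = int(line[6:8])       # characters 6-7
    city = line[8:12].strip()  # characters 8-11
    records.append((name, age, city))

print(records)
```

Python slices exclude the end index, so `line[0:6]` covers characters 0 through 5. This is the same convention that `colspecs` uses.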
pd.read_fwf()
import pandas as pd
# Auto-detect column boundaries (works only when whitespace separates the columns)
df = pd.read_fwf("data.txt")
# Specify column positions manually
df = pd.read_fwf("data.txt",
                 colspecs=[(0, 6), (6, 8), (8, 12)],
                 names=["name", "age", "city"])
# Using widths (simpler when the columns are contiguous, with no gaps between them)
df = pd.read_fwf("data.txt",
                 widths=[6, 2, 4],
                 names=["name", "age", "city"])
colspecs
A list of (start, end) tuples. Character positions are 0-indexed, and each interval is half-open: start is included, end is excluded.
colspecs = [
(0, 10), # Characters 0-9 (10 chars)
(10, 15), # Characters 10-14 (5 chars)
(15, 25) # Characters 15-24 (10 chars)
]
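When the fields are contiguous, a list of widths carries the same information as colspecs. The sketch below converts one into the other; `widths_to_colspecs` is a hypothetical helper written for illustration, not part of pandas.

```python
from itertools import accumulate


def widths_to_colspecs(widths):
    """Convert contiguous field widths to half-open (start, end) pairs."""
    ends = list(accumulate(widths))   # running total gives each field's end
    starts = [0] + ends[:-1]          # each field starts where the previous one ended
    return list(zip(starts, ends))


colspecs = widths_to_colspecs([10, 5, 10])
print(colspecs)  # [(0, 10), (10, 15), (15, 25)]
```

If the file has gaps or unused characters between fields, widths alone cannot describe the layout and explicit colspecs are the safer choice.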
common parameters
pd.read_fwf(filepath,
            colspecs=[(0, 5), (5, 10)],  # Column positions as (start, end) pairs
            # widths=[5, 5],             # Alternative to colspecs; passing both raises a ValueError
            names=["col1", "col2"],      # Column names
            skiprows=1,                  # Skip the header row
            na_values=["", " "],         # Treat these strings as missing
            dtype={"col1": str}          # Force a column's data type
            )
# examples [3]
# example 01 · reading fixed-width data
Parse data with specific column positions
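A minimal sketch of this example, assuming a small in-memory sample in place of a real file. The rows and column names are stand-ins; `StringIO` lets `pd.read_fwf` read the string as if it were a file.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: name (6 chars), age (2 chars), city (4 chars).
data = (
    "Alice 25NYC \n"
    "Bob   30LA  \n"
    "Carol 28CHI \n"
)

df = pd.read_fwf(
    StringIO(data),
    colspecs=[(0, 6), (6, 8), (8, 12)],  # half-open (start, end) positions
    names=["name", "age", "city"],       # no header row in the data
)
print(df)
```

Padding spaces are stripped from each field, and the age column is parsed as a number.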
# example 02 · using widths instead of colspecs
Simpler syntax when you know column widths
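A minimal sketch using widths, again with assumed in-memory data. The zip column is a stand-in added here to show `dtype` at work: read as a string, it keeps its leading zero.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: name (6 chars), zip (5 chars), city (8 chars).
data = (
    "Alice 02134Boston  \n"
    "Bob   10001New York\n"
    "Carol 60601Chicago \n"
)

df = pd.read_fwf(
    StringIO(data),
    widths=[6, 5, 8],                 # contiguous field widths
    names=["name", "zip", "city"],
    dtype={"zip": str},               # keep leading zeros in zip codes
)
print(df)
```

Only the padding at the edges of a field is stripped, so the space inside "New York" survives.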
# example 03 · auto-detect columns
Let pandas figure out the columns
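A minimal sketch of auto-detection, assuming sample data whose columns are separated by runs of spaces. With no `colspecs` or `widths`, `pd.read_fwf` infers the boundaries from the whitespace gaps and takes the first row as the header.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data with whitespace gaps between the columns.
data = (
    "name   age  city\n"
    "Alice  25   NYC\n"
    "Bob    30   LA\n"
    "Carol  28   CHI\n"
)

# No colspecs or widths: column boundaries are inferred from the data.
df = pd.read_fwf(StringIO(data))
print(df)
```

Inference depends on those gaps. In a tightly packed layout such as `Alice 25NYC`, the age and city would be read as a single column, so explicit positions are needed there.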
# challenges [2]
# challenge 01/02
Read this fixed-width string where Product is chars 0-10, Price is 10-16, Qty is 16-20. Print the result.
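The challenge string is not shown here, so the sketch below builds stand-in data in the stated layout and reads it back. The product rows are invented for illustration. Building each row with format specifiers guarantees the fields land at the intended positions.

```python
from io import StringIO

import pandas as pd

# Hypothetical rows; the real challenge data may differ.
rows = [("Widget", 19.99, 5), ("Gadget", 5.5, 12), ("Doohickey", 120.0, 3)]

# Product: 10 chars left-aligned, Price: 6 chars, Qty: 4 chars (right-aligned).
data = "\n".join(f"{product:<10}{price:>6.2f}{qty:>4}" for product, price, qty in rows)

df = pd.read_fwf(
    StringIO(data),
    colspecs=[(0, 10), (10, 16), (16, 20)],
    names=["Product", "Price", "Qty"],
)
print(df)
```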
# challenge 02/02
Read fixed-width data using widths=[5, 3, 8] for ID, Age, and City columns.
# project
# project-challenge
thread: Survey Insights Report · reward: 50 xp
# brief
Your company's legacy HR system exports employee data in fixed-width format. Parse this sample record to integrate historical employee data with your modern survey analysis.
# task
Parse Legacy HR System Export