
[challenge] Functions & Apply

Capstone Project

# theory

the capstone

Every lesson up to this one handed you a partial query and asked you to fill in a blank. This one doesn't. You get a real dataset and three questions. You choose the approach.

the dataset

The Plotly diabetes dataset: 768 rows and 9 columns of medical metrics. The starter code fetches it with pyfetch and gives you a DataFrame named df. After that, you're on your own.

columns: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome
Outcome: 1 = diabetic, 0 = not
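
Before answering anything, it can help to confirm that the frame you were handed matches the description above. The sketch below is one way to do that. The helper name check_frame and the two-row sample frame are made up for illustration; they are not part of the lesson's starter code.

```python
import pandas as pd

# Column names as listed in the lesson text.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def check_frame(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the frame looks as described."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Outcome is documented as 1 = diabetic, 0 = not.
    if "Outcome" in df.columns and not df["Outcome"].isin([0, 1]).all():
        problems.append("Outcome contains values other than 0 and 1")
    return problems


# Hypothetical stand-in rows (NOT real data) so the sketch runs without a network fetch.
sample = pd.DataFrame(
    [[1, 100, 70, 20, 80, 25.0, 0.5, 30, 0],
     [2, 150, 80, 30, 120, 33.0, 0.7, 45, 1]],
    columns=EXPECTED_COLUMNS,
)
print(check_frame(sample))  # []
```

With the real df you could also compare df.shape against the 768 rows and 9 columns stated above.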

what open-ended means

  • No starter code beyond loading the data.
  • No structure-of-the-solution comments.
  • The validator only checks the printed answer, not how you arrived at it.
  • A peek-able example solution is hidden below in a collapsed details block. Try it cold first.
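
To make "only the printed answer is checked" concrete, here is a minimal sketch of an output-only comparison. The lesson's actual validator is not shown, so printed_output is a hypothetical stand-in, and the four outcome values are made up.

```python
import contextlib
import io


def printed_output(code: str) -> str:
    """Run a snippet and return whatever it printed, stripped of outer whitespace."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()


# Two different routes to the same printed line, on made-up data.
approach_a = """
outcomes = [1, 0, 0, 1]
print(f"diabetic rate: {sum(outcomes) / len(outcomes) * 100:.1f}%")
"""

approach_b = """
outcomes = [1, 0, 0, 1]
hits = len([o for o in outcomes if o == 1])
print(f"diabetic rate: {100 * hits / len(outcomes):.1f}%")
"""

# An output-only check cannot tell these apart.
print(printed_output(approach_a) == printed_output(approach_b))  # True
```

The practical consequence is that the exact output format matters more than the route you take to it.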

strategy from earlier lessons

It all still applies: groupby, apply, vectorized math, and NumPy when speed matters. The point of an open-ended challenge isn't that you need new techniques; it's that nobody is telling you which one to reach for.
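
As a reminder of how interchangeable these tools can be, the sketch below computes the same per-group mean four ways. The toy frame holds made-up numbers, not rows from the real dataset, and the variable names are only illustrative.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Outcome": [0, 0, 1, 1, 1],
    "Glucose": [90, 110, 150, 130, 170],
})

# 1. groupby with a built-in aggregation.
via_groupby = toy.groupby("Outcome")["Glucose"].mean()

# 2. groupby + apply with an arbitrary Python function per group.
via_apply = toy.groupby("Outcome")["Glucose"].apply(lambda s: s.sum() / len(s))

# 3. Vectorized boolean masks on the DataFrame.
is_diabetic = toy["Outcome"] == 1
via_mask = {
    0: toy.loc[~is_diabetic, "Glucose"].mean(),
    1: toy.loc[is_diabetic, "Glucose"].mean(),
}

# 4. Plain NumPy arrays.
glucose = toy["Glucose"].to_numpy()
outcome = toy["Outcome"].to_numpy()
via_numpy = {k: float(glucose[outcome == k].mean()) for k in (0, 1)}

print(via_groupby.loc[1], via_apply.loc[1], via_mask[1], via_numpy[1])
# 150.0 150.0 150.0 150.0
```

On a frame this small the choice is purely about readability; the speed differences only start to matter on much larger data.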

a peek at the solution

It is one short block of code (under 20 lines in total). If your draft is creeping past 40 lines, you're probably overbuilding.

<details> <summary><strong>peek at the reference solution</strong> (try without it first)</summary>
import io
import pandas as pd
from pyodide.http import pyfetch

URL = "https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv"
resp = await pyfetch(URL)
df = pd.read_csv(io.StringIO(await resp.string()))

# Q1: positivity rate
print(f"diabetic rate: {df['Outcome'].mean() * 100:.1f}%")

# Q2: avg Glucose for diabetics vs non-diabetics
mean_by_outcome = df.groupby("Outcome")["Glucose"].mean().round(1)
print(f"diabetic mean Glucose: {mean_by_outcome[1]}")
print(f"non-diabetic mean Glucose: {mean_by_outcome[0]}")

# Q3: highest-BMI age bucket
# pd.cut defaults to right=True: bins are (20, 30], (30, 40], ... so age 30 is "20s".
df["age_bucket"] = pd.cut(df["Age"], bins=[20, 30, 40, 50, 60, 100],
                         labels=["20s", "30s", "40s", "50s", "60+"])
top_bucket = df.groupby("age_bucket", observed=True)["BMI"].mean().idxmax()
print(f"highest-BMI age bucket: {top_bucket}")
</details>
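
One detail worth knowing before bucketing ages with pd.cut is how it treats values that sit exactly on a bin edge. The sketch below uses made-up ages to compare the default right=True with right=False. Which convention the validator expects is not stated here; the reference solution uses the default.

```python
import pandas as pd

# Made-up ages chosen to sit on or near the bin edges.
ages = pd.Series([21, 30, 31, 60, 61])
bins = [20, 30, 40, 50, 60, 100]
labels = ["20s", "30s", "40s", "50s", "60+"]

# Default right=True: intervals are (20, 30], (30, 40], ... so 30 counts as "20s".
right_closed = pd.cut(ages, bins=bins, labels=labels).astype(str).tolist()

# right=False: intervals are [20, 30), [30, 40), ... so 30 counts as "30s".
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False).astype(str).tolist()

print(right_closed)  # ['20s', '20s', '30s', '50s', '60+']
print(left_closed)   # ['20s', '30s', '30s', '60+', '60+']
```

Only the edge ages (30 and 60 here) move between buckets, so the two conventions agree for most rows but can still shift a group mean.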

# examples [2]

# example 01 · complete pipeline example

Full end-to-end data analysis

# example 02 · sales analysis pipeline

Business intelligence from sales data


# challenges [3]

# challenge 01/03 · todo
Using the loaded df, print the share of rows where Outcome == 1 (diabetic) as a percentage with one decimal, in the format 'diabetic rate: X.X%'. The actual answer is around 34.9%. No hint, no starter steps. Pick your own approach.

# challenge 02/03 · todo
From the same df, compute the mean Glucose for diabetics (Outcome == 1) and for non-diabetics (Outcome == 0). Print 'diabetic mean Glucose: X.X' and 'non-diabetic mean Glucose: Y.Y' (one decimal each). Diabetics should be noticeably higher; that's the whole point of the dataset.

# challenge 03/03 · todo
Bucket the Age column into 20s, 30s, 40s, 50s, 60+ (use pd.cut). Find which bucket has the highest average BMI. Print 'highest-BMI age bucket: NAME' (one of 20s/30s/40s/50s/60+).

# project

# project-challenge

thread: Sales Performance Dashboard · reward: 50 xp

# brief

Build the complete sales performance dashboard combining all techniques: revenue calculation, rep rankings, regional analysis, and category breakdowns. This is your capstone for the sales thread!
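
The four pieces named in the brief can be sketched on a toy table as follows. The sales data for this thread is not shown here, so every column name (rep, region, category, units, unit_price) and every value below is an assumption made for illustration; adapt them to whatever the real frame contains.

```python
import pandas as pd

# Hypothetical sales rows; names and numbers are invented for this sketch.
sales = pd.DataFrame({
    "rep": ["Ana", "Ben", "Ana", "Cy", "Ben", "Cy"],
    "region": ["North", "South", "North", "East", "South", "East"],
    "category": ["Hardware", "Software", "Software", "Hardware", "Hardware", "Software"],
    "units": [10, 5, 3, 8, 2, 4],
    "unit_price": [20.0, 100.0, 100.0, 20.0, 20.0, 100.0],
})

# Revenue calculation: vectorized column math.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Rep rankings: total revenue per rep, highest first, with a dense rank.
rep_revenue = sales.groupby("rep")["revenue"].sum().sort_values(ascending=False)
rep_rank = rep_revenue.rank(ascending=False, method="dense").astype(int)

# Regional analysis: total revenue per region.
region_revenue = sales.groupby("region")["revenue"].sum()

# Category breakdown: each category's share of total revenue.
category_share = sales.groupby("category")["revenue"].sum() / sales["revenue"].sum()

print(rep_revenue.index.tolist())      # ['Cy', 'Ben', 'Ana']
print(region_revenue.loc["East"])      # 560.0
print(category_share.loc["Software"])  # 0.75
```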

# task

Complete Sales Dashboard

# your code