BEGINNER • Data Exploration Basics
Data Pipeline Drill for health triage support model #1
This lesson focuses on increasing reproducibility, using a practical health triage support model scenario. You will apply these commands: python -m venv .venv, jupyter lab, and df.head(). The code example builds a small table of model calls and summarizes average latency, error rate, and call count per model version.
Code Example
import pandas as pd

# Illustrative call log: one row per model call.
events = pd.DataFrame({
    "model_version": ["v1", "v1", "v2", "v2", "v2", "v3"],
    "latency_ms": [110, 130, 95, 102, 99, 140],
    "is_error": [0, 0, 0, 1, 0, 1],
})

def monitoring_report(frame: pd.DataFrame):
    # Summarize latency, error rate, and call count per model version.
    grouped = frame.groupby("model_version").agg(
        avg_latency=("latency_ms", "mean"),
        error_rate=("is_error", "mean"),
        calls=("model_version", "count"),
    )
    # Return one plain dict per model version.
    return grouped.reset_index().to_dict(orient="records")

print("scenario:", "health triage support model")
print(monitoring_report(events))
Commands & References
- python -m venv .venv
- jupyter lab
- df.head()
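As a quick illustration of the df.head() command listed above, the sketch below previews the first rows of the same illustrative events table used in the code example and prints the column dtypes. The variable name preview and the choice of three rows are assumptions made for this example only.

```python
import pandas as pd

# Same illustrative data as the lesson's code example.
events = pd.DataFrame({
    "model_version": ["v1", "v1", "v2", "v2", "v2", "v3"],
    "latency_ms": [110, 130, 95, 102, 99, 140],
    "is_error": [0, 0, 0, 1, 0, 1],
})

# head(n) returns the first n rows; with no argument it returns five.
preview = events.head(3)
print(preview)

# Checking dtypes early helps catch columns that loaded with the wrong type.
print(events.dtypes)
```

Previewing a few rows before running any aggregation is a cheap way to confirm that the data loaded as expected.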
Lab Steps
- Prepare environment using: python -m venv .venv
- Load a small sample dataset and validate schema.
- Run the core code workflow and collect metrics.
- Compare results and write one improvement note.
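The schema validation step could look something like the following minimal sketch. The helper validate_schema, the constants REQUIRED_COLUMNS and NUMERIC_COLUMNS, and the sample data are all hypothetical, chosen to match the columns in the lesson's code example; adapt them to your own dataset.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical schema for this lesson's events table.
REQUIRED_COLUMNS = ["model_version", "latency_ms", "is_error"]
NUMERIC_COLUMNS = ["latency_ms", "is_error"]

def validate_schema(frame):
    """Return a list of schema problems; an empty list means the frame passed."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
    for column in NUMERIC_COLUMNS:
        if column in frame.columns and not is_numeric_dtype(frame[column]):
            problems.append(f"non-numeric column: {column}")
    return problems

# Small illustrative sample, plus a deliberately broken copy.
sample = pd.DataFrame({
    "model_version": ["v1", "v2"],
    "latency_ms": [110, 95],
    "is_error": [0, 1],
})
broken = sample.drop(columns=["is_error"])

print(validate_schema(sample))
print(validate_schema(broken))
```

Returning a list of problems rather than raising on the first one makes it easier to note every issue in a single pass when writing the improvement note.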
Exercises
- Change one parameter (for example, the grouping column or an aggregation) and compare the impact.
- Add one validation rule to reduce bad inputs.
- Document one failure mode and mitigation.
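As a possible starting point for the validation-rule exercise, the sketch below rejects rows with a non-positive latency or an is_error value other than 0 or 1. The function name drop_invalid_rows, the specific rules, and the raw sample data are assumptions for illustration, not requirements of the lesson.

```python
import pandas as pd

def drop_invalid_rows(frame):
    """Keep only rows with positive latency and a 0/1 error flag."""
    valid = (frame["latency_ms"] > 0) & frame["is_error"].isin([0, 1])
    return frame[valid].reset_index(drop=True)

# Illustrative input with two deliberately bad rows.
raw = pd.DataFrame({
    "model_version": ["v1", "v2", "v2"],
    "latency_ms": [110, -5, 99],
    "is_error": [0, 0, 2],
})

clean = drop_invalid_rows(raw)
rejected = len(raw) - len(clean)
print(rejected, "rows rejected")
```

Counting rejected rows, rather than dropping them silently, gives you a number to record when documenting a failure mode and its mitigation.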