BEGINNER • Data Exploration Basics

Data Pipeline Drill for health triage support model #1

This lesson focuses on increasing reproducibility, using a practical health triage support model scenario. You will use three commands: python -m venv .venv, jupyter lab, and df.head(). The code example below summarizes average latency, error rate, and call count per model version for a small set of logged calls.

Code Example

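import pandas as pd

# Toy log of model calls: one row per call.
events = pd.DataFrame({
    "model_version": ["v1", "v1", "v2", "v2", "v2", "v3"],
    "latency_ms": [110, 130, 95, 102, 99, 140],
    "is_error": [0, 0, 0, 1, 0, 1],  # 1 = the call failed
})

def monitoring_report(frame: pd.DataFrame) -> list[dict]:
    """Return one summary record per model version."""
    grouped = frame.groupby("model_version").agg(
        avg_latency=("latency_ms", "mean"),
        # The mean of a 0/1 flag is the fraction of calls that failed.
        error_rate=("is_error", "mean"),
        calls=("model_version", "count"),
    )
    # reset_index turns the group key back into a regular column.
    return grouped.reset_index().to_dict(orient="records")

print("scenario:", "health triage support model")
print(monitoring_report(events))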

Commands & References

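python -m venv .venv (shell: create a virtual environment in the .venv folder)
jupyter lab (shell: start JupyterLab)
df.head() (Python: show the first five rows of a pandas DataFrame named df)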
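
As a minimal sketch of what df.head() looks like in practice, the snippet below previews the same toy events table used in the Code Example section. The variable name preview and the choice of three rows are illustrative assumptions, not part of the lesson.

```python
import pandas as pd

# Same toy data as the Code Example section.
events = pd.DataFrame({
    "model_version": ["v1", "v1", "v2", "v2", "v2", "v3"],
    "latency_ms": [110, 130, 95, 102, 99, 140],
    "is_error": [0, 0, 0, 1, 0, 1],
})

# head(n) returns the first n rows; with no argument it returns five.
preview = events.head(3)
print(preview)

# dtypes is a quick way to confirm each column was read as expected.
print(events.dtypes)
```

Looking at the first few rows and the column types before any aggregation is a cheap way to catch a mislabeled or mistyped column early.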

Lab Steps

Prepare the environment using: python -m venv .venv
Load a small sample dataset and validate schema.
Run the core code workflow and collect metrics.
Compare results and write one improvement note.
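
The "validate schema" step above could be sketched as follows. This is one possible approach, not a prescribed one: the names EXPECTED_SCHEMA and validate_schema are hypothetical, and the expected columns are taken from the toy events table in the Code Example section.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical schema: column name -> function that checks the column's dtype.
EXPECTED_SCHEMA = {
    "model_version": ptypes.is_string_dtype,
    "latency_ms": ptypes.is_numeric_dtype,
    "is_error": ptypes.is_integer_dtype,
}

def validate_schema(frame: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for column, dtype_check in EXPECTED_SCHEMA.items():
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
        elif not dtype_check(frame[column]):
            problems.append(f"unexpected dtype for {column}: {frame[column].dtype}")
    return problems

sample = pd.DataFrame({
    "model_version": ["v1", "v2"],
    "latency_ms": [110, 95],
    "is_error": [0, 1],
})

print(validate_schema(sample))                             # valid frame
print(validate_schema(sample.drop(columns=["is_error"])))  # one column missing
```

Returning a list of problems rather than raising on the first one lets you see every schema issue in a single run, which is convenient while exploring a new dataset.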

Exercises

Change one parameter (for example, the aggregation used for latency in monitoring_report) and compare the impact.
Add one validation rule to reduce bad inputs.
Document one failure mode and mitigation.
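
For the validation-rule exercise, one possible starting point is sketched below. The rule itself (latency must be positive and is_error must be 0 or 1) and the helper name drop_invalid_events are assumptions chosen for illustration; the lesson does not specify which rule to add.

```python
import pandas as pd

def drop_invalid_events(frame: pd.DataFrame):
    """Keep rows that pass the rule; also report how many were rejected."""
    # Assumed rule: latency must be positive and is_error must be 0 or 1.
    valid_mask = (frame["latency_ms"] > 0) & frame["is_error"].isin([0, 1])
    clean = frame[valid_mask].reset_index(drop=True)
    rejected = int((~valid_mask).sum())
    return clean, rejected

# Toy input with two deliberately bad rows.
raw = pd.DataFrame({
    "model_version": ["v1", "v2", "v2", "v3"],
    "latency_ms": [110, -5, 99, 140],  # -5 is not a plausible latency
    "is_error": [0, 0, 2, 1],          # 2 is not a valid flag
})

clean, rejected = drop_invalid_events(raw)
print(clean)
print("rejected rows:", rejected)
```

Reporting the rejected count alongside the cleaned frame also gives you material for the failure-mode exercise: a sudden jump in rejected rows is itself a signal worth documenting.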
