BEGINNER • Python Data Foundation
Feature Quality Checkpoint #13
This lesson focuses on preventing data leakage, using a warehouse optimization model as the practical scenario. You will use the commands df.head(), df.info(), and python -m venv .venv. The code example builds a per-version monitoring report (average latency, error rate, and call count) from a small table of model events.
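To make the leakage idea concrete, here is a minimal sketch of one common safeguard: split the data first, then compute any scaling statistics from the training rows only. The `orders` table, its column names, and the row-order split are all assumptions made for illustration; they are not part of the lesson's dataset.

```python
import pandas as pd

# Hypothetical warehouse data: names and values are illustrative only.
orders = pd.DataFrame({
    "pick_time_s": [30.0, 42.0, 35.0, 50.0, 38.0, 45.0],
    "late": [0, 1, 0, 1, 0, 1],
})

# Split first (assumption: the first four rows are training data).
train = orders.iloc[:4].copy()
test = orders.iloc[4:].copy()

# Fit the scaling statistics on the training rows only.
train_mean = train["pick_time_s"].mean()
train_std = train["pick_time_s"].std()

# Apply the training statistics to both splits.
train["pick_time_scaled"] = (train["pick_time_s"] - train_mean) / train_std
test["pick_time_scaled"] = (test["pick_time_s"] - train_mean) / train_std

# The leaky alternative averages over every row, test rows included.
leaky_mean = orders["pick_time_s"].mean()
print("train-only mean:", train_mean, "| leaky mean:", leaky_mean)
```

The two means differ because the leaky version has already seen the test rows, which is how information from the evaluation set can slip into preprocessing.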
Code Example
import pandas as pd

# Small sample of model call events, one row per call.
events = pd.DataFrame({
    "model_version": ["v1", "v1", "v2", "v2", "v2", "v3"],
    "latency_ms": [110, 130, 95, 102, 99, 140],
    "is_error": [0, 0, 0, 1, 0, 1],
})

def monitoring_report(frame: pd.DataFrame):
    # Summarize latency, error rate, and call volume per model version.
    grouped = frame.groupby("model_version").agg(
        avg_latency=("latency_ms", "mean"),
        error_rate=("is_error", "mean"),
        calls=("model_version", "count"),
    )
    # Return one plain dict per model version.
    return grouped.reset_index().to_dict(orient="records")

print("scenario:", "warehouse optimization model")
print(monitoring_report(events))
Commands & References
- df.head()
- df.info()
- python -m venv .venv
Lab Steps
- Prepare the environment using: python -m venv .venv
- Load a small sample dataset and validate its schema with df.head() and df.info().
- Run the core code workflow and collect metrics.
- Compare results and write one improvement note.
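The schema validation step above could be sketched as follows. The helper `check_schema` and the `REQUIRED` and `NUMERIC` lists are hypothetical names introduced here for illustration; the sample frame mirrors the `events` table from the code example.

```python
import pandas as pd

events = pd.DataFrame({
    "model_version": ["v1", "v1", "v2", "v2", "v2", "v3"],
    "latency_ms": [110, 130, 95, 102, 99, 140],
    "is_error": [0, 0, 0, 1, 0, 1],
})

# Assumed expectations for this lab; adjust them to your own dataset.
REQUIRED = ["model_version", "latency_ms", "is_error"]
NUMERIC = ["latency_ms", "is_error"]

def check_schema(frame, required, numeric):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for col in required:
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
    for col in numeric:
        if col in frame.columns and not pd.api.types.is_numeric_dtype(frame[col]):
            problems.append(f"non-numeric column: {col}")
    return problems

# Quick visual inspection, then the programmatic check.
print(events.head())
events.info()
problems = check_schema(events, REQUIRED, NUMERIC)
print("schema problems:", problems)
```

Returning a list of problems rather than raising on the first one lets you see every schema issue in a single run.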
Exercises
- Change one hyperparameter and compare impact.
- Add one validation rule to reduce bad inputs.
- Document one failure mode and mitigation.
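For the validation-rule exercise, one possible starting point is a filter that rejects rows before they reach the report. This is only a sketch: the helper `drop_bad_rows`, the `raw` sample, and the two rules (positive latency, error flag of 0 or 1) are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical raw input containing two deliberately bad rows.
raw = pd.DataFrame({
    "model_version": ["v1", "v2", "v2", "v3"],
    "latency_ms": [110, -5, 99, 140],
    "is_error": [0, 0, 2, 1],
})

def drop_bad_rows(frame):
    """Keep rows with positive latency and a 0/1 error flag.

    Returns the cleaned frame and the number of rejected rows.
    """
    valid = frame["latency_ms"].gt(0) & frame["is_error"].isin([0, 1])
    cleaned = frame[valid].reset_index(drop=True)
    return cleaned, int((~valid).sum())

cleaned, rejected = drop_bad_rows(raw)
print("kept:", len(cleaned), "| rejected:", rejected)
```

Reporting the rejected count alongside the cleaned data makes it easier to notice when a rule starts discarding more rows than expected.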