BEGINNER • Python Data Foundation
Modeling Sprint: improve feature reliability #22
This lesson focuses on improving feature reliability in a practical personalized recommendation prototype scenario. You will use the following commands: df.describe(), plt.plot(), and pip install pandas numpy matplotlib seaborn. The code example computes the median and the median absolute deviation (MAD) of a numeric series, drops values above median + 3 * MAD, and reports the counts before and after filtering.
Code Example
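import numpy as np


def clean_series(values: list[float]) -> dict:
    """Summarize a series and count the values that pass an upper-outlier filter."""
    arr = np.array(values, dtype=float)
    median = float(np.median(arr))
    # Median absolute deviation (MAD): a spread estimate that resists outliers.
    mad = float(np.median(np.abs(arr - median)))
    # Floor the MAD at 1e-6 so the threshold stays above the median when MAD is 0.
    threshold = median + 3 * max(mad, 1e-6)
    # One-sided filter: only unusually large values are dropped.
    filtered = arr[arr <= threshold]
    return {
        "median": median,
        "mad": mad,
        "count_before": len(arr),
        "count_after": len(filtered),
    }


# Install dependencies first (shell command): pip install pandas numpy matplotlib seaborn
series = [12, 13, 11, 14, 500, 12, 13, 11]
print(clean_series(series))
Commands & References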
- df.describe()
- plt.plot()
- pip install pandas numpy matplotlib seaborn
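The two inspection commands listed above can be tried on the sample series from the code example. This is a minimal sketch, assuming pandas and matplotlib are installed; the column name `value` is a hypothetical choice, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Same sample series as the code example; "value" is a hypothetical column name.
df = pd.DataFrame({"value": [12, 13, 11, 14, 500, 12, 13, 11]})

# df.describe() reports count, mean, std, min, quartiles, and max per numeric column.
summary = df.describe()
print(summary)

# plt.plot() draws the values against their position; the spike at 500 stands out.
lines = plt.plot(df["value"].to_numpy(), marker="o")
plt.xlabel("position in series")
plt.ylabel("value")
plt.close("all")
```

In this sample the mean (73.25) sits far above the median (12.5, the `50%` row), which is a quick sign that a single extreme value is distorting the feature.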
Lab Steps
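- Prepare the environment using: pip install pandas numpy matplotlib seaborn
- Load a small sample dataset, validate its schema, and inspect it with df.describe().
- Run the core code workflow on one numeric column and record the median, MAD, and counts before and after filtering.
- Compare the counts before and after filtering and write one improvement note.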
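The lab steps above can be sketched end to end as follows. The inline CSV, the column names (`user_id`, `item_id`, `rating`), and the `validate_schema` helper are all hypothetical stand-ins for whatever dataset the lab actually uses; the filter repeats the median + 3 * MAD rule from the code example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline sample; in the lab this would be a small file on disk.
CSV_TEXT = """user_id,item_id,rating
1,10,4.0
1,11,3.5
2,10,5.0
2,12,250.0
3,11,4.5
"""

# Hypothetical schema: column name -> NumPy dtype kind ("i" integer, "f" float).
EXPECTED_KINDS = {"user_id": "i", "item_id": "i", "rating": "f"}


def validate_schema(df: pd.DataFrame, expected: dict) -> list:
    """Return a list of human-readable schema problems (empty when the schema matches)."""
    problems = []
    for column, kind in expected.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif df[column].dtype.kind != kind:
            problems.append(f"{column}: expected kind {kind!r}, got {df[column].dtype.kind!r}")
    return problems


df = pd.read_csv(io.StringIO(CSV_TEXT))
problems = validate_schema(df, EXPECTED_KINDS)
print("schema problems:", problems)
print(df.describe())

# Core workflow: same median + 3 * MAD upper filter as the code example.
ratings = df["rating"].to_numpy(dtype=float)
median = float(np.median(ratings))
mad = float(np.median(np.abs(ratings - median)))
kept = ratings[ratings <= median + 3 * max(mad, 1e-6)]
metrics = {
    "median": median,
    "mad": mad,
    "count_before": len(ratings),
    "count_after": len(kept),
}
print(metrics)
```

Here the rating of 250.0 is the only value above the threshold, so one row is dropped. An improvement note might record that observation along with the threshold that produced it.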
Exercises
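- Change one hyperparameter (for example, the multiplier 3 in the threshold) and compare the impact on count_after.
- Add one validation rule to reduce bad inputs (for example, reject empty or NaN values).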
- Document one failure mode and mitigation.
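One possible starting point for the exercises is sketched below. The function `count_kept`, its parameter `k`, and the chosen validation rule are hypothetical; they only vary the multiplier from the code example and add a guard against empty or non-finite input.

```python
import numpy as np


def count_kept(values, k: float = 3.0) -> int:
    """Count values at or below median + k * MAD (k is the hyperparameter to vary)."""
    arr = np.asarray(values, dtype=float)
    # Validation rule (an assumed example): refuse empty input and NaN/inf values.
    if arr.size == 0:
        raise ValueError("series is empty")
    if not np.all(np.isfinite(arr)):
        raise ValueError("series contains NaN or infinite values")
    median = np.median(arr)
    mad = np.median(np.abs(arr - median))
    threshold = median + k * max(mad, 1e-6)
    return int(np.sum(arr <= threshold))


series = [12, 13, 11, 14, 500, 12, 13, 11]

# Compare the impact of the multiplier on how many values survive.
results = {k: count_kept(series, k) for k in (0.5, 3.0, 1000.0)}
print(results)

# A candidate failure mode: when most values are identical the MAD is 0,
# so anything even slightly above the median is dropped.
print(count_kept([5, 5, 5, 5, 6]))
```

A small multiplier also removes the legitimate value 14, while a very large one keeps the 500 outlier. The last call shows the zero-MAD case, where the 1e-6 floor makes the filter drop the value 6; a mitigation worth documenting could be a larger floor or a fallback spread estimate.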