BEGINNER • Python Data Foundation
Data Pipeline Drill for personalized recommendation prototype #6
This lesson focuses on improving feature reliability in a practical personalized recommendation prototype scenario. You will use three commands: df.describe(), plt.plot(), and pip install pandas numpy matplotlib seaborn. The code example below trains a small logistic regression model on toy features and reports its AUC on a held-out split.
Code Example
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
import pandas as pd
df = pd.DataFrame({
"x1": [0.1, 0.2, 0.35, 0.5, 0.7, 0.9, 1.0, 1.2],
"x2": [1, 0, 1, 0, 1, 1, 0, 1],
"y": [0, 0, 0, 1, 1, 1, 1, 1],
})
# Stratify so the two-row test set contains both classes; roc_auc_score raises an error otherwise.
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["y"], test_size=0.25, random_state=42, stratify=df["y"]
)
model = LogisticRegression().fit(X_train, y_train)
# Predicted probability of the positive class (y = 1).
pred = model.predict_proba(X_test)[:, 1]
print("auc:", round(roc_auc_score(y_test, pred), 4))
Commands & References
- df.describe()
- plt.plot()
- pip install pandas numpy matplotlib seaborn
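A minimal sketch of how the two inspection commands above might be used on the lesson's toy data. The column values are copied from the code example; the non-interactive backend and the output file name x1_by_row.png are assumptions made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x1": [0.1, 0.2, 0.35, 0.5, 0.7, 0.9, 1.0, 1.2],
    "x2": [1, 0, 1, 0, 1, 1, 0, 1],
    "y": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Summary statistics per numeric column: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)

# Line plot of x1 against the row index to spot gaps or outliers by eye.
plt.plot(df.index, df["x1"], marker="o")
plt.xlabel("row index")
plt.ylabel("x1")
plt.savefig("x1_by_row.png")  # hypothetical output file name
plt.close()
```

Checking the count row of the summary is a quick way to notice missing values: any column whose count is below the number of rows has gaps.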
Lab Steps
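- Prepare the environment using: pip install pandas numpy matplotlib seaborn (the code example also imports sklearn, so install scikit-learn as well).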
- Load a small sample dataset and validate schema.
- Run the core code workflow and collect metrics.
- Compare results and write one improvement note.
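The "validate schema" step above could be sketched roughly as follows. The EXPECTED_SCHEMA mapping and the validate_schema helper are hypothetical names introduced for illustration; the lesson does not prescribe a particular schema check.

```python
import pandas as pd

# Hypothetical expected schema for the toy dataset: column -> NumPy dtype kind
# ("f" = float, "i" = signed integer). This mapping is an assumption.
EXPECTED_SCHEMA = {"x1": "f", "x2": "i", "y": "i"}


def validate_schema(df, expected):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for column, kind in expected.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif df[column].dtype.kind != kind:
            problems.append(
                f"{column}: expected dtype kind {kind!r}, got {df[column].dtype.kind!r}"
            )
        elif df[column].isna().any():
            problems.append(f"{column}: contains missing values")
    return problems


good = pd.DataFrame({"x1": [0.1, 0.5], "x2": [1, 0], "y": [0, 1]})
bad = pd.DataFrame({"x1": ["0.1", "0.5"], "y": [0, 1]})  # x1 as text, x2 absent

print(validate_schema(good, EXPECTED_SCHEMA))
print(validate_schema(bad, EXPECTED_SCHEMA))
```

Returning a list of problems rather than raising on the first one lets you record every issue in a single pass, which is convenient when writing the improvement note.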
Exercises
- Change one hyperparameter and compare impact.
- Add one validation rule to reduce bad inputs.
- Document one failure mode and mitigation.
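As a possible starting point for the first exercise, the sketch below varies the regularization strength C of LogisticRegression and records the AUC for each value. The choice of C values and the stratify argument are assumptions, not part of the lesson; stratifying keeps both classes in the two-row test set so that roc_auc_score is defined. With only two test rows the AUC is very coarse, so treat differences as illustrative only.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "x1": [0.1, 0.2, 0.35, 0.5, 0.7, 0.9, 1.0, 1.2],
    "x2": [1, 0, 1, 0, 1, 1, 0, 1],
    "y": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Same split for every run so only the hyperparameter changes.
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["y"], test_size=0.25, random_state=42, stratify=df["y"]
)

results = {}
for c_value in (0.1, 1.0, 10.0):  # assumed values; smaller C = stronger regularization
    model = LogisticRegression(C=c_value).fit(X_train, y_train)
    pred = model.predict_proba(X_test)[:, 1]
    results[c_value] = roc_auc_score(y_test, pred)

for c_value, auc in results.items():
    print(f"C={c_value}: auc={auc:.4f}")
```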