Classical ML

Classical ML remains important because many production problems are tabular, small-data, latency-sensitive, or need strong interpretability. Gradient boosted trees or logistic regression can beat deep learning when features are structured and data volume is moderate.

Command Examples

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))

Example output and meaning:

Command	Example output	What it does
`Python example`	A score such as `0.97`.	Demonstrates a complete train/test split and baseline classifier score.

Model Families

Family	Strength	Watch Out For
Linear regression	Simple numeric prediction.	Nonlinear relationships and outliers.
Logistic regression	Interpretable classification baseline.	Feature scaling and class imbalance.
Decision trees	Human-readable splits.	Overfitting deep trees.
Random forests	Robust tabular baseline.	Larger memory and less direct explanations.
Gradient boosting	Strong tabular performance.	Leakage and careful validation.
SVMs	Good margins for some small datasets.	Scaling to large datasets.
k-means	Simple clustering.	Choosing k and distance metric assumptions.

Feature Engineering

Feature engineering translates raw records into model-ready signals:

numeric scaling,
categorical encoding,
time-window aggregates,
missing-value flags,
text counts or embeddings,
domain-specific ratios,
leakage removal.

Tabular Workflow

Define prediction time and label time.
Build point-in-time correct features.
Split by time, entity, or source.
Train a simple baseline.
Add features and compare by slice.
Calibrate and choose thresholds.
Monitor feature drift and label delay.

Practical Lab: Baseline Before Complexity

Baseline plan:
  model: logistic regression
  split: last 30 days as test
  metric: recall at fixed review capacity
  slices: customer tier, product, language
  challenger: gradient boosted trees

The lab target is not highest score. It is proving whether a simple model is good enough and where it fails.

Study Cards

Question

When can classical ML beat deep learning?

Answer

When data is structured, moderate in size, and feature engineering captures the main signal.

Question

Why start with a simple baseline?

Answer

It exposes data and metric problems before model complexity hides them.

Question

What is the main tabular ML leakage risk?

Answer

Features that contain future or post-outcome information unavailable at prediction time.

Classical ML

Command Examples

Model Families

Feature Engineering

Tabular Workflow

Practical Lab: Baseline Before Complexity

Study Cards

References