Machine Learning Models & Learning Paradigms
Choosing the right learning paradigm matters more than choosing a specific algorithm. The paradigm determines what type of data you need and what you can learn.
Learning Paradigms at a Glance
| Paradigm | Data | Problem Type | Best For | Example |
|---|---|---|---|---|
| Supervised | Labeled (X, y) | Prediction | Regression, classification | Predicting house prices, email spam |
| Unsupervised | Unlabeled (X only) | Discovery | Clustering, dimensionality reduction | Customer segmentation, anomaly detection |
| Reinforcement | Reward signals | Sequential decision-making | Control, games | Self-driving cars, game AI (AlphaGo) |
| Semi-supervised | Mostly unlabeled + some labeled | When labeling is expensive | Low-data scenarios | Medical imaging (few labeled scans) |
| Transfer | Pre-trained model | Leverage existing knowledge | Accelerate training | Fine-tuning BERT on custom task |
1. Supervised Learning
Definition: Learn from labeled examples (X, y pairs) where y is the ground truth. Goal: predict y for new, unseen X.
Classification (Predict Categories)
Definition: Predict which discrete category an instance belongs to.
Common Algorithms:
| Algorithm | Time (Training) | Space | Best For | Interpretability | Scalability |
|---|---|---|---|---|---|
| Logistic Regression | O(n×d×iterations) | O(d) | Binary classification, baseline | Very high | Excellent |
| Decision Trees | O(n log n × d) | O(n) | Interpretable rules, small data | Very high | Poor |
| Random Forest | O(trees × n log n × d) | O(trees × n) | Balanced performance, non-linear | Medium | Good |
| Gradient Boosting (XGBoost) | O(trees × n log n × d) | O(trees × n) | Tabular data, Kaggle competitions | Low | Good |
| SVM | O(n² to n³) | O(support vectors) | High-dimensional, small-medium data | Low | Medium |
| Naive Bayes | O(n×d) | O(classes × d) | Text classification, spam filtering | High | Excellent |
| k-NN | O(n×d) per prediction | O(n×d) | Small datasets, non-parametric | High (example-based) | Poor |
| Neural Networks (MLP) | O(n × layers × hidden_size) | O(n_params) | Complex non-linear patterns | Very low | Excellent |
Real-World Examples:
- LinkedIn: Predicts job recommendations for millions of users. Uses ensemble of models (gradient boosting + neural networks).
- Stripe: Detects fraudulent transactions in real-time. Logistic regression for baseline, gradient boosting for production. <50ms latency required.
- Zillow: Predicts house values (Zestimate). Gradient boosting over 100+ features. 95% of predictions within 20% of actual price.
Regression (Predict Continuous Values)
Definition: Predict a continuous numerical output (not discrete categories).
Common Algorithms:
| Algorithm | Use Case | Accuracy | Speed |
|---|---|---|---|
| Linear Regression | Simple relationships | Baseline | Fast |
| Polynomial Regression | Non-linear but smooth | Better | Medium |
| Ridge/Lasso | Prevent overfitting | Better | Fast |
| Decision Tree Regression | Non-linear, interpretable | Good | Medium |
| Random Forest Regression | Robust, non-linear | Very good | Good |
| Gradient Boosting Regression | State-of-the-art tabular | Excellent | Good |
| Neural Networks | Complex, high-dimensional | Excellent | Depends on size |
Production Examples:
- Uber: Predicts delivery time based on traffic, weather, driver location. Gradient boosting. Updates every 5 minutes.
- Amazon: Demand forecasting—predicts inventory needs. ARIMA + LSTM hybrid for seasonality + trends.
- Tesla: Predicts remaining battery range. Uses vehicle telemetry + weather + route. Critical for UX.
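A minimal regression sketch using Ridge from the table above, on synthetic data (the features and coefficients here are invented for illustration; a real ETA or pricing model would use domain features):

```python
# Ridge regression: least squares plus L2 regularization (alpha).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                              # synthetic features
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = Ridge(alpha=1.0)               # alpha controls regularization strength
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)       # R^2 on held-out data
print(f"test R^2 = {r2:.3f}")
```

Larger `alpha` shrinks coefficients more aggressively, trading a little bias for less variance.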
2. Unsupervised Learning
Definition: Learn from unlabeled data—discover hidden structure or patterns without ground truth.
Clustering
Definition: Group similar instances together. No predefined labels.
| Algorithm | Time | Space | Clusters | Best For |
|---|---|---|---|---|
| K-Means | O(n×k×iterations) | O(n+k) | K (you choose) | Fast, scalable, convex clusters |
| Hierarchical | O(n²) | O(n²) | Any | Dendrograms, visualizable, small data |
| DBSCAN | O(n log n) with spatial index | O(n) | Auto-detected | Non-convex shapes, outlier detection |
| Gaussian Mixture Models | O(n×k×iterations) | O(k×d²) | K (you choose) | Soft assignments, probabilistic framework |
Real-World Applications:
- Netflix: Customer segmentation. Clusters users by watch behavior. Personalizes recommendations per segment.
- Spotify: Song clustering by audio features. Groups similar songs for playlist generation and recommendation.
- E-commerce: Market segmentation. Groups customers by purchase history, demographics. Enables targeted marketing.
- Genomics: Gene expression clustering. Groups genes with similar expression patterns to discover functional relationships.
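The "convex vs non-convex" distinction in the table can be seen directly: a quick sketch on synthetic "two moons" data, where K-Means assumes blob-shaped clusters but DBSCAN recovers the non-convex shapes (dataset and parameters chosen for illustration):

```python
# K-Means vs DBSCAN on non-convex clusters.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

print("k-means clusters:", len(set(km.labels_)))
print("dbscan clusters:", len(set(db.labels_) - {-1}))  # -1 marks noise points
```

K-Means will split the moons with a straight boundary; DBSCAN follows the density and also labels outliers as noise (`-1`).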
Dimensionality Reduction
Definition: Reduce the number of features while preserving important information.
Algorithms:
| Algorithm | Input | Output | Use |
|---|---|---|---|
| Principal Component Analysis (PCA) | d features | k components | Reduce features for visualization, speed up training |
| t-SNE | d features | 2–3 components | Visualization only (not for prediction) |
| UMAP | d features | k components | Visualization + preserves local structure |
| Autoencoders | d features | k dimensions (learned) | Non-linear dimensionality reduction |
Production Use:
- Google Images: Reduces high-dimensional image features for fast nearest-neighbor search. PCA preprocesses embeddings.
- Recommendation Systems: Latent factor models reduce user/item features for efficient collaborative filtering.
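A small PCA sketch on scikit-learn's built-in digits dataset (chosen for illustration): project 64-dimensional images down to 10 components and check how much variance survives.

```python
# PCA: keep the top-k directions of maximum variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)       # shape (1797, 64)
pca = PCA(n_components=10).fit(X)
X_reduced = pca.transform(X)              # shape (1797, 10)
explained = pca.explained_variance_ratio_.sum()
print(f"10 components explain {explained:.0%} of variance")
```

`explained_variance_ratio_` is the usual guide for choosing k: pick the smallest k that retains enough variance for your downstream task.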
Anomaly Detection
Definition: Find unusual patterns or outliers that deviate from normal behavior.
Algorithms:
- Isolation Forest: Builds random partitioning trees; anomalies are isolated in fewer splits than normal points. Fast, doesn’t need to model the normal distribution.
- Local Outlier Factor (LOF): Compares density of neighbors. Good for local anomalies.
- Autoencoders: Trains on normal data. High reconstruction error = anomaly.
- One-Class SVM: Learns boundary around normal data.
Examples:
- Credit Card Fraud: Real-time detection of suspicious transactions. ~0.1% fraud rate. Must catch fraud while minimizing false positives.
- Manufacturing: Detects equipment failures from sensor data. Predictive maintenance saves millions in downtime.
- Network Security: Detects intrusions by identifying unusual traffic patterns. NSA, major banks.
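An Isolation Forest sketch on synthetic data (the normal/outlier distributions and `contamination` value here are invented for illustration; in fraud detection the outlier fraction would come from historical rates):

```python
# Isolation Forest: anomalies take fewer random splits to isolate.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0, scale=1, size=(500, 2))    # normal behavior
outliers = rng.uniform(low=6, high=8, size=(10, 2))   # obvious anomalies
X = np.vstack([normal, outliers])

clf = IsolationForest(contamination=0.02, random_state=0).fit(X)
pred = clf.predict(X)                                  # +1 normal, -1 anomaly
print("flagged:", int((pred == -1).sum()))
```

`contamination` is an assumed outlier fraction that sets the decision threshold; in production it is tuned against the false-positive budget.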
3. Reinforcement Learning (RL)
Definition: Learn by interacting with an environment. Receive rewards/penalties for actions. Goal: learn a policy that maximizes cumulative reward.
Key Concepts:
- Agent: The learner (e.g., robot, game AI)
- Environment: World the agent acts in (e.g., game, warehouse)
- State: Current situation
- Action: What agent can do
- Reward: Feedback signal (positive for good actions, negative for bad)
- Policy: Strategy—mapping from state to action
Algorithm Families:
| Type | Algorithm | How | Best For |
|---|---|---|---|
| Value-Based | Q-Learning, DQN | Learn value of each action in each state | Games with discrete actions |
| Policy-Based | Policy Gradient, PPO, Actor-Critic | Learn policy directly | Continuous control (robotics) |
| Model-Based | Monte Carlo Tree Search (as in AlphaGo) | Use or learn a model of the environment, plan ahead | Games with perfect information |
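The value-based row can be sketched in a few lines: tabular Q-learning on a hypothetical 5-state corridor (environment, reward, and hyperparameters invented for illustration). The behavior policy is uniformly random, which works because Q-learning is off-policy.

```python
# Tabular Q-learning on a toy corridor: start at state 0, reward 1 at state 4.
import random

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma = 0.5, 0.9               # learning rate, discount factor

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward, s2 == n_states - 1

random.seed(0)
for _ in range(300):                  # episodes with random exploration
    s, done = 0, False
    while not done:
        a = random.randrange(n_actions)
        s2, r, done = step(s, a)
        # Q-update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max(range(n_actions), key=lambda i: Q[s][i]) for s in range(n_states)]
print("greedy policy (1 = right):", policy)
```

The learned greedy policy moves right from every non-terminal state, which is optimal here; DQN replaces the table with a neural network over states.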
Production Applications:
- AlphaGo (DeepMind): Defeated world Go champion Lee Sedol. Combines deep learning + tree search + RL. 19×19 board = 10^170 possible states (too large for brute force).
- Autonomous Vehicles (Waymo): RL agent learns to navigate traffic. Trained in simulation, deployed in real world.
- Portfolio Optimization (Finance): RL learns trading strategy. Rewards = profit, penalties = risk. Outperforms rule-based strategies.
- Robotics (Boston Dynamics): Learns bipedal locomotion through RL. Agent receives reward for forward progress, penalty for falling.
4. Transfer Learning
Definition: Leverage knowledge from one task to accelerate learning on a related task.
Pattern:
- Pre-train on large dataset (e.g., ImageNet with 1M images)
- Fine-tune on smaller, task-specific dataset (e.g., medical images with 500 labeled scans)
- Result: High performance with less data and faster training
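The pattern above can be sketched conceptually with scikit-learn only: freeze a "pretrained" feature extractor and train just a small head on top. Here a fixed random ReLU projection stands in for real pretrained weights; in practice you would reuse an actual pretrained encoder (e.g. ResNet or BERT).

```python
# Transfer-learning sketch: frozen feature extractor + trainable head.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(64, 64))        # stand-in for pretrained weights

def extract(X):                             # frozen extractor, never updated
    return np.maximum(X @ W_frozen, 0)      # ReLU projection

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
head = LogisticRegression(max_iter=2000)    # only the head is trained
head.fit(extract(X_train), y_train)
acc = head.score(extract(X_test), y_test)
print(f"accuracy: {acc:.3f}")
```

Fine-tuning proper would also unfreeze some extractor layers at a low learning rate; the frozen-features version shown here is the cheapest variant.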
Benefits:
- ✅ Reduces labeled data requirement by 10–100x
- ✅ Speeds up training from weeks to hours
- ✅ Improves performance on small datasets
- ✅ Enables few-shot and zero-shot learning
Real-World Examples:
- BERT (Google): Pre-trained on 3.3B words. Fine-tuning on specific NLP task (sentiment analysis, NER, QA) improves accuracy by 5–15% with 1000s of labels instead of millions.
- ResNet (Computer Vision): Pre-trained on ImageNet (1.2M labeled images, 1000 categories). Fine-tuning on medical imaging task (X-rays: 10K images) achieves 95%+ accuracy vs 75% training from scratch.
- GPT-4 Fine-tuning (OpenAI): Pre-trained on trillions of tokens. Fine-tuning on customer support corpus (50K examples) creates domain-specific chatbot in days, not months.
5. Semi-Supervised Learning
Definition: Use mostly unlabeled data + small amount of labeled data. Unlabeled data helps improve performance.
Techniques:
- Self-training: Train on labeled data, then use predictions on unlabeled data as pseudo-labels
- Consistency regularization: Unlabeled samples should produce consistent predictions under perturbations
- Generative models: Learn from unlabeled data, then fine-tune with labeled data
When It Shines:
- Medical imaging: Labeling X-rays is expensive ($1–10 per image). Semi-supervised can use 1000s of unlabeled images + 100s of labeled.
- Natural language: Can leverage massive web text (unlabeled) + small amount of annotation (labeled).
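The self-training technique above can be sketched with scikit-learn's `SelfTrainingClassifier`, which treats samples labeled `-1` as unlabeled and pseudo-labels them iteratively (the digits dataset and 10% label fraction are chosen for illustration):

```python
# Self-training: hide ~90% of labels, let confident predictions fill them in.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
y_semi = y_train.copy()
y_semi[rng.random(len(y_semi)) > 0.1] = -1        # -1 means "unlabeled"

base = LogisticRegression(max_iter=2000)
model = SelfTrainingClassifier(base, threshold=0.9).fit(X_train, y_semi)
acc = model.score(X_test, y_test)
print(f"accuracy with ~10% labels: {acc:.3f}")
```

The `threshold` controls how confident a prediction must be before it is adopted as a pseudo-label; setting it too low lets wrong pseudo-labels compound.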
Implementation Patterns
```python
# Supervised learning: labeled (X, y) pairs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_data()  # features, labels (user-provided loader)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=100, max_depth=10)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

```python
# Unsupervised learning: no labels needed
from sklearn.cluster import KMeans

X = load_unlabeled_data()  # user-provided loader
model = KMeans(n_clusters=5)
labels = model.fit_predict(X)  # no y needed
```

```python
# Reinforcement learning: learn from interaction (classic gym API;
# gymnasium's reset/step return extra values)
import gym

env = gym.make("CartPole-v1")
for episode in range(1000):
    state = env.reset()
    for step in range(500):
        action = select_action(state)  # learned policy (user-defined)
        next_state, reward, done, info = env.step(action)
        update_policy(state, action, reward)  # learn from reward (user-defined)
        state = next_state
        if done:
            break
```
References
- 📄 Supervised Learning Overview (scikit-learn docs)
- 📄 Unsupervised Learning (scikit-learn docs)
- 📖 Reinforcement Learning: An Introduction (Sutton & Barto) — classic textbook
- 📄 Transfer Learning Survey (Zhuang et al., 2020)
- 🎥 Deep Reinforcement Learning (UC Berkeley CS 285) — industry-standard course
- 🔗 Scikit-learn Documentation — comprehensive ML library