Top 20 ML Algorithms

A taxonomy of essential machine learning algorithms: the ones that power production systems across industry.

Algorithm Landscape

| Category | Algorithm | Time | Space | When to Use | Production Use |
| --- | --- | --- | --- | --- | --- |
| Linear Models | Linear Regression | O(n×d×iter) | O(d) | Baseline for regression | Uber: delivery time prediction |
| Linear Models | Logistic Regression | O(n×d×iter) | O(d) | Binary classification baseline | Stripe: fraud detection |
| Linear Models | Ridge/Lasso | O(n×d×iter) | O(d) | Regularization, feature selection | Medical diagnosis with many features |
| Tree-Based | Decision Trees | O(n log n × d) | O(n) | Interpretability, small data | Credit approval (regulatory) |
| Tree-Based | Random Forest | O(trees×n log n×d) | O(n) | Balanced accuracy/speed | Airbnb: price prediction, booking |
| Tree-Based | Gradient Boosting (XGBoost) | O(trees×n log n×d) | O(n) | Kaggle winner, tabular data | Criteo: click-through rate prediction |
| Tree-Based | LightGBM | O(trees×n log n×d) | O(n) | Large datasets, speed | Microsoft: ranking in search |
| Distance-Based | k-Means Clustering | O(n×k×iter) | O(n) | Fast clustering, scalable | Spotify: playlist generation |
| Distance-Based | k-Nearest Neighbors | O(n×d) per query | O(n×d) | Non-parametric, small data | Recommendation systems (slow at large scale) |
| Distance-Based | SVM | O(n²) to O(n³) | O(sv) | High-dimensional data, kernel trick | Text classification, face recognition |
| Ensemble | Bagging | O(trees×n×d) | O(n) | Reduce variance | Combines many weak learners |
| Ensemble | Boosting | O(trees×n×d) | O(n) | Reduce bias, weighted samples | AdaBoost, Gradient Boosting |
| Ensemble | Stacking | O(models×n×d) | O(n) | Combine diverse models | Kaggle: meta-learner |
| Clustering | Hierarchical Clustering | O(n²) | O(n²) | Dendrograms, interpretable | Gene expression analysis |
| Clustering | DBSCAN | O(n log n) | O(n) | Non-convex clusters, outliers | Geospatial: location clustering |
| Neural Nets | MLP | O(layers×n×hidden) | O(params) | Complex non-linear patterns | Recommendation systems |
| Neural Nets | CNN | O(filters×n×field) | O(params) | Images, computer vision | ResNet: ImageNet 76% top-1 accuracy |
| Neural Nets | RNN/LSTM | O(seq_len×n×hidden) | O(seq_len) | Sequential data, NLP | Google Translate, sentiment analysis |
| Neural Nets | Transformer | O(seq_len²×n) | O(seq_len²) | Parallel processing, long-range deps | BERT, GPT-4, modern NLP |

Linear Models (Interpretable Baselines)

Linear Regression

Used when: Continuous output, linear relationship, interpretability matters

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Production: Uber predicts delivery time as a linear combination of distance, time of day, weather, and traffic, achieving ~90% R² on 10M daily predictions.


Tree-Based Models (Tabular Data Champions)

Random Forest

Used when: Balanced accuracy/interpretability/speed, non-linear relationships

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100, max_depth=15, n_jobs=-1)
model.fit(X_train, y_train)
importances = model.feature_importances_  # Feature importance for interpretation

Key Properties: Trains 100 trees in parallel. Each tree uses random feature subset. Averaging reduces variance.

Production: Airbnb uses random forests for price prediction from location, amenities, and reviews, with predictions within ±25% for 90% of listings.

Gradient Boosting (XGBoost)

Used when: State-of-the-art tabular data, Kaggle competitions, structured features

import xgboost as xgb
model = xgb.XGBClassifier(n_estimators=100, max_depth=5, learning_rate=0.1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

How It Works: Sequentially builds trees. Each new tree predicts residuals of previous trees. Weighted emphasis on hard examples.
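To make the residual-fitting loop concrete, here is a minimal from-scratch sketch in which plain sklearn regression trees stand in for XGBoost's boosted trees (the toy data and hyperparameters are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.random.rand(200, 3)              # toy features
y = 2 * X[:, 0] + np.sin(6 * X[:, 1])   # toy target

pred = np.full(len(y), y.mean())        # round 0: constant prediction
trees, learning_rate = [], 0.1
for _ in range(100):
    residuals = y - pred                                  # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    pred += learning_rate * tree.predict(X)               # nudge predictions toward the target
    trees.append(tree)
# Final model = mean + lr × sum of tree outputs: gradient boosting with squared loss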

Production: Criteo uses it for click-through rate (CTR) prediction on online ads, serving 1B+ predictions daily. XGBoost's ~0.5% AUC improvement over logistic regression translates into millions in ad revenue.


Distance-Based Methods

k-Nearest Neighbors (k-NN)

Used when: Small datasets, non-parametric, simple baseline

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
# Prediction: find 5 nearest neighbors, vote

Trade-offs: No training time, but O(n×d) prediction cost. Doesn’t scale to 1B examples.

Support Vector Machine (SVM)

Used when: High-dimensional data, clear margin separation

from sklearn.svm import SVC
model = SVC(kernel='rbf', C=1.0, gamma='scale')
model.fit(X_train, y_train)

Key Insight: Finds the maximum-margin hyperplane. The kernel trick enables non-linear decision boundaries in the original feature space.
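The effect is easy to see on sklearn's make_moons, a toy dataset that no linear boundary can separate (data and settings below are illustrative):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
linear_acc = SVC(kernel='linear').fit(X, y).score(X, y)   # underfits: boundary is a straight line
rbf_acc = SVC(kernel='rbf').fit(X, y).score(X, y)         # bends around both moons
print(linear_acc, rbf_acc)  # expect RBF to score noticeably higher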

Production: Face recognition (Facebook DeepFace started with SVM before switching to deep learning).


Clustering Algorithms

K-Means

Used when: Fast clustering, spherical clusters, scalable

from sklearn.cluster import KMeans
model = KMeans(n_clusters=5, n_init=10, max_iter=300)
labels = model.fit_predict(X)

Production: Spotify clusters songs by audio features (tempo, pitch, energy), grouping 100M+ songs for playlist generation and similar-track recommendations.

DBSCAN

Used when: Non-convex clusters, outlier detection, unknown number of clusters

from sklearn.cluster import DBSCAN
model = DBSCAN(eps=0.5, min_samples=5)
labels = model.fit_predict(X)
# Returns -1 for noise/outliers

Production: Geospatial analysis, e.g. clustering user locations for local restaurant recommendations.


Ensemble Methods

Bagging (Bootstrap Aggregating)

Trains multiple models on random subsamples. Averaging predictions reduces variance.

Example: Random Forest is bagging of decision trees
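A minimal sketch with sklearn's BaggingClassifier (hyperparameters are illustrative; sklearn ≥1.2 names the base-model argument estimator, older versions base_estimator):

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 50 trees, each fit on a bootstrap sample of 80% of the rows
model = BaggingClassifier(estimator=DecisionTreeClassifier(),
                          n_estimators=50, max_samples=0.8, n_jobs=-1)
model.fit(X_train, y_train)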

Boosting

Sequentially trains weak learners. Each focuses on previous mistakes. Reduces bias.

Examples: AdaBoost, Gradient Boosting, XGBoost, LightGBM
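For contrast with the XGBoost API shown earlier, a minimal AdaBoost sketch (settings are illustrative; sklearn's default weak learner is a depth-1 decision stump):

from sklearn.ensemble import AdaBoostClassifier

# Each round up-weights the samples the previous stumps misclassified
model = AdaBoostClassifier(n_estimators=100, learning_rate=1.0)
model.fit(X_train, y_train)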

Production: Most Kaggle winners use gradient boosting (XGBoost, LightGBM, or CatBoost).

Stacking

Trains multiple diverse models. Meta-learner combines predictions.

from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
# Layer 1: diverse base models; their out-of-fold predictions become meta-features
estimators = [('lr', LogisticRegression()), ('rf', RandomForestClassifier()), ('svm', SVC())]
# Layer 2: a meta-learner combines the base predictions into the final output
model = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
model.fit(X_train, y_train)

Neural Network Models

Multilayer Perceptron (MLP)

Used when: Complex non-linear patterns, moderate data (10K+ examples)

from sklearn.neural_network import MLPClassifier
model = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000, alpha=0.0001)
model.fit(X_train, y_train)

Layers: Input → hidden layers (learn features) → output (task)

Convolutional Neural Network (CNN)

Used when: Images, video, spatial data

Key Innovation: Convolutional filters extract local features (edges, textures). Parameter sharing cuts the parameter count by ~10,000× compared with fully connected layers.
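scikit-learn, used in the other examples here, has no CNN, so the sketch below assumes PyTorch; the architecture is illustrative and shows how shared filters keep the parameter count small:

import torch.nn as nn

# Tiny CNN for 28×28 grayscale images; each Conv2d filter is reused at
# every spatial position (parameter sharing)
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 16 local edge/texture detectors
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28×28 → 14×14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14×14 → 7×7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # classification head
)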

Production: ResNet-50 achieves 76% top-1 accuracy on ImageNet (1M images, 1000 categories). Inference: ~100ms on CPU.

Recurrent Neural Network (RNN) / LSTM

Used when: Sequential data (text, time-series, speech)

Key: Hidden state carries memory. Can model long-range dependencies.
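A minimal sketch, again assuming PyTorch (shapes and sizes are illustrative):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
x = torch.randn(8, 20, 64)        # batch of 8 sequences, 20 steps, 64 features each
outputs, (h_n, c_n) = lstm(x)     # outputs: (8, 20, 128); h_n is the carried memory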

Production: Google Translate uses LSTM encoder-decoder. Translates 100+ language pairs. 90%+ accuracy on common language pairs.

Transformer

Used when: NLP, long sequences, parallel training needed

Key Innovation: Self-attention replaces recurrence. Can attend to any position directly (no sequential bottleneck).
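Single-head scaled dot-product attention fits in a few lines of NumPy; this is an illustrative sketch of the core operation, not a full Transformer:

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d) token embeddings; Wq/Wk/Wv: (d, d) learned projections
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # every position attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V

X = np.random.randn(10, 16)             # 10 tokens, 16-dim embeddings
Wq, Wk, Wv = (np.random.randn(16, 16) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)     # (10, 16)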

Production: BERT, GPT-4, Claude all use Transformers. BERT pre-trained on 3.3B words. Fine-tuning achieves 90%+ accuracy on 10+ NLP benchmarks with minimal task-specific data.


Choosing an Algorithm

Decision Framework (condensed into a code sketch after the list):

  1. Is your output continuous or categorical?
    • Continuous → Regression (linear, tree-based, neural net)
    • Categorical → Classification (logistic reg, tree-based, SVM, neural net)
  2. Do you have labeled data?
    • Yes → Supervised learning (algorithms above)
    • No → Unsupervised (K-means, hierarchical, DBSCAN, autoencoders)
  3. What’s your data size?
    • <1000: Simple models (logistic reg, SVM, small tree) or transfer learning
    • 1K–1M: Random Forest, XGBoost (best power-to-effort ratio)
    • >1M: Neural nets, gradient boosting with sampling
  4. Do you need interpretability?
    • Yes → Linear regression, decision trees, random forest (feature importance)
    • No → Neural nets, SVM with complex kernels
  5. What’s your computational budget?
    • Fast (<1s training): Logistic regression, small trees
    • Minutes: Random Forest, XGBoost
    • Hours–days: Neural networks, transfer learning
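The checklist condenses into a toy helper function (thresholds and suggestions are illustrative, mirroring the list above):

def suggest_algorithm(n_samples, labeled=True, need_interpretability=False):
    # Rough heuristic only; real choices also depend on features, latency, budget
    if not labeled:
        return "k-means, hierarchical clustering, or DBSCAN"
    if need_interpretability:
        return "linear model, decision tree, or random forest"
    if n_samples < 1_000:
        return "logistic regression, SVM, or transfer learning"
    if n_samples <= 1_000_000:
        return "random forest or XGBoost"
    return "neural network, or gradient boosting with sampling"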

References

📖 Hands-On Machine Learning (Aurélien Géron) — practical, industry-focused textbook
📄 XGBoost Paper (Chen & Guestrin, 2016)
📄 LightGBM Paper (Ke et al., 2017)
📖 Scikit-learn Documentation
🎥 Andrew Ng: Machine Learning (Coursera) — industry-standard intro course
🔗 Kaggle Learn — free practical courses

This post is licensed under CC BY 4.0 by the author.