Top 20 ML Algorithms
A taxonomy of essential machine learning algorithms, covering the methods that power production systems across industry.
Algorithm Landscape
| Category | Algorithm | Time | Space | When to Use | Production Use |
|---|---|---|---|---|---|
| Linear Models | Linear Regression | O(n×d×iter) | O(d) | Baseline for regression | Uber: delivery time prediction |
| | Logistic Regression | O(n×d×iter) | O(d) | Binary classification baseline | Stripe: fraud detection |
| | Ridge/Lasso | O(n×d×iter) | O(d) | Regularization, feature selection | Medical diagnosis with many features |
| Tree-Based | Decision Trees | O(n log n × d) | O(n) | Interpretability, small data | Credit approval (regulatory) |
| | Random Forest | O(trees×n log n×d) | O(n) | Balanced accuracy/speed | Airbnb: price prediction, booking |
| | Gradient Boosting (XGBoost) | O(trees×n log n×d) | O(n) | Kaggle winner, tabular data | Criteo: click-through rate prediction |
| | LightGBM | O(trees×n log n×d) | O(n) | Large datasets, speed | Microsoft: ranking in search |
| Distance-Based | k-Means Clustering | O(n×k×iter) | O(n) | Fast clustering, scalable | Spotify: playlist generation |
| | k-Nearest Neighbors | O(n×d) per query | O(n×d) | Non-parametric, small data | Recommendation (but slow at large scale) |
| | SVM | O(n² to n³) | O(sv) | High-dim data, kernel tricks | Text classification, face recognition |
| Ensemble | Bagging | O(trees×n×d) | O(n) | Reduce variance | Combines many weak learners |
| | Boosting | O(trees×n×d) | O(n) | Reduce bias, weighted samples | AdaBoost, Gradient Boosting |
| | Stacking | O(models×n×d) | O(n) | Combine diverse models | Kaggle: meta-learner |
| Clustering | Hierarchical Clustering | O(n²) | O(n²) | Dendrograms, interpretable | Gene expression analysis |
| | DBSCAN | O(n log n) | O(n) | Non-convex clusters, outliers | Geospatial: location clustering |
| Neural Nets | MLP | O(layers×n×hidden) | O(params) | Complex non-linear patterns | Recommendation systems |
| | CNN | O(filters×n×field) | O(params) | Images, computer vision | ResNet: ImageNet 76% top-1 accuracy |
| | RNN/LSTM | O(seq_len×n×hidden) | O(seq_len) | Sequential data, NLP | Google Translate, sentiment analysis |
| | Transformer | O(seq_len²×d) | O(seq_len²) | Parallel processing, long-range deps | BERT, GPT-4, modern NLP |
Linear Models (Interpretable Baselines)
Linear Regression
Used when: Continuous output, linear relationship, interpretability matters
```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
Production: Uber predicts delivery time. Linear combination of distance, time-of-day, weather, traffic. ~90% R² on 10M daily predictions.
Tree-Based Models (Tabular Data Champions)
Random Forest
Used when: Balanced accuracy/interpretability/speed, non-linear relationships
```python
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=100, max_depth=15, n_jobs=-1)
model.fit(X_train, y_train)
importances = model.feature_importances_  # feature importance for interpretation
```
Key Properties: Trains 100 trees in parallel. Each tree uses random feature subset. Averaging reduces variance.
Production: Airbnb uses random forest for price prediction. Inputs: location, amenities, reviews. Accuracy: ±25% on 90% of listings.
Gradient Boosting (XGBoost)
Used when: State-of-the-art tabular data, Kaggle competitions, structured features
```python
import xgboost as xgb

model = xgb.XGBClassifier(n_estimators=100, max_depth=5, learning_rate=0.1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
How It Works: Sequentially builds trees. Each new tree predicts residuals of previous trees. Weighted emphasis on hard examples.
Production: Criteo: click-through rate (CTR) prediction for online ads. 1B+ daily predictions. XGBoost achieves 0.5% AUC improvement over logistic regression = millions in ad revenue.
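The residual-fitting loop can be sketched by hand with a toy dataset. This is an illustrative simplification for squared loss, not XGBoost's regularized objective or histogram-based tree building:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: noisy quadratic
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

# Start from the mean prediction, then repeatedly fit a shallow tree
# to the residuals and add a shrunken copy of its predictions.
learning_rate = 0.1
pred = np.full_like(y, y.mean())
trees = []
for _ in range(100):
    residuals = y - pred                     # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += learning_rate * tree.predict(X)  # each tree nudges predictions
    trees.append(tree)

mse_start = np.mean((y - y.mean()) ** 2)
mse_final = np.mean((y - pred) ** 2)
```

Each iteration fits what the ensemble so far got wrong, which is why training error drops steadily as trees are added.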
Distance-Based Methods
k-Nearest Neighbors (k-NN)
Used when: Small datasets, non-parametric, simple baseline
```python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)  # "training" just stores the data
y_pred = model.predict(X_test)  # per query: find 5 nearest neighbors, majority vote
```
Trade-offs: Essentially no training cost (fit just stores the data), but O(n×d) per query. Doesn’t scale to 1B examples.
Support Vector Machine (SVM)
Used when: High-dimensional data, clear margin separation
```python
from sklearn.svm import SVC

model = SVC(kernel='rbf', C=1.0, gamma='scale')
model.fit(X_train, y_train)
```
Key Insight: Finds the maximum-margin hyperplane. The kernel trick computes inner products in an implicit high-dimensional space, yielding non-linear boundaries in the original feature space.
Production: Face recognition (Facebook DeepFace started with SVM before switching to deep learning).
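A quick way to see the kernel trick at work, using scikit-learn's `make_circles` to generate a dataset that no straight line can separate (illustrative synthetic data, not a production workload):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space,
# but the RBF kernel implicitly maps points where they are.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel='linear').fit(X, y).score(X, y)  # near chance
rbf_acc = SVC(kernel='rbf', gamma='scale').fit(X, y).score(X, y)  # near perfect
```

The linear kernel is stuck around coin-flip accuracy while the RBF kernel separates the rings cleanly, without ever constructing the high-dimensional features explicitly.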
Clustering Algorithms
K-Means
Used when: Fast clustering, spherical clusters, scalable
```python
from sklearn.cluster import KMeans

model = KMeans(n_clusters=5, n_init=10, max_iter=300)
labels = model.fit_predict(X)
```
Production: Spotify clusters songs by audio features (tempo, pitch, energy). 100M songs grouped into playlists + similar track recommendations.
DBSCAN
Used when: Non-convex clusters, outlier detection, unknown number of clusters
```python
from sklearn.cluster import DBSCAN

model = DBSCAN(eps=0.5, min_samples=5)
labels = model.fit_predict(X)  # label -1 marks noise/outliers
```
Production: Geospatial: cluster user locations for local restaurant recommendations.
Ensemble Methods
Bagging (Bootstrap Aggregating)
Trains multiple models on random subsamples. Averaging predictions reduces variance.
Example: Random Forest is bagging of decision trees
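A minimal sketch with scikit-learn's `BaggingClassifier` on synthetic data (parameter values are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bag 50 decision trees, each trained on its own bootstrap subsample.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = BaggingClassifier(
    DecisionTreeClassifier(),  # base learner: a high-variance deep tree
    n_estimators=50,           # number of bootstrap samples / models
    max_samples=0.8,           # each model sees 80% of the data
    random_state=0,
).fit(X, y)

train_acc = model.score(X, y)
```

Each tree overfits its own subsample in a slightly different way; averaging their votes cancels much of that variance.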
Boosting
Sequentially trains weak learners. Each focuses on previous mistakes. Reduces bias.
Examples: AdaBoost, Gradient Boosting, XGBoost, LightGBM
Production: Most Kaggle winners use gradient boosting (XGBoost, LightGBM, or CatBoost).
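For example, classic AdaBoost over depth-1 "stumps" in scikit-learn (synthetic data and settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 decision stumps trained sequentially; each round upweights
# the examples the ensemble so far has misclassified.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a single split
    n_estimators=100,
    random_state=0,
).fit(X, y)

train_acc = model.score(X, y)
```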
Stacking
Trains multiple diverse models. Meta-learner combines predictions.
```text
Layer 1: Train logistic regression, random forest, SVM on X
Output:  3×n predictions matrix
Layer 2: Train meta-learner (e.g., linear regression) on Layer 1 output
Output:  Final predictions
```
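The same two-layer recipe is built into scikit-learn's `StackingClassifier` (synthetic data; the base-model choices here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

stack = StackingClassifier(
    estimators=[                               # Layer 1: diverse base models
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", SVC(probability=True)),
    ],
    final_estimator=LogisticRegression(),      # Layer 2: meta-learner
    cv=5,  # out-of-fold predictions feed layer 2, avoiding leakage
).fit(X, y)

train_acc = stack.score(X, y)
```

The `cv` argument matters: the meta-learner must be trained on out-of-fold base-model predictions, or it just learns to trust whichever base model memorized the training set.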
Neural Network Models
Multilayer Perceptron (MLP)
Used when: Complex non-linear patterns, moderate data (10K+ examples)
```python
from sklearn.neural_network import MLPClassifier

model = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000, alpha=0.0001)
model.fit(X_train, y_train)
```
Layers: Input → hidden layers (learn features) → output (task)
Convolutional Neural Network (CNN)
Used when: Images, video, spatial data
Key Innovation: Convolutional filters extract local features (edges, textures). Parameter sharing cuts parameter counts by orders of magnitude relative to fully connected layers.
Production: ResNet-50 achieves 76% top-1 accuracy on ImageNet (1M images, 1000 categories). Inference: ~100ms on CPU.
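The core operation can be sketched in a few lines of NumPy: one small kernel slides over the image, so the same handful of weights is reused at every spatial position. The Sobel filter below is a classic hand-crafted edge detector standing in for a learned filter:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: the same small kernel is applied
    at every position (this reuse is the parameter sharing)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Image with a sharp vertical boundary: left half dark, right half bright
image = np.zeros((8, 8))
image[:, 4:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # vertical-edge detector

edges = conv2d(image, sobel_x)
# Response is nonzero only near the boundary around column 4
```

Nine weights detect the edge everywhere in the image; a fully connected layer would need a separate weight per input pixel per output unit.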
Recurrent Neural Network (RNN) / LSTM
Used when: Sequential data (text, time-series, speech)
Key: Hidden state carries memory. Can model long-range dependencies.
Production: Google Translate’s neural system (GNMT, 2016) used LSTM encoder-decoders across 100+ language pairs before migrating to Transformer models.
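A bare-bones Elman RNN cell in NumPy shows how the hidden state carries information forward step by step (weights are random and untrained, purely for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
input_dim, hidden_dim = 4, 8

# Recurrence: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_forward(xs):
    """Process a sequence one step at a time; the hidden state h is the
    'memory' that links earlier inputs to later outputs."""
    h = np.zeros(hidden_dim)
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b)
        states.append(h)
    return np.array(states)

seq = rng.normal(size=(10, input_dim))  # a sequence of 10 input vectors
states = rnn_forward(seq)               # one hidden state per time step
```

The sequential dependence on `h_{t-1}` is exactly what prevents parallelizing over time steps, and (with plain tanh cells) what makes long-range gradients vanish; LSTM gating mitigates the latter.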
Transformer
Used when: NLP, long sequences, parallel training needed
Key Innovation: Self-attention replaces recurrence. Can attend to any position directly (no sequential bottleneck).
Production: BERT, GPT-4, Claude all use Transformers. BERT pre-trained on 3.3B words. Fine-tuning achieves 90%+ accuracy on 10+ NLP benchmarks with minimal task-specific data.
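Single-head scaled dot-product self-attention can be sketched in NumPy (random untrained weights; the seq_len × seq_len score matrix is the source of the quadratic cost in the table above):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every position computes a
    weighted average over ALL positions, with no sequential bottleneck."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len): the O(n^2) term
    # Softmax over key positions (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.RandomState(0)
seq_len, d_model = 6, 16
X = rng.normal(size=(seq_len, d_model))        # one embedding per position
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(X, Wq, Wk, Wv)
# attn[i, j] is how much position i attends to position j; rows sum to 1
```

Because every position attends to every other in one matrix multiply, the whole sequence is processed in parallel, unlike an RNN's step-by-step recurrence.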
Choosing an Algorithm
Decision Framework:
- Is your output continuous or categorical?
  - Continuous → Regression (linear, tree-based, neural net)
  - Categorical → Classification (logistic regression, tree-based, SVM, neural net)
- Do you have labeled data?
  - Yes → Supervised learning (algorithms above)
  - No → Unsupervised (k-means, hierarchical, DBSCAN, autoencoders)
- What’s your data size?
  - <1000: Simple models (logistic regression, SVM, small tree) or transfer learning
  - 1K–1M: Random Forest, XGBoost (best power-to-effort ratio)
  - 1M+: Neural nets, gradient boosting with sampling
- Do you need interpretability?
  - Yes → Linear regression, decision trees, random forest (feature importance)
  - No → Neural nets, SVM with complex kernels
- What’s your computational budget?
  - Fast (<1s training): Logistic regression, small trees
  - Minutes: Random Forest, XGBoost
  - Hours–days: Neural networks, transfer learning
References
- 📖 Hands-On Machine Learning (Aurélien Géron) — practical, industry-focused textbook
- 📄 XGBoost paper (Chen & Guestrin, 2016)
- 📄 LightGBM paper (Ke et al., 2017)
- 📖 Scikit-learn documentation
- 🎥 Andrew Ng: Machine Learning (Coursera) — industry-standard intro course
- 🔗 Kaggle Learn — free practical courses