AI Hierarchy
Nested capability stack: AI ⊃ ML ⊃ DL ⊃ GenAI—each layer builds on the one below, enabling increasingly complex intelligence.
Key Properties
| Level | Mechanism | Data Needed | Interpretability | When to Use |
|---|---|---|---|---|
| AI | Rules + learning | Varies | Often high | Complex decisions, any domain |
| ML | Statistical patterns | 100s–1000s examples | Medium | Prediction from structured data |
| DL | Hierarchical representations | 1000s–100Ks | Very low | Images, text, complex sequences |
| GenAI | Transformer + next-token prediction | 100K–billions tokens | Nearly opaque | Content creation, open-ended tasks |
When to Use / Avoid
✅ Use AI/Rule-Based When:
- Problem logic is well-defined and deterministic
- Business rules are stable and easy to codify
- Interpretability is critical (healthcare, finance)
- Data is sparse or expensive to label
❌ Avoid When:
- Patterns are too complex to express manually
- Rules change frequently with data distribution
- Transparency is less critical than accuracy
✅ Use ML When:
- You have labeled historical data (100s–1000s examples)
- Clear input→output mapping exists
- Problem is stable and static
- Interpretability helps with adoption
❌ Avoid When:
- You have high-dimensional unstructured data
- Only a few labeled examples exist
- Patterns are hierarchical and need deep learning
✅ Use Deep Learning When:
- Working with images, video, or audio
- NLP tasks (language understanding, generation)
- Sequential data with long-term dependencies
- You have access to large labeled datasets (10K+)
❌ Avoid When:
- Interpretability is critical
- You have very small datasets (<1000)
- Computational resources are extremely limited
- Simple ML algorithms already solve the problem
✅ Use Generative AI When:
- Task requires content creation or open-ended generation
- You need conversational interfaces
- Few-shot or zero-shot learning is possible
- Speed to market matters more than perfect accuracy
❌ Avoid When:
- Factual accuracy is non-negotiable (medical diagnosis)
- You need strong reasoning over multiple steps
- Explainability of decisions is a regulatory requirement
Detailed Breakdown
1. Artificial Intelligence (AI)
Definition: Broad field of creating machines that simulate human intelligence through any technique—rules, learning, or hybrid approaches.
Scope includes:
- Expert systems (rule-based knowledge)
- Machine learning systems (pattern discovery)
- Robotics and autonomous systems (embodied AI)
- Natural language processing (language understanding)
- Computer vision (visual perception)
- Game-playing AI (strategic reasoning)
Examples in Production:
- Chess engines (Deep Blue vs Kasparov, 1997): Rule-based + search, defeated world champion
- Recommendation systems (Netflix, Spotify): Collaborative filtering (ML), drive 80% of views
- Virtual assistants (Siri, Alexa): Hybrid AI—voice recognition (DL) + intent parsing (NLP) + command execution (rules)
- Autonomous vehicles (Waymo, Tesla): Computer vision (DL) + path planning (rules) + decision-making (RL)
Key Insight: AI is the broadest category. Most modern AI systems combine rules, ML, and deep learning.
2. Machine Learning (ML)
Definition: Subset of AI where systems learn patterns from data without explicit programming. The algorithm discovers rules automatically.
Core Philosophy:
Traditional Programming: Rules + Data → Program → Output
Machine Learning: Data + Desired Output → Algorithm → Rules (learned)
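The inversion above can be sketched in a toy spam filter. The flagged words, example messages, and threshold search are all made up for illustration: the point is that the hand-coded version ships a rule, while the "learned" version discovers the rule from labeled data.

```python
# Hypothetical toy example: hand-coded rule vs. a rule learned from data.
FLAGGED = {"free", "winner", "prize"}

def spam_score(text):
    # Count how many flagged words appear in the message.
    return sum(word in FLAGGED for word in text.lower().split())

# Traditional programming: a human wrote the threshold into the code.
def rule_based_is_spam(text):
    return spam_score(text) >= 2

# Machine learning in miniature: search for the threshold that best
# separates the labeled examples; the "rule" is discovered, not coded.
def learn_threshold(examples):
    best_t, best_acc = 0, -1.0
    for t in range(5):
        acc = sum((spam_score(text) >= t) == label
                  for text, label in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = [
    ("free prize winner inside", True),
    ("meeting moved to noon", False),
    ("free lunch in the kitchen", False),
    ("you are the winner of a free prize", True),
]
```

On this toy data the search recovers the same threshold (2) a human might have guessed; real ML algorithms do the same thing over far richer hypothesis spaces.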
Learning Paradigms:
| Paradigm | Data Type | How It Works | Best For | Example |
|---|---|---|---|---|
| Supervised | Labeled (X, y pairs) | Learn mapping from features to target | Prediction, classification | Spam detection, fraud detection |
| Unsupervised | Unlabeled (X only) | Find hidden patterns/structure | Discovery, segmentation | Customer segmentation, anomaly detection |
| Reinforcement | Reward signals | Learn policy via trial-and-error | Sequential decision-making | Game AI, robotics, trading |
| Semi-supervised | Mixed labeled/unlabeled | Leverage unlabeled data to improve accuracy | When labeling is expensive | Medical image analysis |
| Transfer | Pre-trained model | Reuse knowledge from one task on another | Low-data scenarios | Fine-tuning BERT for domain tasks |
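The supervised row of the table can be made concrete with a minimal classifier written from scratch: fit one centroid per class from labeled (features, label) pairs, then predict the nearest class for a new point. The 2-D "churn" data below is invented for illustration.

```python
# Minimal supervised learning: nearest-centroid classification.
def fit_centroids(X, y):
    """Average the feature vectors of each class; the averages ARE the model."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, point):
    # Assign the label whose centroid is closest (squared Euclidean distance).
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(centroids[label], point))

X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]  # toy feature vectors
y = ["churn", "churn", "stay", "stay"]                 # labels
model = fit_centroids(X, y)
```

The same fit-then-predict shape carries over to every supervised algorithm in the table; only the model family and the fitting procedure change.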
Production ML Use Cases:
- Churn Prediction (SaaS): Binary classification—predict which customers will cancel. Netflix uses this to trigger retention offers.
- Fraud Detection (Finance): Real-time anomaly detection. Detects unusual transaction patterns. Reduces fraud by 40–60%.
- Recommendation Systems: Collaborative filtering or content-based. Amazon’s recommendations drive 35% of revenue.
- Demand Forecasting (Retail): Time-series regression to predict inventory needs, optimize stock, minimize waste.
Key Challenge: ML requires good labeled data. Garbage in, garbage out.
3. Deep Learning (DL)
Definition: Subset of ML using multi-layered artificial neural networks to learn hierarchical representations from data. “Deep” refers to the number of layers.
Why Deep Learning Works:
Simple models can only learn simple boundaries. A network with one hidden layer can, in principle, approximate any continuous function (the universal approximation theorem), but may need exponentially many neurons. Deep networks instead learn hierarchies:
- Layer 1: Low-level features (edges, textures)
- Layer 2: Mid-level features (shapes, objects)
- Layer 3+: High-level semantic concepts
- Final layer: Task-specific output (classification, generation)
Example (Image Classification):
- Input: Pixel values
- Layer 1: Detects edges and corners
- Layer 2: Detects textures and simple shapes
- Layer 3: Detects object parts (eyes, ears, fur)
- Layer 4: Detects objects (cat, dog, person)
- Output: Classification
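A tiny hand-built network shows why stacking layers matters. XOR cannot be computed by any single linear layer, but a two-layer network does it by first extracting intermediate features (here OR and AND) and then combining them. The weights below are set by hand to make the mechanism visible, not learned by training.

```python
def relu(x):
    # Standard rectified-linear activation.
    return max(0.0, x)

def layer(inputs, weights, biases, act):
    # One dense layer: weighted sum per neuron, then activation.
    return [act(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def xor_net(a, b):
    # Hidden layer: neuron 1 fires on "a OR b", neuron 2 on "a AND b".
    h = layer([a, b], [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu)
    # Output layer combines the features: OR minus 2*AND equals XOR.
    out = layer(h, [[1.0, -2.0]], [0.0], relu)
    return round(out[0])
```

In a real deep network the same composition happens over hundreds of neurons per layer, and gradient descent, not a human, finds the feature-detecting weights.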
Common Architectures:
| Architecture | Domain | Best For | Example Systems |
|---|---|---|---|
| CNN (Convolutional) | Computer vision | Image classification, object detection | ResNet, YOLOv8, CLIP |
| RNN/LSTM | Sequential data | Language modeling, time-series | Stock-price forecasting, EKG anomaly detection |
| Transformer | NLP + sequences | Language understanding, generation, translation | BERT, GPT-4, T5 (Google Translate) |
| GAN (Generative Adversarial) | Image generation | Synthetic data, image-to-image | StyleGAN (face generation), pix2pix (sketch→photo) |
| Autoencoder | Unsupervised learning | Dimensionality reduction, anomaly detection | Netflix: compress user embeddings for recommendation |
Strengths:
- ✅ Learns hierarchical features automatically—no manual feature engineering needed
- ✅ State-of-the-art on complex tasks (ImageNet, COCO, SQuAD benchmarks)
- ✅ Scales with data—performance improves with billions of examples
- ✅ Transfer learning—pre-trained models accelerate new applications
Weaknesses:
- ❌ Needs massive labeled datasets (10K–1M+ examples)
- ❌ Computationally expensive (weeks on GPUs for large models)
- ❌ Black-box interpretability—hard to explain why the model made a decision
- ❌ Prone to adversarial attacks and dataset bias
Production Scale Numbers:
- GPT-4: parameter and training-token counts undisclosed (its predecessor GPT-3 had 175B parameters); training cost reportedly ~$100M
- ResNet-50 (ImageNet): 25.5M parameters, achieves 76% top-1 accuracy, inference: ~100ms on CPU
- BERT (Google): 12 layers, 110M parameters, pre-trained on 3.3B words
4. Generative AI (GenAI)
Definition: Specialized area of DL focused on generating new content (text, images, code, video) based on learned patterns, rather than classifying or predicting existing data.
Discriminative vs Generative:
- Discriminative: Learns decision boundary. “What is this image?” (Input → Classification)
- Generative: Learns data distribution. “Create an image of a cat.” (Prompt → New Content)
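The distinction can be sketched on 1-D toy data (the "cat" and "dog" numbers below are invented). The discriminative model learns only a decision boundary, so it can answer "which class?"; the generative model learns each class's distribution, so it can also produce new samples.

```python
import random
import statistics

cats = [20.0, 22.0, 21.0, 19.0]   # toy measurements for class "cat"
dogs = [30.0, 33.0, 31.0, 32.0]   # toy measurements for class "dog"

# Discriminative: one learned boundary between the class means.
boundary = (statistics.mean(cats) + statistics.mean(dogs)) / 2

def classify(x):
    return "cat" if x < boundary else "dog"

# Generative: model the class distribution itself, then draw from it.
cat_mu, cat_sigma = statistics.mean(cats), statistics.stdev(cats)

def generate_cat(rng):
    # New "cat-like" data sampled from the learned Gaussian.
    return rng.gauss(cat_mu, cat_sigma)
```

Real generative models (LLMs, diffusion) learn vastly richer distributions than a single Gaussian, but the asymmetry is the same: a classifier cannot sample, a generative model can do both.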
Core Models:
| Model Type | Generates | Mechanism | Example Systems |
|---|---|---|---|
| LLM | Human-like text | Transformer + next-token prediction | GPT-4, Claude, LLaMA, Gemini |
| Diffusion | Images from noise | Iterative denoising | Stable Diffusion, DALL-E 3, Midjourney |
| GAN | Images, synthetic data | Generator vs discriminator adversarial loss | StyleGAN, CycleGAN |
| Seq2Seq | Any sequence output | Encoder-decoder with attention | Google Translate, summarization, speech synthesis |
Key Capabilities (as of 2026):
- Generate human-like text for any domain (news, creative writing, technical documentation)
- Create realistic images from text descriptions with artistic control
- Write production-quality code with explanation
- Translate between 100+ languages with high accuracy on well-resourced language pairs
- Summarize documents while preserving key insights
- Answer complex questions conversationally with multi-step reasoning (Chain-of-Thought)
- Compose music with consistent style and structure
Production Systems Using GenAI:
- ChatGPT (OpenAI): 100M+ weekly users. Text generation. Cost: ~$0.002/1K tokens.
- GitHub Copilot: 4M+ developers using AI for code completion. 40% of code at some companies generated by Copilot.
- Google Search (Generative): AI Overviews generate summaries. 95% of searches still use traditional links, but GenAI is reshaping discovery.
- Midjourney (Image Generation): $10–120/month subscription. 12M+ images generated monthly as of 2024.
- Customer Support Bots (Enterprise): Reduce support tickets by 30–40%. Escalate complex issues to humans.
Typical GenAI Pipeline (LLM):
```text
Prompt: "Write a Python function to sort a list"
  ↓
Tokenize: ["Write", "a", "Python", "function", ...]
  ↓
Embedding: Convert tokens to semantic vectors (d=4096)
  ↓
Transformer layers: Contextualize tokens (self-attention)
  ↓
Autoregressive decoding: Generate one next token at a time
  ↓
Temperature/Sampling: Control randomness (greedy, top-k, nucleus)
  ↓
Output: "def sort_list(arr):\n    return sorted(arr)"
```
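The temperature/sampling step of the pipeline can be implemented in a few lines. The vocabulary and logits below are invented stand-ins for a real model's output scores; the softmax, greedy, and top-k logic are the standard techniques the pipeline names.

```python
import math
import random

vocab = ["def", "return", "sorted", "(", ")"]
logits = [2.0, 1.0, 1.5, 0.2, 0.1]   # hypothetical model scores

def softmax(scores, temperature=1.0):
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [s / temperature for s in scores]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(scores):
    # Deterministic: always pick the highest-scoring token.
    return vocab[scores.index(max(scores))]

def top_k_sample(scores, k, temperature, rng):
    # Keep only the k highest-scoring tokens, renormalize, then sample.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in top], temperature)
    return vocab[rng.choices(top, weights=probs)[0]]
```

Greedy decoding always emits `"def"` here, while top-k sampling can emit any of the k retained tokens, which is exactly the randomness knob the pipeline's sampling stage controls.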
Challenges:
- Hallucination: Confident false statements (e.g., inventing a citation, date, or founding year that sounds plausible)
- Latency: LLMs require 50–500ms per request at scale
- Cost: GPT-4 is ~20x more expensive than GPT-3.5
- Safety: Risk of generating harmful, biased, or private data
Capability Progression
```text
Complexity & Capability
          ▲
          │
Complex, Creative      GenAI ├─╪ Generate text, images, code
Content Generation           │╲   Models: LLMs, Diffusion
                             │ ╲  Cost: High, Latency: High
                             │  ╲
Hierarchical Patterns   DL   ├──╪ Vision, NLP, sequences
from Complex Data            │   ╲  Models: CNN, RNN, Transformer
                             │    ╲ Needs: 10K+ examples
                             │     ╲
Structured Patterns     ML   ├────╪ Classification, prediction
from Tabular Data            │     ╲  Models: XGBoost, SVM, RF
                             │      ╲ Needs: 100–1000 examples
                             │       ╲
Simple Rules/Heuristics      ├──────╪ Deterministic logic
                             │       ╲  Cost: Low, Latency: <1ms
                             │________╲
                             └─────────┴──────────────→ Ease of Implementation
```
Decision Framework
Ask yourself:
- Is the problem deterministic? → Use rules/ML
- Do you have labeled data? → Use ML
- Is the data high-dimensional (images, text)? → Use DL
- Do you need to generate new content? → Use GenAI
- Do you need real-time predictions? → Avoid large LLMs (use smaller models or cached results)
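The checklist above can be expressed as a small function. The question ordering and return labels are one reasonable reading of the bullets, not a definitive rubric.

```python
def choose_approach(deterministic, labeled_data, high_dimensional,
                    needs_generation, needs_realtime):
    # Generation dominates the choice; real-time needs push toward
    # smaller or cached models rather than large LLMs.
    if needs_generation:
        return "small/cached model" if needs_realtime else "GenAI (LLM/diffusion)"
    if deterministic:
        return "rules/heuristics"
    if high_dimensional:
        return "deep learning"
    if labeled_data:
        return "classical ML"
    return "collect data first"
```

Walking a concrete problem through the function, e.g. churn prediction with labeled tabular data, lands on "classical ML", matching the framework's bullets.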
References
- 📖 Artificial Intelligence on Wikipedia
- 📖 Machine Learning on Wikipedia
- 📖 Deep Learning (Goodfellow, Bengio, Courville) — The authoritative textbook
- 📄 Attention Is All You Need (Vaswani et al., 2017) — Introduced Transformers, foundation of modern GenAI
- 🎥 3Blue1Brown: Neural Networks — Excellent visual intuition
- 🎥 Stanford CS224N: NLP with Deep Learning — Industry-standard NLP course