
AI Hierarchy

Nested capability stack: AI ⊃ ML ⊃ DL ⊃ GenAI—each layer builds on the one below, enabling increasingly complex intelligence.

Key Properties

| Level | Mechanism | Data Needed | Interpretability | When to Use |
|-------|-----------|-------------|------------------|-------------|
| AI | Rules + learning | Varies | Often high | Complex decisions, any domain |
| ML | Statistical patterns | 100s–1,000s of examples | Medium | Prediction from structured data |
| DL | Hierarchical representations | 1,000s–100Ks of examples | Very low | Images, text, complex sequences |
| GenAI | Transformer + next-token prediction | 100Ks–billions of tokens | Nearly opaque | Content creation, open-ended tasks |

When to Use / Avoid

✅ Use AI/Rule-Based When:

  • Problem logic is well-defined and deterministic
  • Business rules are stable and easy to codify
  • Interpretability is critical (healthcare, finance)
  • Data is sparse or expensive to label

❌ Avoid When:

  • Patterns are too complex to express manually
  • Rules change frequently with data distribution
  • Transparency is less critical than accuracy

✅ Use ML When:

  • You have labeled historical data (100s–1000s examples)
  • Clear input→output mapping exists
  • Problem is stable and static
  • Interpretability helps with adoption

❌ Avoid When:

  • You have high-dimensional unstructured data
  • Only a few labeled examples exist
  • Patterns are hierarchical and need deep learning

✅ Use Deep Learning When:

  • Working with images, video, or audio
  • NLP tasks (language understanding, generation)
  • Sequential data with long-term dependencies
  • You have access to large labeled datasets (10K+)

❌ Avoid When:

  • Interpretability is critical
  • You have very small datasets (<1000)
  • Computational resources are extremely limited
  • Simple ML algorithms already solve the problem

✅ Use Generative AI When:

  • Task requires content creation or open-ended generation
  • You need conversational interfaces
  • Few-shot or zero-shot learning is possible
  • Speed to market matters more than perfect accuracy

❌ Avoid When:

  • Factual accuracy is non-negotiable (medical diagnosis)
  • You need strong reasoning over multiple steps
  • Explainability of decisions is a regulatory requirement

Detailed Breakdown

1. Artificial Intelligence (AI)

Definition: Broad field of creating machines that simulate human intelligence through any technique—rules, learning, or hybrid approaches.

Scope includes:

  • Expert systems (rule-based knowledge)
  • Machine learning systems (pattern discovery)
  • Robotics and autonomous systems (embodied AI)
  • Natural language processing (language understanding)
  • Computer vision (visual perception)
  • Game-playing AI (strategic reasoning)

Examples in Production:

  • Chess engines (Deep Blue vs Kasparov, 1997): Rule-based + search, defeated world champion
  • Recommendation systems (Netflix, Spotify): Collaborative filtering (ML), drive 80% of views
  • Virtual assistants (Siri, Alexa): Hybrid AI—voice recognition (DL) + intent parsing (NLP) + command execution (rules)
  • Autonomous vehicles (Waymo, Tesla): Computer vision (DL) + path planning (rules) + decision-making (RL)

Key Insight: AI is the broadest category. Most modern AI systems combine rules, ML, and deep learning.


2. Machine Learning (ML)

Definition: Subset of AI where systems learn patterns from data without explicit programming. The algorithm discovers rules automatically.

Core Philosophy:

Traditional Programming: Rules + Data → Program → Output
Machine Learning: Data + Desired Output → Algorithm → Rules (learned)
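
The contrast can be sketched in a few lines of Python. This is a toy illustration (the data, the capital-letters feature, and both detectors are made up): the first detector's rule is hand-written, while the second derives its decision threshold from labeled examples.

```python
def rule_based_spam(text):
    """Traditional programming: a human writes the rule."""
    return "free money" in text.lower()

def learn_threshold(examples):
    """Machine learning: the 'rule' (a threshold on the ratio of
    capital letters) is discovered from labeled data, not hand-coded."""
    def caps_ratio(t):
        return sum(c.isupper() for c in t) / max(len(t), 1)
    spam = [caps_ratio(t) for t, label in examples if label]
    ham = [caps_ratio(t) for t, label in examples if not label]
    # Place the decision boundary midway between the two class means.
    return (sum(spam) / len(spam) + sum(ham) / len(ham)) / 2

examples = [                       # invented training data (text, is_spam)
    ("WIN FREE MONEY NOW", True),
    ("CLAIM YOUR PRIZE!!!", True),
    ("Lunch at noon?", False),
    ("Minutes from today's meeting", False),
]
threshold = learn_threshold(examples)

def learned_spam(text):
    return sum(c.isupper() for c in text) / max(len(text), 1) > threshold
```

The learned threshold is just a number, but nobody typed it in; that is the essential difference.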

Learning Paradigms:

| Paradigm | Data Type | How It Works | Best For | Example |
|----------|-----------|--------------|----------|---------|
| Supervised | Labeled (X, y pairs) | Learn mapping from features to target | Prediction, classification | Spam detection, fraud detection |
| Unsupervised | Unlabeled (X only) | Find hidden patterns/structure | Discovery, segmentation | Customer segmentation, anomaly detection |
| Reinforcement | Reward signals | Learn policy via trial and error | Sequential decision-making | Game AI, robotics, trading |
| Semi-supervised | Mixed labeled/unlabeled | Leverage unlabeled data to improve accuracy | When labeling is expensive | Medical image analysis |
| Transfer | Pre-trained model | Reuse knowledge from one task on another | Low-data scenarios | Fine-tuning BERT for domain tasks |
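
As a concrete instance of the unsupervised row, here is a toy 1-D k-means (k=2) in plain Python. The spend figures are invented, and the sketch assumes both clusters stay non-empty; no labels are ever provided, yet two customer segments emerge.

```python
def kmeans_1d(points, iters=10):
    """Cluster unlabeled numbers into two groups (toy k-means, k=2)."""
    c1, c2 = min(points), max(points)              # initial centroids
    for _ in range(iters):
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(a) / len(a), sum(b) / len(b)  # recompute centroids
    return sorted([c1, c2])

# Invented monthly spend in dollars: a budget segment and a premium segment.
spend = [12, 15, 14, 11, 210, 190, 205]
centroids = kmeans_1d(spend)   # two centroids, roughly 13 and 202
```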

Production ML Use Cases:

  • Churn Prediction (SaaS): Binary classification—predict which customers will cancel. Netflix uses this to trigger retention offers.
  • Fraud Detection (Finance): Real-time anomaly detection. Detects unusual transaction patterns. Reduces fraud by 40–60%.
  • Recommendation Systems: Collaborative filtering or content-based. Amazon’s recommendations drive 35% of revenue.
  • Demand Forecasting (Retail): Time-series regression to predict inventory needs, optimize stock, minimize waste.

Key Challenge: ML requires good labeled data. Garbage in, garbage out.


3. Deep Learning (DL)

Definition: Subset of ML using multi-layered artificial neural networks to learn hierarchical representations from data. “Deep” refers to the number of layers.

Why Deep Learning Works:

Simple models can only learn simple decision boundaries. By the universal approximation theorem, a network with one hidden layer can approximate any function, but it may need exponentially many neurons to do so. Deep networks achieve the same expressiveness with far fewer parameters by learning hierarchies:

  • Layer 1: Low-level features (edges, textures)
  • Layer 2: Mid-level features (shapes, objects)
  • Layer 3+: High-level semantic concepts
  • Final layer: Task-specific output (classification, generation)

Example (Image Classification):

  • Input: Pixel values
  • Layer 1: Detects edges and corners
  • Layer 2: Detects textures and simple shapes
  • Layer 3: Detects object parts (eyes, ears, fur)
  • Layer 4: Detects objects (cat, dog, person)
  • Output: Classification
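
The layer-by-layer idea can be shown in miniature with a hand-rolled two-layer forward pass. The weights below are invented for illustration (real networks learn them from data); the point is that each layer re-represents its input.

```python
def relu(v):
    """Non-linearity: without it, stacked layers collapse to one."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Fully connected layer: out_j = sum_i v_i * w_ji + b_j."""
    return [sum(x * w for x, w in zip(v, row)) + b
            for row, b in zip(weights, bias)]

x = [0.5, -1.2, 3.0]                        # raw input ("pixels")
W1 = [[0.2, -0.5, 0.1], [0.4, 0.3, -0.2]]   # layer 1: 3 inputs -> 2 features
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]                          # layer 2: 2 features -> 1 score
b2 = [0.5]

h = relu(dense(x, W1, b1))   # intermediate features ("edges")
y = dense(h, W2, b2)         # task-specific output ("cat score")
```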

Common Architectures:

| Architecture | Domain | Best For | Example Systems |
|--------------|--------|----------|-----------------|
| CNN (convolutional) | Computer vision | Image classification, object detection | ResNet, YOLOv8, CLIP |
| RNN/LSTM | Sequential data | Language modeling, time-series | Time-series forecasters (e.g., stock prices), early EKG anomaly detectors |
| Transformer | NLP + sequences | Language understanding, generation, translation | BERT, GPT-4, T5 |
| GAN (generative adversarial) | Image generation | Synthetic data, image-to-image translation | StyleGAN (face generation), pix2pix (sketch→photo) |
| Autoencoder | Unsupervised learning | Dimensionality reduction, anomaly detection | Netflix: compressed user embeddings for recommendations |

Strengths:

  • ✅ Learns hierarchical features automatically (no manual feature engineering needed)
  • ✅ State-of-the-art on complex tasks (ImageNet, COCO, SQuAD benchmarks)
  • ✅ Scales with data: performance improves with millions to billions of examples
  • ✅ Transfer learning: pre-trained models accelerate new applications

Weaknesses:

  • ❌ Needs massive labeled datasets (10K–1M+ examples)
  • ❌ Computationally expensive (weeks on GPUs for large models)
  • ❌ Black-box interpretability: hard to explain why the model made a decision
  • ❌ Prone to adversarial attacks and dataset bias

Production Scale Numbers:

  • GPT-3 (OpenAI): 175 billion parameters, trained on ~300 billion tokens. GPT-4's size is undisclosed; its training cost has been reported at over $100M.
  • ResNet-50 (ImageNet): 25.5M parameters, achieves 76% top-1 accuracy, inference: ~100ms on CPU
  • BERT (Google): 12 layers, 110M parameters, pre-trained on 3.3B words
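
A useful sanity check on numbers like these is the weight-memory estimate: parameters times bytes per parameter. A rough sketch (weights only; activations, optimizer state, and KV cache add more):

```python
def weight_memory_gb(params, bytes_per_param=2):
    """Approximate weight memory in GiB; fp16 stores 2 bytes/parameter."""
    return params * bytes_per_param / 1024**3

print(f"175B params @ fp16: {weight_memory_gb(175e9):.0f} GB")  # ~326 GB
print(f"110M params @ fp16: {weight_memory_gb(110e6):.2f} GB")  # ~0.20 GB
```

This is why 100B+-parameter models need multi-GPU serving while BERT-size models fit on a laptop.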

4. Generative AI (GenAI)

Definition: Specialized area of DL focused on generating new content (text, images, code, video) based on learned patterns, rather than classifying or predicting existing data.

Discriminative vs Generative:

  • Discriminative: Learns decision boundary. “What is this image?” (Input → Classification)
  • Generative: Learns data distribution. “Create an image of a cat.” (Prompt → New Content)

Core Models:

| Model Type | Generates | Mechanism | Example Systems |
|------------|-----------|-----------|-----------------|
| LLM | Human-like text | Transformer + next-token prediction | GPT-4, Claude, LLaMA, Gemini |
| Diffusion | Images from noise | Iterative denoising | Stable Diffusion, DALL-E 3, Midjourney |
| GAN | Images, synthetic data | Generator vs. discriminator adversarial loss | StyleGAN, CycleGAN |
| Seq2Seq | Any sequence output | Encoder-decoder with attention | Google Translate, summarization, speech synthesis |

Key Capabilities (as of 2026):

  • Generate human-like text for any domain (news, creative writing, technical documentation)
  • Create realistic images from text descriptions with artistic control
  • Write production-quality code with explanation
  • Translate between 100+ languages, with near-human quality for high-resource pairs
  • Summarize documents while preserving key insights
  • Answer complex questions conversationally with multi-step reasoning (Chain-of-Thought)
  • Compose music with consistent style and structure

Production Systems Using GenAI:

  • ChatGPT (OpenAI): 100M+ weekly users. Text generation. API cost: ~$0.002/1K tokens for GPT-3.5-turbo.
  • GitHub Copilot: 4M+ developers using AI for code completion. 40% of code at some companies generated by Copilot.
  • Google Search (Generative): AI Overviews generate summaries. 95% of searches still use traditional links, but GenAI is reshaping discovery.
  • Midjourney (Image Generation): $10–120/month subscription. 12M+ images generated monthly as of 2024.
  • Customer Support Bots (Enterprise): Reduce support tickets by 30–40%. Escalate complex issues to humans.

Typical GenAI Pipeline (LLM):

Prompt: "Write a Python function to sort a list"
   ↓
Tokenize: ["Write", "a", "Python", "function", ...]
   ↓
Embedding: Convert tokens to semantic vectors (d=4096)
   ↓
Transformer layers: build context via self-attention (most chat LLMs are decoder-only)
   ↓
Next-token prediction: generate the response autoregressively, one token at a time
   ↓
Temperature/Sampling: Control randomness (greedy, top-k, nucleus)
   ↓
Output: "def sort_list(arr):\n    return sorted(arr)"
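
The Temperature/Sampling step deserves a closer look: temperature divides the logits before softmax, so low values sharpen the distribution toward greedy decoding and high values flatten it toward more random sampling. The logits below are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature controls spread."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]   # invented scores for three candidate tokens
print(softmax(logits, temperature=1.0))   # moderate spread
print(softmax(logits, temperature=0.1))   # near-greedy: top token dominates
print(softmax(logits, temperature=2.0))   # flatter: sampling is more random
```

Top-k and nucleus (top-p) sampling then truncate this distribution before drawing a token.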

Challenges:

  • Hallucination: Confident false statements (e.g., citing a paper or court case that does not exist)
  • Latency: LLMs require 50–500ms per request at scale
  • Cost: GPT-4 is ~20x more expensive than GPT-3.5
  • Safety: Risk of generating harmful, biased, or private data

Capability Progression

                    Complexity & Capability
                              ▲
                              │
Complex, Creative      GenAI ├─╪ Generate text, images, code
Content Generation            │╲  Models: LLMs, Diffusion
                              │ ╲ Cost: High, Latency: High
                              │  ╲
Hierarchical Patterns  DL     ├──╪ Vision, NLP, sequences
from Complex Data             │  ╲ Models: CNN, RNN, Transformer
                              │   ╲ Needs: 10K+ examples
                              │    ╲
Structured Patterns    ML     ├────╪ Classification, prediction
from Tabular Data             │    ╲ Models: XGBoost, SVM, RF
                              │     ╲ Needs: 100–1000 examples
                              │      ╲
Simple Rules/Heuristics       ├──────╪ Deterministic logic
                              │      ╲ Cost: Low, Latency: <1ms
                              │_______╲
                              └────────┴──────────────────→ Ease of Implementation

Decision Framework

Ask yourself:

  1. Is the problem deterministic? → Use rules/ML
  2. Do you have labeled data? → Use ML
  3. Is the data high-dimensional (images, text)? → Use DL
  4. Do you need to generate new content? → Use GenAI
  5. Do you need real-time predictions? → Avoid large LLMs (use smaller models or cached results)
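
One way to encode this checklist is as a small function. The question ordering and category labels here are an illustrative encoding, not a canonical decision procedure:

```python
def choose_approach(deterministic, has_labels, high_dimensional,
                    needs_generation, needs_realtime):
    """Map the five checklist questions to a candidate technique."""
    if needs_generation:
        # Real-time constraints push toward smaller models or cached results.
        return "Small/optimized GenAI" if needs_realtime else "GenAI"
    if deterministic:
        return "Rules"
    if high_dimensional:
        return "Deep Learning"
    if has_labels:
        return "Classical ML"
    return "Unsupervised ML, or collect labels first"

print(choose_approach(deterministic=False, has_labels=True,
                      high_dimensional=False, needs_generation=False,
                      needs_realtime=False))  # → Classical ML
```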

References

  • 📖 Artificial Intelligence on Wikipedia
  • 📖 Machine Learning on Wikipedia
  • 📖 Deep Learning (Goodfellow, Bengio, Courville): the authoritative textbook
  • 📄 Attention Is All You Need (Vaswani et al., 2017): introduced the Transformer, the foundation of modern GenAI
  • 🎥 3Blue1Brown: Neural Networks (excellent visual intuition)
  • 🎥 Stanford CS224N: NLP with Deep Learning (industry-standard NLP course)

This post is licensed under CC BY 4.0 by the author.