Fine-tuning vs RAG vs Prompting
Three ways to customize LLMs: Each with different trade-offs in cost, latency, and quality.
Semantic search at scale: Store high-dimensional embeddings, find similar documents in milliseconds.
Cloud spend is the fastest-growing and least-controlled budget line in most engineering organizations. Without FinOps discipline, cloud bills grow 30-40% year-over-year while utilization sits at 35-45%.
Grounding LLMs in knowledge: Combine document retrieval + LLM generation to answer questions with up-to-date, verifiable information.
The neural network zoo: Different architectures for different data modalities—images, sequences, and everything in between.
Engineering teams often spend 10-20% of their budget on vendors and SaaS tools with little oversight. The difference between good and bad vendor management is a 20-40% swing in cost for the same services.
How to control the creativity-coherence trade-off: Sampling strategies determine whether your LLM generates repetitive text or hallucinated nonsense.
How to measure what matters: Choosing the right metric is as important as the algorithm. Wrong metric → wrong optimization → wrong outcomes.
People are 65-80% of your engineering budget. If you can't model headcount costs accurately -- fully loaded, with attrition, ramp time, and hiring velocity -- your entire budget is fiction.
A taxonomy of essential machine learning algorithms: The ones that power production systems across industry.