Platform Engineering Foundations
An industry primer on platform engineering, covering both the organisational vocabulary (Team Topologies, Platform-as-a-Product, paved roads, “you build it, you run it”, SRE) and the technical substance: the building blocks of a modern Internal Developer Platform (IDP), plus case studies of platforms that other organisations have described publicly. This is the entry point to the folder; deeper docs cover each concept in detail.
Why platform engineering exists
The platform-engineering discipline has, over the last decade, converged on two distinct bodies of knowledge:
- Organisational patterns for how platform and delivery teams work together — Team Topologies, Platform-as-a-Product, paved roads, SRE.
- Technical building blocks that every modern Internal Developer Platform is assembled from — developer portals, scaffolding, workload orchestration, GitOps delivery, runtime, policy & supply chain — and a growing body of public case studies from Spotify, Netflix, Zalando, Monzo, Mercedes-Benz, Adevinta, and others.
The convergence matters: a team starting today no longer has to invent the discipline from scratch. The vocabulary is shared, the layers of an IDP are well-understood, and the products at each layer form a choose-what-fits menu.
graph TD
A[Platform Engineering] --> B[Organisational Patterns]
A --> C[Technical Building Blocks]
B --> B1[Team Topologies]
B --> B2[Platform-as-a-Product]
B --> B3[Paved Road / Golden Path]
B --> B4[You Build It You Run It]
B --> B5[SRE — SLOs & Error Budgets]
C --> C1[Developer Portal]
C --> C2[Scaffolding & Templates]
C --> C3[Workload Orchestration]
C --> C4[GitOps Delivery]
C --> C5[Runtime Plane]
C --> C6[Policy & Supply Chain]
Organisational vocabulary in brief
A short orientation. Links point to canonical sources for depth; full deep-dives live in Team Topologies for Platform Teams and Platform as a Product.
Team Topologies
Team Topologies (Matthew Skelton & Manuel Pais, 2019; 2nd ed. 2025) is the standard framework for designing engineering orgs around flow. It defines:
Four team types
- Stream-aligned — the default. Owns a business-domain flow end-to-end.
- Platform — internal product team. Builds a self-service platform consumed by stream-aligned teams.
- Enabling — short-lived coaches. Help stream-aligned teams adopt new capabilities.
- Complicated-subsystem — deep specialists for hairy problems (e.g. video codecs, ML inference).
Three interaction modes
- X-as-a-Service — self-serve, steady state. The target for any mature capability.
- Collaboration — time-boxed joint discovery. Two teams build something together.
- Facilitating — temporary pairing during onboarding or new-tech rollout.
The trajectory for any capability is Collaboration → Facilitating → X-as-a-Service. A platform team perpetually stuck in Collaboration is doing consulting, not platform engineering.
→ Martin Fowler bliki: TeamTopologies.
Adjacent operating models
Each of these extends or overlaps with Team Topologies. Most mature orgs blend them.
- Platform-as-a-Product — treat the platform as a product with customers (delivery teams), a PM, a roadmap, user research, adoption metrics. Measured by adoption, not tickets resolved. Convergent consensus across DORA, CNCF, Gartner, Thoughtworks. → Manuel Pais, Platform as a Product; Evan Bottcher, What I Talk About When I Talk About Platforms.
- Paved road / golden path — the output of Platform-as-a-Product: the opinionated, well-lit path from git push to production. Netflix: “paved road”; Spotify: “golden path”. Teams inherit security, observability, reliability defaults for free.
- “You build it, you run it” — Werner Vogels (Amazon). Delivery teams own services end-to-end including on-call, on top of the paved road, not instead of it.
- Site Reliability Engineering (SRE) — Google’s discipline: reliability governed by SLOs and error budgets. Mature platforms adopt the practices even without a dedicated SRE team. → sre.google/books.
- Cloud Center of Excellence (CCoE) — cross-cutting group for cloud standards, landing zones, FinOps. Folds into the platform team at smaller scale.
- Traditional centralised Ops — separate Ops department receiving deployment hand-offs. The model DevOps, SRE, and Team Topologies were designed to replace. Reference point only.
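The SLO and error-budget idea named in the SRE bullet above comes down to simple arithmetic. The following is a minimal sketch, not a prescribed implementation; the `Slo` type, its field names, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """An availability SLO over a rolling window (illustrative fields)."""
    target: float      # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int   # length of the rolling window in days


def error_budget_minutes(slo: Slo) -> float:
    """Allowed 'bad' minutes in the window for a time-based SLO."""
    total_minutes = slo.window_days * 24 * 60
    return total_minutes * (1 - slo.target)


def budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLO.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    if total_events == 0:
        return 1.0
    allowed_bad = total_events * (1 - slo.target)
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% target leaves no budget at all: any failure overspends it.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1 - actual_bad / allowed_bad
```

For example, a 99.9% target over 30 days allows roughly 43 minutes of unavailability, and 500 failed requests out of one million against the same target spends about half of the budget.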
The mature configuration
Most organisations settle on a blend:
Platform-as-a-Product + X-as-a-Service as the default interaction + “You build it, you run it” on top + SRE practices (SLOs, error budgets).
For a comparison of working-model alternatives and their tradeoffs, see the DevOps Topologies anti-types catalogue and the CNCF Platform Engineering Maturity Model.
What a platform actually is — the IDP building blocks
The vocabulary above describes how a platform team works. This section describes what a platform is made of. A modern Internal Developer Platform (IDP) is assembled from a recognisable set of building blocks that have converged over the last five years. Not every platform has all of them, and teams mix build / buy / open-source freely — but the shape of the stack is now broadly industry-standard.
For a deeper treatment of each layer, see Internal Developer Platforms.
The IDP reference stack (six layers)
The layers are conceptual, not organisational. One team typically owns the whole stack, but thinking in layers clarifies what the platform does, which pieces are replaceable, and where the tenant-facing contract lives.
graph TD
Dev[Developer] -->|browses| L1[1. Developer Portal & Catalogue]
Dev -->|creates new| L2[2. Scaffolding & Golden Paths]
Dev -->|declares intent| L3[3. Workload Spec & Orchestration]
L3 -->|generates manifests| L4[4. CD & GitOps]
L4 -->|deploys to| L5[5. Runtime Plane]
L6[6. Policy, Supply Chain, Observability] -.guards.-> L4
L6 -.guards.-> L5
| Layer | What it does | Representative products |
|---|---|---|
| 1. Developer portal & service catalogue | Single place for developers to discover services, owners, docs, SLOs, dependencies, runbooks. The “front door” of the platform. | Backstage (Spotify OSS, CNCF incubating), Port, Cortex, OpsLevel, Roadie (hosted Backstage) |
| 2. Scaffolding & golden-path templates | Creates new services / repos / pipelines with sensible defaults baked in — the paved road, expressed in code. | Backstage Scaffolder, Cookiecutter, Copier, Projen, internal generators |
| 3. Workload specification & orchestration | The tenant-facing declarative interface: “I need a service with these resources” → platform resolves the declaration into concrete cloud + Kubernetes state. | Score (CNCF open spec), Humanitec (platform orchestrator), Kratix, Crossplane, Radius (Microsoft), KusionStack, CUE-based orchestrators |
| 4. Continuous delivery & GitOps | Ships code and config from git push to running workloads; keeps cluster state reconciled. | Flux (CNCF graduated), Argo CD (CNCF graduated), Spinnaker (Netflix OSS), Tekton, reusable GitHub Actions workflows |
| 5. Runtime & supporting plane | The compute and day-2 concerns: clusters, networking, ingress, secrets, autoscaling, certs. | GKE / EKS / AKS; Istio / Linkerd; KEDA; External Secrets Operator; cert-manager |
| 6. Policy, supply chain, observability | The non-negotiable floor: admission policy, image signing, SBOMs, metrics / logs / traces. | Kyverno, OPA / Gatekeeper, Kubewarden; Sigstore / cosign, SLSA, in-toto; Prometheus / VictoriaMetrics; Grafana; OpenTelemetry |
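Layer 4's "keeps cluster state reconciled" can be pictured as a repeated diff between the desired state held in git and the actual state of the cluster. The sketch below works on in-memory dictionaries only; the `State` shape and the `plan` and `reconcile` names are assumptions for illustration and do not reflect how Flux or Argo CD are implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: resource name -> manifest content.
State = Dict[str, dict]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, name) steps that move `actual` toward `desired`."""
    steps = []
    for name, manifest in sorted(desired.items()):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != manifest:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            # Pruning of undeclared resources; often a configurable option.
            steps.append(("delete", name))
    return steps


def reconcile(desired: State, actual: State) -> State:
    """Apply the plan to a copy of the actual state and return the result."""
    result = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result
```

Running `reconcile` a second time on its own output yields an empty plan, which is the property that makes a reconciliation loop safe to run continuously.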
Two architectural patterns that shape the stack
Two cross-cutting patterns define most modern platforms.
Platform orchestrator pattern. A tenant declares intent (workloads + required resources) in a single config file; the platform translates that declaration into the underlying IaC modules and Kubernetes manifests; CI pipelines sync the generated artefacts to the cloud. Humanitec popularised the term and ships a commercial implementation, but the pattern is increasingly built in-house. The orchestrator is where the platform’s opinionated decisions live: which resources tenants can self-service, what defaults apply, what validation runs before provisioning.
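To make the orchestrator's three responsibilities concrete (defaults, validation, translation), here is a minimal sketch. Everything in it is hypothetical: the `DEFAULTS` values, the `ALLOWED_RESOURCES` allow-list, the `modules/<name>` path convention, and the function names are assumptions for illustration, not the behaviour of Humanitec or any other product.

```python
from copy import deepcopy

# Hypothetical platform defaults and self-service allow-list.
DEFAULTS = {"replicas": 2, "cpu": "250m", "memory": "256Mi"}
ALLOWED_RESOURCES = {"postgres", "redis", "bucket"}


def resolve(intent: dict) -> dict:
    """Fill in platform defaults for anything the tenant did not declare."""
    resolved = deepcopy(intent)
    for key, value in DEFAULTS.items():
        resolved.setdefault(key, value)
    resolved.setdefault("resources", [])
    return resolved


def validate(resolved: dict) -> None:
    """Reject intents that are incomplete or ask for non-self-service resources."""
    for field in ("name", "image"):
        if not resolved.get(field):
            raise ValueError(f"missing required field: {field}")
    unknown = set(resolved["resources"]) - ALLOWED_RESOURCES
    if unknown:
        raise ValueError(f"resources not self-serviceable: {sorted(unknown)}")


def render(resolved: dict) -> dict:
    """Emit a Deployment-shaped manifest plus a list of IaC module requests."""
    name = resolved["name"]
    container = {
        "name": name,
        "image": resolved["image"],
        "resources": {"requests": {"cpu": resolved["cpu"], "memory": resolved["memory"]}},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": resolved["replicas"],
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [container]},
            },
        },
    }
    modules = [{"module": f"modules/{r}", "owner": name} for r in resolved["resources"]]
    return {"manifests": [deployment], "iac_modules": modules}


def orchestrate(intent: dict) -> dict:
    """Resolve defaults, validate, then render: the orchestrator in miniature."""
    resolved = resolve(intent)
    validate(resolved)
    return render(resolved)
```

A tenant that declares only a name, an image, and `["postgres"]` gets two replicas and default resource requests without asking for them, while a request for a resource outside the allow-list fails before anything is provisioned.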
Workload specification as a standard. A deliberately minimal, tool-agnostic spec for what a workload is (image, ports, resource dependencies) that multiple orchestrators can consume. Score is the emerging open standard — explicitly designed to be implemented by Humanitec, Kubernetes-native tools, Nomad, etc. Adopting a spec standard (even an in-house one) lets tenants stay portable across platform rewrites.
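The portability argument can be illustrated with one small spec type and two independent renderers. This is a sketch under stated assumptions: the `Workload` fields are invented for illustration and are not the Score schema, and both output shapes are heavily simplified stand-ins for a Compose service entry and a Kubernetes container entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Workload:
    """A deliberately small, tool-agnostic description of a workload."""
    name: str
    image: str
    ports: Dict[str, int] = field(default_factory=dict)  # port name -> number
    resources: List[str] = field(default_factory=list)   # e.g. ["postgres"]


def to_compose_service(w: Workload) -> dict:
    """Render a Compose-style service entry (simplified)."""
    return {
        w.name: {
            "image": w.image,
            "ports": [f"{port}:{port}" for port in w.ports.values()],
        }
    }


def to_k8s_container(w: Workload) -> dict:
    """Render a Kubernetes container-style entry (simplified)."""
    return {
        "name": w.name,
        "image": w.image,
        "ports": [{"name": n, "containerPort": p} for n, p in w.ports.items()],
    }
```

The tenant-owned file only ever describes the `Workload`; swapping or adding a renderer is a platform-side change, which is what keeps tenants portable across platform rewrites.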
sequenceDiagram
participant T as Tenant
participant S as Spec (Score / CUE / YAML)
participant O as Orchestrator
participant G as GitOps Engine
participant R as Runtime
T->>S: declares workload + resources
S->>O: feeds spec
O->>O: resolves defaults, validates, picks IaC modules
O->>G: emits Terraform JSON + K8s manifests
G->>R: reconciles desired state
R-->>T: workload running with paved-road defaults
Case studies — what other organisations have built
Short snapshots of platforms at companies that have published enough about their work to learn from.
Spotify — Backstage
A unified developer portal that catalogues every service, owner, doc, pipeline, and SLO across Spotify engineering. Open-sourced in 2020; now a CNCF incubating project with wide adoption (Netflix, American Airlines, Expedia, HP, LinkedIn, Mercedes-Benz, Siemens). Established the “developer portal” as its own layer — not an accessory to CI/CD — and showed that a catalogue-first approach scales across hundreds of services and thousands of engineers.
→ Pia Nilsson, No More Manuals: The Secret to Spotify’s Speedy Microservices Development; Stefan Ålund, What the Heck Is Backstage Anyway?; backstage.io — adopters. Detailed treatment in Backstage and Developer Portals.
Netflix — Paved Road and full-cycle developers
A deep internal platform (Spinnaker for delivery, Titus for containers, Atlas for metrics, plus a large library of paved-road services) that Netflix engineers use by default. “Full-cycle developers” own their services end-to-end — on top of the paved road. The canonical statement of “you build it, you run it” on top of a platform, not instead of one.
→ Greg Burrell, Full Cycle Developers at Netflix; Dianne Marsh, How We Build Code at Netflix; Netflix Tech Blog. Detailed treatment in Golden Paths and Paved Roads.
Zalando — Radical Agility → shared Kubernetes platform
Started with “Radical Agility” (radically autonomous teams, cluster-per-team) and evolved toward a shared Kubernetes platform as the cost of unconstrained autonomy became clear. A rich case study of the arc every scaling platform team eventually traverses — too much autonomy without a paved road, then a migration toward shared platform capabilities without sacrificing team ownership.
→ Henning Jacobs, A Year of Kubernetes at Zalando; Running Kubernetes in Production; engineering.zalando.com.
Monzo — platform engineering at scale
~2,000-microservice platform on Kubernetes + Linkerd + an internal RPC abstraction. Heavy use of code generation, linting, and a strong service-template approach to keep the service topology legible with a small platform team. Demonstrates how a disciplined abstraction layer (RPC + templates) lets a small platform team support a very large service count without collapsing under coordination overhead.
→ Suhail Patel, The Trillion Dollar Payment Network; Scaling Monzo’s Infrastructure with Istio and Envoy; monzo.com/blog/technology.
Mercedes-Benz — Backstage-based IDP
An internal developer platform for thousands of engineers across automotive software, built around Backstage with Kubernetes, GitHub, and custom plugins. Runs in-car / MB.OS software adjacent to cloud services. A European, non-web-native example — platform engineering applied at a traditional enterprise scale.
→ CNCF case study: Mercedes-Benz; Mercedes-Benz Tech Innovation on Medium.
Adevinta — Common Platform
Common Platform serving 50+ marketplace product teams across Europe, built on Kubernetes with a strong paved-road philosophy and explicit platform SLOs. A European marketplace case with public writing on platform evolution, SLOs, and cost as a product dimension.
→ The Journey to Building the Adevinta Common Platform; Adevinta Tech Blog; platformengineering.org talks library.
Humanitec — GCP reference architecture
A vendor-neutral reference architecture for building an IDP on GCP, published by the team behind the “platform orchestrator” pattern. The closest thing to a “textbook” answer for an opinionated GCP IDP — useful as an external check on architectural choices.
→ Humanitec, GCP Platform Reference Architecture; Kaspar von Grünberg, What Is a Platform Orchestrator?; humanitec.com/blog.
Common threads across all of them
- Catalogue-first. Every successful large-scale platform puts a service catalogue at the front of the funnel. Backstage is the dominant OSS choice; Port / Cortex / OpsLevel lead the commercial space.
- One declarative interface for tenants. Whether it is Backstage Software Templates, Humanitec manifests, Score specs, or in-house CUE files — the tenant declares intent once; the platform resolves it into concrete infrastructure.
- GitOps for cluster and infra state. Every modern platform is GitOps-driven or converging toward it. Flux and Argo CD are the two defaults.
- Policy and supply chain enforced at the cluster boundary, not at code review. Kyverno / OPA / cosign / SLSA enforce what humans used to eyeball in PRs.
- Platform SLOs. Every mature team publishes reliability contracts for the platform itself — pipeline success rate, deploy time, environment provisioning time, portal availability.
The specific choices vary; the pattern is the same.
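The "policy at the cluster boundary" thread above can be sketched as a function that inspects a manifest at admission time and returns the reasons to reject it. The rules, the `ALLOWED_REGISTRIES` and `REQUIRED_LABELS` values, and the `signed_images` set (standing in for a real signature verification step) are all assumptions for illustration; this is not the Kyverno, OPA, or cosign API.

```python
from typing import List, Set

# Hypothetical policy inputs; a real platform would load these from config.
ALLOWED_REGISTRIES = ("registry.example/",)
REQUIRED_LABELS = ("team",)


def violations(manifest: dict, signed_images: Set[str]) -> List[str]:
    """Return human-readable policy violations for a Pod-like manifest."""
    found = []
    labels = manifest.get("metadata", {}).get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            found.append(f"missing label: {label}")
    for container in manifest.get("spec", {}).get("containers", []):
        image = container.get("image", "")
        if not image.startswith(ALLOWED_REGISTRIES):
            found.append(f"image from untrusted registry: {image}")
        last_segment = image.rsplit("/", 1)[-1]
        if image.endswith(":latest") or ":" not in last_segment:
            found.append(f"image tag not pinned: {image}")
        if image not in signed_images:
            # Stand-in for verifying a signature against a trusted key.
            found.append(f"image signature not verified: {image}")
    return found


def admit(manifest: dict, signed_images: Set[str]) -> bool:
    """Admit the manifest only when no rule is violated."""
    return not violations(manifest, signed_images)
```

Because the check runs on every manifest regardless of who wrote it, the rules hold even when a pull-request reviewer misses them.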
When this discipline applies
✅ Use platform engineering when:
- You have multiple stream-aligned teams (typically 5+) shipping software in parallel.
- Cognitive load of cloud + Kubernetes + observability + supply chain is slowing teams down.
- You see inconsistent reliability, security, and cost outcomes across teams.
- You want standardisation without central bottlenecks: a paved road teams can opt into, not a gate they must pass through.
❌ Don’t invest in heavy platform engineering when:
- You have a single product team and direct tooling already works.
- Your bottleneck is product-market fit, not delivery speed.
- You don’t yet have the engineering volume to justify a platform team (roughly: 30+ engineers, 10+ services).
- You’d be building before discovering — invest in 2-3 stream-aligned teams first to learn what they actually need paved.
References
Books
- Matthew Skelton & Manuel Pais, Team Topologies: Organizing Business and Technology Teams for Fast Flow, 2nd ed., IT Revolution, 2025. → itrevolution.com
- Camille Fournier & Ian Nowland, Platform Engineering: A Guide for Technical, Product, and People Leaders, O’Reilly, 2024.
- Betsy Beyer et al., Site Reliability Engineering & The SRE Workbook, Google / O’Reilly. → sre.google/books
- Nicole Forsgren, Jez Humble, Gene Kim, Accelerate, IT Revolution, 2018.
Articles and talks
- Evan Bottcher, What I Talk About When I Talk About Platforms — martinfowler.com/articles/talk-about-platforms.html
- Manuel Pais, Mind the Platform Execution Gap — martinfowler.com/articles/platform-prerequisites.html
- Manuel Pais, Platform as a Product — platformengineering.org/talks-library
- Martin Fowler, Team Topologies (bliki) — martinfowler.com/bliki/TeamTopologies.html
Frameworks and reference architectures
- CNCF Platforms White Paper — tag-app-delivery.cncf.io/whitepapers/platforms
- CNCF Platform Engineering Maturity Model — tag-app-delivery.cncf.io/whitepapers/platform-eng-maturity-model
- DORA Platform Engineering capability — dora.dev/capabilities/platform-engineering
- Humanitec, GCP Platform Reference Architecture — humanitec.com/reference-architectures/gcp
- Score — open workload specification — score.dev
- DevOps Topologies — Anti-Types — web.devopstopologies.com/#anti-types
Reports
- DORA 2024 State of DevOps Report — PDF
- Puppet, State of Platform Engineering Vol. 4 — platformengineering.org/blog
- Thoughtworks Technology Radar — thoughtworks.com/radar
Engineering blogs (case studies)
- Spotify — engineering.atspotify.com · backstage.io
- Netflix — netflixtechblog.com
- Zalando — engineering.zalando.com · srcco.de (Henning Jacobs)
- Monzo — monzo.com/blog/technology
- Mercedes-Benz — Tech Innovation on Medium · CNCF case study
- Adevinta — adevinta.com/techblog
- Humanitec — humanitec.com/blog