Platform Engineering Foundations

Posted Apr 29, 2026

An industry primer on platform engineering. It covers both the organisational vocabulary (Team Topologies, Platform-as-a-Product, paved roads, “you build it, you run it”, SRE) and the technical substance: the building blocks of a modern Internal Developer Platform (IDP) and case studies of platforms that other organisations have written about publicly. This is the entry point to the folder; deeper docs cover each concept in detail.

Why platform engineering exists

The platform-engineering discipline has, over the last decade, converged on two distinct bodies of knowledge:

Organisational patterns for how platform and delivery teams work together — Team Topologies, Platform-as-a-Product, paved roads, SRE.
Technical building blocks that every modern Internal Developer Platform is assembled from — developer portals, scaffolding, workload orchestration, GitOps delivery, runtime, policy & supply chain — and a growing body of public case studies from Spotify, Netflix, Zalando, Monzo, Mercedes-Benz, Adevinta, and others.

The convergence matters: a team starting today no longer has to invent the discipline from scratch. The vocabulary is shared, the layers of an IDP are well-understood, and the products at each layer form a choose-what-fits menu.

graph TD
    A[Platform Engineering] --> B[Organisational Patterns]
    A --> C[Technical Building Blocks]

    B --> B1[Team Topologies]
    B --> B2[Platform-as-a-Product]
    B --> B3[Paved Road / Golden Path]
    B --> B4[You Build It You Run It]
    B --> B5[SRE — SLOs & Error Budgets]

    C --> C1[Developer Portal]
    C --> C2[Scaffolding & Templates]
    C --> C3[Workload Orchestration]
    C --> C4[GitOps Delivery]
    C --> C5[Runtime Plane]
    C --> C6[Policy & Supply Chain]

Organisational vocabulary in brief

A short orientation. Links point to canonical sources for depth; full deep-dives live in Team Topologies for Platform Teams and Platform as a Product.

Team Topologies

Team Topologies (Matthew Skelton & Manuel Pais, 2019; 2nd ed. 2025) is the standard framework for designing engineering orgs around flow. It defines:

Four team types

Stream-aligned — the default. Owns a business-domain flow end-to-end.
Platform — internal product team. Builds a self-service platform consumed by stream-aligned teams.
Enabling — short-lived coaches. Help stream-aligned teams adopt new capabilities.
Complicated-subsystem — deep specialists for hairy problems (e.g. video codecs, ML inference).

Three interaction modes

X-as-a-Service — self-serve, steady state. The target for any mature capability.
Collaboration — time-boxed joint discovery. Two teams build something together.
Facilitating — temporary pairing during onboarding or new-tech rollout.

The trajectory for any capability is Collaboration → Facilitating → X-as-a-Service. A platform team perpetually stuck in Collaboration is doing consulting, not platform engineering.

→ Martin Fowler bliki: TeamTopologies.

Adjacent operating models

Each of these extends or overlaps with Team Topologies. Most mature orgs blend them.

Platform-as-a-Product — treat the platform as a product with customers (delivery teams), a PM, a roadmap, user research, adoption metrics. Measured by adoption, not tickets resolved. Convergent consensus across DORA, CNCF, Gartner, Thoughtworks. → Manuel Pais, Platform as a Product; Evan Bottcher, What I Talk About When I Talk About Platforms.
Paved road / golden path — the output of Platform-as-a-Product: the opinionated, well-lit path from git push to production. Netflix: “paved road”; Spotify: “golden path”. Teams inherit security, observability, reliability defaults for free.
“You build it, you run it” — Werner Vogels (Amazon). Delivery teams own services end-to-end including on-call, on top of the paved road — not instead of it.
Site Reliability Engineering (SRE) — Google’s discipline: reliability governed by SLOs and error budgets. Mature platforms adopt the practices even without a dedicated SRE team. → sre.google/books.
Cloud Center of Excellence (CCoE) — cross-cutting group for cloud standards, landing zones, FinOps. Folds into the platform team at smaller scale.
Traditional centralised Ops — separate Ops department receiving deployment hand-offs. The model DevOps, SRE, and Team Topologies were designed to replace. Reference point only.
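
To make the SLO and error-budget idea in the SRE entry above concrete, here is a minimal sketch that converts an availability target into an allowed-downtime budget and reports how much of it is left. The 30-day window, the names `Slo`, `error_budget_minutes`, and `budget_remaining`, and the sample figures are assumptions for illustration; they are not taken from the SRE books.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """An availability objective over a rolling window (illustrative)."""
    target: float        # e.g. 0.999 means 99.9% of minutes must be good
    window_minutes: int  # e.g. 30 days = 43_200 minutes

    def __post_init__(self) -> None:
        # A target of exactly 1.0 would leave no budget to spend at all.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def error_budget_minutes(slo: Slo) -> float:
    """Minutes of unavailability the SLO tolerates over the window."""
    return (1.0 - slo.target) * slo.window_minutes


def budget_remaining(slo: Slo, bad_minutes: float) -> float:
    """Fraction of the budget still unspent (negative once overspent)."""
    budget = error_budget_minutes(slo)
    return (budget - bad_minutes) / budget


slo = Slo(target=0.999, window_minutes=30 * 24 * 60)
print(round(error_budget_minutes(slo), 1))     # 43.2
print(round(budget_remaining(slo, 10.8), 2))   # 0.75
```

A team that has spent its budget would typically slow feature rollout until reliability recovers; how strictly to enforce that is a policy decision, not something the arithmetic settles.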

The mature configuration

Most organisations settle on a blend:

Platform-as-a-Product + X-as-a-Service as the default interaction + “You build it, you run it” on top + SRE practices (SLOs, error budgets).

For a comparison of working-model alternatives and their tradeoffs, see the DevOps Topologies anti-types catalogue and the CNCF Platform Engineering Maturity Model.

What a platform actually is — the IDP building blocks

The vocabulary above describes how a platform team works. This section describes what a platform is made of. A modern Internal Developer Platform (IDP) is assembled from a recognisable set of building blocks that have converged over the last five years. Not every platform has all of them, and teams mix build / buy / open-source freely — but the shape of the stack is now broadly industry-standard.

For a deeper treatment of each layer, see Internal Developer Platforms.

The IDP reference stack (six layers)

The layers are conceptual, not organisational. One team typically owns the whole stack, but thinking in layers clarifies what the platform does, which pieces are replaceable, and where the tenant-facing contract lives.

graph TD
    Dev[Developer] -->|browses| L1[1. Developer Portal & Catalogue]
    Dev -->|creates new| L2[2. Scaffolding & Golden Paths]
    Dev -->|declares intent| L3[3. Workload Spec & Orchestration]
    L3 -->|generates manifests| L4[4. CD & GitOps]
    L4 -->|deploys to| L5[5. Runtime Plane]
    L6[6. Policy, Supply Chain, Observability] -.guards.-> L4
    L6 -.guards.-> L5

| Layer | What it does | Representative products |
| --- | --- | --- |
| 1. Developer portal & service catalogue | Single place for developers to discover services, owners, docs, SLOs, dependencies, runbooks. The “front door” of the platform. | Backstage (Spotify OSS, CNCF incubating), Port, Cortex, OpsLevel, Roadie (hosted Backstage) |
| 2. Scaffolding & golden-path templates | Creates new services / repos / pipelines with sensible defaults baked in: the paved road, expressed in code. | Backstage Scaffolder, Cookiecutter, Copier, Projen, internal generators |
| 3. Workload specification & orchestration | The tenant-facing declarative interface: “I need a service with these resources” → platform resolves the declaration into concrete cloud + Kubernetes state. | Score (CNCF open spec), Humanitec (platform orchestrator), Kratix, Crossplane, Radius (Microsoft), KusionStack, CUE-based orchestrators |
| 4. Continuous delivery & GitOps | Ships code and config from `git push` to running workloads; keeps cluster state reconciled. | Flux (CNCF graduated), Argo CD (CNCF graduated), Spinnaker (Netflix OSS), Tekton, reusable GitHub Actions workflows |
| 5. Runtime & supporting plane | The compute and day-2 concerns: clusters, networking, ingress, secrets, autoscaling, certs. | GKE / EKS / AKS; Istio / Linkerd; KEDA; External Secrets Operator; cert-manager |
| 6. Policy, supply chain, observability | The non-negotiable floor: admission policy, image signing, SBOMs, metrics / logs / traces. | Kyverno, OPA / Gatekeeper, Kubewarden; Sigstore / cosign, SLSA, in-toto; Prometheus / VictoriaMetrics; Grafana; OpenTelemetry |
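
As a rough illustration of layer 2 (“the paved road, expressed in code”), the sketch below renders a new service descriptor in which the caller supplies only identity and the template supplies everything else. It uses only Python's standard `string.Template`; the descriptor format, the default values, and the name `scaffold_service` are hypothetical and do not mirror Backstage Scaffolder, Cookiecutter, or any other listed product.

```python
from string import Template

# Hypothetical paved-road defaults baked into every new service.
PAVED_ROAD_DEFAULTS = {
    "replicas": "2",
    "cpu": "250m",
    "memory": "256Mi",
    "metrics_port": "9090",
}

# Made-up descriptor format, used only to show where defaults end up.
SERVICE_TEMPLATE = Template(
    "name: $name\n"
    "owner: $owner\n"
    "replicas: $replicas\n"
    "resources:\n"
    "  cpu: $cpu\n"
    "  memory: $memory\n"
    "metricsPort: $metrics_port\n"
)


def scaffold_service(name: str, owner: str, **overrides: str) -> str:
    """Render a descriptor; only fields with a known default can be overridden."""
    unknown = sorted(set(overrides) - set(PAVED_ROAD_DEFAULTS))
    if unknown:
        raise ValueError(f"unsupported overrides: {unknown}")
    values = {**PAVED_ROAD_DEFAULTS, **overrides, "name": name, "owner": owner}
    return SERVICE_TEMPLATE.substitute(values)


print(scaffold_service("checkout", "team-payments", replicas="3"))
```

Rejecting unknown overrides is one possible design choice: it keeps the template the single place where the platform's opinions are recorded.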

Two architectural patterns that shape the stack

Two cross-cutting patterns define most modern platforms.

Platform orchestrator pattern. A tenant declares intent (workloads + required resources) in a single config file; the platform translates that declaration into the underlying IaC modules and Kubernetes manifests; CI pipelines sync the generated artefacts to the cloud. Humanitec popularised the term and ships a commercial implementation, but the pattern is increasingly built in-house. The orchestrator is where the platform’s opinionated decisions live: which resources tenants can self-service, what defaults apply, what validation runs before provisioning.
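
The orchestrator pattern described above can be sketched in a few lines: take a tenant declaration, validate it against what the platform allows, fill in defaults, and emit generated artefacts. Everything here is an assumption made for illustration (the declaration shape, `SELF_SERVICE_RESOURCES`, `WORKLOAD_DEFAULTS`, the `modules/...` paths, and the simplified output); it is not how Humanitec or any specific in-house orchestrator works.

```python
import json

# Hypothetical platform opinions: which resources tenants may self-service,
# and which defaults apply when a declaration leaves a field out.
SELF_SERVICE_RESOURCES = {"postgres", "redis", "bucket"}
WORKLOAD_DEFAULTS = {"replicas": 2, "cpu": "250m", "memory": "256Mi"}


def resolve(declaration: dict) -> dict:
    """Validate a tenant declaration, apply defaults, emit generated artefacts."""
    for required in ("name", "image"):
        if not declaration.get(required):
            raise ValueError(f"declaration is missing '{required}'")

    requested = declaration.get("resources", [])
    rejected = sorted(set(requested) - SELF_SERVICE_RESOURCES)
    if rejected:
        raise ValueError(f"not available for self-service: {rejected}")

    settings = {**WORKLOAD_DEFAULTS, **declaration.get("overrides", {})}
    name = declaration["name"]
    return {
        # Simplified stand-in for a Kubernetes manifest.
        "manifest": {"kind": "Deployment", "name": name,
                     "image": declaration["image"], **settings},
        # Simplified stand-in for IaC module invocations, one per resource.
        "iac": [{"module": f"modules/{kind}", "consumer": name}
                for kind in sorted(requested)],
    }


declaration = {
    "name": "checkout",
    "image": "registry.example.com/checkout:1.4.2",
    "resources": ["postgres", "redis"],
    "overrides": {"replicas": 3},
}
print(json.dumps(resolve(declaration), indent=2))
```

Validation runs before anything is generated, so a request for a resource outside the self-service set fails early with a readable message instead of surfacing later as a provisioning error.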

Workload specification as a standard. A deliberately minimal, tool-agnostic spec for what a workload is (image, ports, resource dependencies) that multiple orchestrators can consume. Score is an emerging open specification for this, explicitly designed so that different back ends (Humanitec, Kubernetes-native tools, Nomad, etc.) can implement it. Adopting a spec standard (even an in-house one) keeps tenant declarations portable across platform rewrites.
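
The portability argument can be illustrated with one small spec type and two renderers that target different back ends. This is a sketch under stated assumptions: `WorkloadSpec` is an invented in-house shape, not the Score schema, and both output dictionaries are heavily simplified stand-ins, not valid Kubernetes or Nomad documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSpec:
    """Tool-agnostic workload description (illustrative; not the Score schema)."""
    name: str
    image: str
    ports: tuple = ()
    needs: tuple = ()  # resource dependencies, left for an orchestrator to resolve


def to_kubernetes_like(spec: WorkloadSpec) -> dict:
    """Render a heavily simplified Deployment-shaped dict."""
    return {
        "kind": "Deployment",
        "metadata": {"name": spec.name},
        "containers": [{
            "image": spec.image,
            "ports": [{"containerPort": port} for port in spec.ports],
        }],
    }


def to_nomad_like(spec: WorkloadSpec) -> dict:
    """Render a heavily simplified Nomad-job-shaped dict."""
    return {
        "job": spec.name,
        "task": {"driver": "docker", "config": {"image": spec.image}},
        "ports": list(spec.ports),
    }


spec = WorkloadSpec("checkout", "registry.example.com/checkout:1.4.2",
                    ports=(8080,), needs=("postgres",))
for render in (to_kubernetes_like, to_nomad_like):
    print(render.__name__, render(spec))
```

The tenant-owned artefact is `spec`; swapping or rewriting a renderer changes nothing the tenant wrote.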

sequenceDiagram
    participant T as Tenant
    participant S as Spec (Score / CUE / YAML)
    participant O as Orchestrator
    participant G as GitOps Engine
    participant R as Runtime

    T->>S: declares workload + resources
    S->>O: feeds spec
    O->>O: resolves defaults, validates, picks IaC modules
    O->>G: emits Terraform JSON + K8s manifests
    G->>R: reconciles desired state
    R-->>T: workload running with paved-road defaults
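
The “reconciles desired state” step in the diagram above boils down to a diff between what Git declares and what is running. The sketch below shows that idea on in-memory dictionaries; the function names and the always-delete behaviour for undeclared workloads are assumptions for illustration, and real engines such as Flux and Argo CD make pruning configurable and handle far more cases.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """List the actions needed to move `actual` to `desired`."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Apply the plan to a copy of `actual` and return the new state."""
    result = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del result[name]
        else:  # create and update both copy the declared state
            result[name] = desired[name]
    return result


desired = {"checkout": {"replicas": 3}, "cart": {"replicas": 2}}
actual = {"checkout": {"replicas": 2}, "legacy": {"replicas": 1}}

print(plan(desired, actual))
# [('create', 'cart'), ('update', 'checkout'), ('delete', 'legacy')]
print(plan(desired, reconcile(desired, actual)))
# []
```

The second call returning an empty plan is the property that matters: once reconciled, running the loop again changes nothing.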

Case studies — what other organisations have built

Short snapshots of platforms at companies that have published enough about their work to learn from.

Spotify — Backstage

A unified developer portal that catalogues every service, owner, doc, pipeline, and SLO across Spotify engineering. Open-sourced in 2020; now a CNCF incubating project with wide adoption (Netflix, American Airlines, Expedia, HP, LinkedIn, Mercedes-Benz, Siemens). Established the “developer portal” as its own layer — not an accessory to CI/CD — and showed that a catalogue-first approach scales across hundreds of services and thousands of engineers.

→ Pia Nilsson, No More Manuals: The Secret to Spotify’s Speedy Microservices Development; Stefan Ålund, What the Heck Is Backstage Anyway?; backstage.io — adopters. Detailed treatment in Backstage and Developer Portals.

Netflix — Paved Road and full-cycle developers

A deep internal platform (Spinnaker for delivery, Titus for containers, Atlas for metrics, plus a large library of paved-road services) that Netflix engineers use by default. “Full-cycle developers” own their services end-to-end — on top of the paved road. The canonical statement of “you build it, you run it” on top of a platform, not instead of one.

→ Greg Burrell, Full Cycle Developers at Netflix; Dianne Marsh, How We Build Code at Netflix; Netflix Tech Blog. Detailed treatment in Golden Paths and Paved Roads.

Zalando — Radical Agility → shared Kubernetes platform

Started with “Radical Agility” (radically autonomous teams, cluster-per-team) and evolved toward a shared Kubernetes platform as the cost of unconstrained autonomy became clear. A rich case study of the arc every scaling platform team eventually traverses — too much autonomy without a paved road, then a migration toward shared platform capabilities without sacrificing team ownership.

→ Henning Jacobs, A Year of Kubernetes at Zalando; Running Kubernetes in Production; engineering.zalando.com.

Monzo — platform engineering at scale

~2,000-microservice platform on Kubernetes + Linkerd + an internal RPC abstraction. Heavy use of code generation, linting, and a strong service-template approach to keep the service topology legible with a small platform team. Demonstrates how a disciplined abstraction layer (RPC + templates) lets a small platform team support a very large service count without collapsing under coordination overhead.

→ Suhail Patel, The Trillion Dollar Payment Network; Scaling Monzo’s Infrastructure with Istio and Envoy; monzo.com/blog/technology.

Mercedes-Benz — Backstage-based IDP

An internal developer platform for thousands of engineers across automotive software, built around Backstage with Kubernetes, GitHub, and custom plugins. It supports in-car (MB.OS) software development adjacent to cloud services. A European, non-web-native example: platform engineering applied at traditional-enterprise scale.

→ CNCF case study: Mercedes-Benz; Mercedes-Benz Tech Innovation on Medium.

Adevinta — Common Platform

Common Platform serving 50+ marketplace product teams across Europe, built on Kubernetes with a strong paved-road philosophy and explicit platform SLOs. A European marketplace case with public writing on platform evolution, SLOs, and cost as a product dimension.

→ The Journey to Building the Adevinta Common Platform; Adevinta Tech Blog; platformengineering.org talks library.

Humanitec — GCP reference architecture

A vendor-published reference architecture for building an IDP on GCP, from the team behind the “platform orchestrator” pattern. The closest thing to a “textbook” answer for an opinionated GCP IDP, and useful as an external check on architectural choices.

→ Humanitec, GCP Platform Reference Architecture; Kaspar von Grünberg, What Is a Platform Orchestrator?; humanitec.com/blog.

Common threads across all of them

Catalogue-first. Large-scale platforms tend to put a service catalogue at the front of the funnel. Backstage is the most widely adopted open-source option; Port, Cortex, and OpsLevel are prominent commercial ones.
One declarative interface for tenants. Whether it is Backstage Software Templates, Humanitec manifests, Score specs, or in-house CUE files, the tenant declares intent once and the platform resolves it into concrete infrastructure.
GitOps for cluster and infra state. Most modern Kubernetes-based platforms are GitOps-driven or converging toward it. Flux and Argo CD are the two common defaults.
Policy and supply chain enforced at the cluster boundary, not at code review. Admission controllers (Kyverno, OPA / Gatekeeper) and signature verification (cosign), guided by frameworks such as SLSA, automate checks that humans used to eyeball in PRs.
Platform SLOs. Mature platform teams publish reliability contracts for the platform itself: pipeline success rate, deploy time, environment provisioning time, portal availability.

The specific choices vary; the pattern is the same.
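
To show what “enforced at the cluster boundary” means in practice, here is a minimal sketch of an admission-style check that runs against every workload instead of relying on a reviewer to spot problems in a PR. It is plain Python standing in for a real policy engine; the rules, the registry name, and the workload shape are hypothetical, and Kyverno or OPA / Gatekeeper policies are written in their own formats, not like this.

```python
# Hypothetical allow-list; a tuple so it can be passed to str.startswith.
ALLOWED_REGISTRIES = ("registry.example.com/",)


def admit(workload: dict) -> list[str]:
    """Return policy violations for a workload; an empty list means admit."""
    violations = []
    for image in workload.get("images", []):
        if not image.startswith(ALLOWED_REGISTRIES):
            violations.append(f"{image}: registry not allowed")
        if "@sha256:" not in image:
            violations.append(f"{image}: not pinned by digest")
    if not workload.get("labels", {}).get("owner"):
        violations.append("missing 'owner' label")
    return violations


good = {
    "labels": {"owner": "team-payments"},
    "images": ["registry.example.com/checkout@sha256:" + "a" * 64],
}
bad = {"images": ["docker.io/library/nginx:latest"]}

print(admit(good))  # []
for violation in admit(bad):
    print(violation)
```

Returning every violation at once, not just the first, is a deliberate choice in this sketch: tenants can fix a rejected deployment in one pass.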

When this discipline applies

✅ Use platform engineering when:

You have multiple stream-aligned teams (typically 5+) shipping software in parallel.
Cognitive load of cloud + Kubernetes + observability + supply chain is slowing teams down.
You see inconsistent reliability, security, and cost outcomes across teams.
You want standardisation without central bottlenecks: a paved road teams can opt into, not a gate they must pass through.

❌ Don’t invest in heavy platform engineering when:

You have a single product team and direct tooling already works.
Your bottleneck is product-market fit, not delivery speed.
You don’t yet have the engineering volume to justify a platform team (roughly: 30+ engineers, 10+ services).
You’d be building before discovering — invest in 2-3 stream-aligned teams first to learn what they actually need paved.

References

Books

Matthew Skelton & Manuel Pais, Team Topologies: Organizing Business and Technology Teams for Fast Flow, 2nd ed., IT Revolution, 2025. → itrevolution.com
Camille Fournier & Ian Nowland, Platform Engineering: A Guide for Technical, Product, and People Leaders, O’Reilly, 2024.
Betsy Beyer et al., Site Reliability Engineering & The SRE Workbook, Google / O’Reilly. → sre.google/books
Nicole Forsgren, Jez Humble, Gene Kim, Accelerate, IT Revolution, 2018.

Articles and talks

Evan Bottcher, What I Talk About When I Talk About Platforms — martinfowler.com/articles/talk-about-platforms.html
Cristóbal García García & Chris Ford, Mind the Platform Execution Gap — martinfowler.com/articles/platform-prerequisites.html
Manuel Pais, Platform as a Product — platformengineering.org/talks-library
Martin Fowler, Team Topologies (bliki) — martinfowler.com/bliki/TeamTopologies.html

Frameworks and reference architectures

CNCF Platforms White Paper — tag-app-delivery.cncf.io/whitepapers/platforms
CNCF Platform Engineering Maturity Model — tag-app-delivery.cncf.io/whitepapers/platform-eng-maturity-model
DORA Platform Engineering capability — dora.dev/capabilities/platform-engineering
Humanitec, GCP Platform Reference Architecture — humanitec.com/reference-architectures/gcp
Score — open workload specification — score.dev
DevOps Topologies — Anti-Types — web.devopstopologies.com/#anti-types

Reports

DORA 2024 State of DevOps Report — PDF
platformengineering.org, State of Platform Engineering Report Vol. 4 — platformengineering.org/blog
Thoughtworks Technology Radar — thoughtworks.com/radar

Engineering blogs (case studies)

Spotify — engineering.atspotify.com · backstage.io
Netflix — netflixtechblog.com
Zalando — engineering.zalando.com · srcco.de (Henning Jacobs)
Monzo — monzo.com/blog/technology
Mercedes-Benz — Tech Innovation on Medium · CNCF case study
Adevinta — adevinta.com/techblog
Humanitec — humanitec.com/blog

This post is licensed under CC BY 4.0 by the author.