
Golden Paths and Paved Roads

A golden path is not a mandate — it is the best available answer to a question most teams will eventually ask, shipped before they get lost. The difference between a paved road and a mandate is who bears the cost: on a paved road, teams that go off-road are responsible for everything they leave behind.


Core Properties

| Property | Value |
| --- | --- |
| Origin | Spotify “golden path” (2014); Netflix “paved road” (2016–2017); Google “blessed path” (SRE context) |
| Defining characteristic | Opt-in, opinionated, fully-supported default — not a gate, not a mandate |
| Primary mechanism | Scaffolding templates + reusable pipeline components + default-rich library integrations |
| What it solves | Cognitive load sprawl, inconsistent reliability/security baselines, “rumour-driven development” |
| Failure mode if done wrong | Template that is a gate not a default; template that paves hypothetical needs; template the platform team doesn’t use |
| Primary tooling | Backstage Scaffolder, Projen, Copier, GitHub Actions composite actions, ArgoCD ApplicationSets |
| Measurement | Golden-path adoption %, time-to-first-deploy for new services, % on current template version, off-road support requests |

When to Use / Avoid

Use When

  • You have 5+ stream-aligned teams independently bootstrapping similar services and repeating the same setup decisions (Dockerfile, CI pipeline, observability wiring, secrets config).
  • A new developer joining a team cannot answer “how do I start a new service here?” without asking three people — the golden path is the answer to that question in a runnable artefact.
  • You see inconsistent security or reliability outcomes across services: some have structured logs, some have none; some have OWASP headers, some don’t.
  • Your SRE team struggles to write generic runbooks because every team’s service looks different — standardising observability and SLO shape makes the paved road an SRE force multiplier.
  • You want to enforce non-negotiables (SBOM generation, image signing, OTEL instrumentation) without a review process that blocks deployments.

Avoid When

  • You have fewer than 3 active delivery teams — a golden path has ongoing maintenance cost; don’t build one until you have enough users to justify it.
  • You are in early product discovery: the right service shape is still unknown, and paving the wrong path calcifies bad decisions.
  • Organisational trust is low — if the platform team hasn’t built credibility, a golden path will be ignored no matter how well it is built. Earn trust with small wins first (see Platform as a Product).
  • The only reason for the path is documentation rather than automation — if the “golden path” is a Confluence page describing what to set up manually, it is not a path, it is instructions.

Terminology — One Concept, Three Brands

The industry coined three names for the same idea, each reflecting the culture of the organisation that coined it.

```mermaid
graph LR
    A[Same Core Idea] --> B[Spotify: Golden Path]
    A --> C[Netflix: Paved Road]
    A --> D[Google: Blessed Path]
    B --> E[Opt-in opinionated defaults\nwith tutorials and tooling]
    C --> E
    D --> E
```

Spotify “golden path” — coined internally around 2014, published formally in the August 2020 engineering blog post by Pia Nilsson. Spotify’s fragmentation problem: 300+ engineering teams, each making their own tool choices, led to what Nilsson called “rumour-driven development” — the only way to learn how to do something was to ask a colleague. The golden path was the opinionated, supported answer. Spotify now has paths for backend engineering, client development, data engineering, data science, machine learning, web development, and audio processing.

Netflix “paved road” — coined in the context of Netflix’s “full-cycle developer” model and articulated publicly by Yunong Xiao at QCon New York 2017 and by Greg Burrell in the 2018 Full Cycle Developers at Netflix blog post. Netflix’s framing is explicitly about the opt-in contract: the paved road is “the sensibly configured but customisable platform” — teams that stay on it receive automatic support; teams that go off-road own the consequences entirely.

Google “blessed path” — used in Google’s SRE and internal tooling context to describe the recommended toolchain. Less prominent in external writing than Spotify’s or Netflix’s branding, but the 2024 Google Cloud blog post Golden Paths for Engineering Execution Consistency uses all three terms interchangeably, confirming industry convergence.

The underlying mechanism is identical across all three: an opinionated template that encodes defaults, wires in non-negotiable capabilities (observability, security, CI/CD), and is actively maintained by the platform team. The team can leave; the path stays.

Source: How We Use Golden Paths to Solve Fragmentation in Our Software Ecosystem (Spotify Engineering, 2020) — defines the concept, traces back to 2014 origins, explains the fragmentation problem it solves.


Why Golden Paths Exist — The Cognitive Load Problem

Every engineering org above a certain size faces the same failure mode: team autonomy, pursued without constraint, produces a combinatorial explosion of tool choices. Each choice is rational in isolation; the aggregate is catastrophic.

Teams end up in what Charity Majors described in her 2018 essay as “software sprawl” — dozens of databases, message brokers, and deployment tools, each with its own operational burden, each requiring specialist knowledge, none of it shareable across the org. On-call rotations become impossible. Observability is incoherent. New team members spend weeks or months in environment setup before writing a line of business logic.

The traditional answer was centralised standards bodies — a committee that approved technology choices. This trades autonomy for consistency but creates a bottleneck and resentment. The platform engineering answer is different: reduce the cognitive cost of the good choice until teams make it by default, without a mandate.

The key distinction between a paved road and a mandate:

```mermaid
flowchart TD
    Dev[Developer starts new service] --> Q{Use paved road?}
    Q -->|Yes| PR[Bootstrap in minutes\nCI/CD, observability, security wired in\nPlatform team supports you]
    Q -->|No| NP[Bootstrap manually\nYou own the whole stack\nPlatform team helps on best-effort basis]
    PR --> Prod[Ship to production]
    NP --> Prod
```

The cost asymmetry does the work: the paved road is faster, better-instrumented, and supported. Going off-road is not blocked — but every team that does so becomes responsible for everything the path would have provided. In a mature org, off-road usage signals either a genuine unmet need (which feeds the path roadmap) or a team that needs coaching on total cost of ownership.

This is categorically different from mandated standardisation. A mandate forbids the alternative; a paved road prices the alternative. The former creates compliance culture and workarounds; the latter creates informed choices with owned consequences.

Source: Software Sprawl, The Golden Path, and Scaling Teams With Agency (Charity Majors, 2018) — articulates the cognitive load framing and the five-step process for building an initial golden path.


The Spotify Golden Path Model

Spotify’s golden path model has three components: the path definition (what gets paved), the tutorial layer (how teams follow it), and the scaffolding implementation (how new services start on it automatically).

Picking what to pave. Spotify uses an engineering productivity squad whose mandate is measuring and reducing friction. The squad surveys teams, analyses support requests, and identifies patterns — what are teams repeatedly setting up from scratch? What decisions are being made inconsistently? The highest-friction, highest-frequency decisions get paved first. Spotify’s first path was backend engineering (the most common type of new service). Paths for data engineering, data science, and ML followed as those disciplines scaled.

The tutorial layer. Every golden path at Spotify ships with a step-by-step tutorial that walks a developer from zero to a running service. This is what Pia Nilsson described as eliminating “the only way to find out is to ask your colleague” — the tutorial is the institutionalised answer. The key quality bar: a developer who has never worked at Spotify should be able to follow the tutorial and have a service deployed. If they cannot, the tutorial is not done.

Backstage Scaffolder implementation. Since 2020, Spotify’s golden paths are implemented as Backstage Software Templates — YAML manifest files that define what input the developer provides (service name, team, language), what actions run (create GitHub repo, apply template files, register with service catalogue, trigger first pipeline run), and what the resulting project looks like. A developer who clicks “Create” in the Backstage portal gets a repository with a Dockerfile, a CI pipeline, an OpenTelemetry integration, a Helm chart, and a Backstage catalogue entry — all pre-configured, all correct, in under a minute.

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant BS as Backstage Portal
    participant Scaff as Scaffolder Engine
    participant GH as GitHub
    participant CI as CI Pipeline

    Dev->>BS: Fill template form (name, team, language)
    BS->>Scaff: Execute Software Template
    Scaff->>GH: Create repo with golden-path files
    Scaff->>CI: Trigger first pipeline run
    Scaff->>BS: Register service in catalogue
    CI-->>Dev: Green build + first deploy complete
```

Source: Backstage Software Templates documentation and Onboarding Software to Backstage — describes the template YAML structure and how golden paths are implemented as Scaffolder templates.


The Netflix Paved Road Model

Netflix’s model differs from Spotify’s in one fundamental way: the explicit coupling between the paved road and the “full-cycle developer” operating model. Netflix developers own their services end-to-end — design, implementation, deployment, on-call. This is only viable if the platform absorbs the undifferentiated heavy lifting that full-cycle ownership would otherwise require.

The Netflix paved road is a set of centrally-built, centrally-supported platform services that every Netflix microservice uses by default:

| Service | What it provides | Why it’s on the paved road |
| --- | --- | --- |
| Spinnaker | Multi-cloud deployment pipelines (canary, blue-green, rollback) | Teams cannot safely own production deployments without a reliable rollback mechanism; Spinnaker makes canary analysis automatic |
| Titus | Container scheduling on EC2 | Abstracts away AMI management, cluster sizing, and workload placement so teams think in containers, not VMs |
| Atlas | Dimensional time-series metrics store | If every team exports metrics in the same format to the same store, SRE can write org-wide dashboards and alerts; without it, each team has a private metrics silo |
| Spectator | Client-side metrics library for Atlas | Ensures the contract between service metrics and Atlas is uniform; the library is the paved road for instrumentation |
| Mantis | Streaming analytics and operational data queries | Enables real-time debugging without forcing each team to build their own streaming pipeline |

The paved road enables “freedom and responsibility” — Netflix’s cultural principle — at scale. Teams have radical autonomy in how they build services; the paved road constrains how they operate them. The freedom is in business logic; the responsibility is fulfilled by staying on the road.

Yunong Xiao’s 2017 QCon talk framed this precisely: the paved road standardises service discovery, configuration management, metrics, logging, and RPC — the same five things every microservice needs, the five things no team should be re-inventing. Everything above that is the team’s domain.

Source: Full Cycle Developers at Netflix (Greg Burrell, Netflix TechBlog, 2018) — defines full-cycle developers and articulates the “operate what you build on top of the paved road” model. The Paved PaaS to Microservices at Netflix (Yunong Xiao, QCon NY 2017) — first public articulation of the Netflix paved road architecture.


The Mechanics — How Paving Actually Works

A golden path is delivered through three distinct layers: scaffolding at service creation time, reusable pipeline components at CI/CD time, and libraries or sidecars at runtime. Confusing these layers leads to paving that only covers one phase and leaves teams unsupported in the others.

```mermaid
graph TD
    A[Golden Path Delivery] --> B[Scaffolding Time\nnew service creation]
    A --> C[CI/CD Time\nbuild and deploy pipeline]
    A --> D[Runtime\nrunning service]

    B --> B1[Backstage Software Templates\nCookiecutter / Copier / Projen\nDefault-rich starter repos]
    C --> C1[Composite GitHub Actions\nTekton Tasks\nArgoCD ApplicationSets]
    D --> D1[Shared Libraries / SDKs\nSidecar containers\nInit containers]
```

Scaffolding Templates

The scaffolding layer creates the initial project with correct defaults baked in. Tool choice matters:

| Tool | Model | Best for | Ongoing update support |
| --- | --- | --- | --- |
| Backstage Scaffolder | YAML manifest + actions, portal-driven | Teams with Backstage already deployed; multi-step workflows (create repo, register service, notify Slack) | Via template version bumps in Backstage |
| Copier | Jinja2 templates + YAML config; supports `copier update` | Polyglot orgs; lifecycle management after initial scaffold | Yes — `copier update` syncs template changes into existing repos |
| Projen | Configuration-as-code (TypeScript class); synthesises files; forcibly overwrites manual edits | Strict compliance orgs; AWS CDK shops; when drift is unacceptable | Yes — re-run projen to re-synthesise |
| Cookiecutter | Jinja2 templates; no update mechanism | One-shot scaffolding; widely understood | No — initial scaffold only |
| Yeoman | JS generator plugins; browser-era tool | Legacy projects; some web frameworks have generators | No |

The modern trend (2024–2025) is Copier or Projen over Cookiecutter for new platform investments — both support lifecycle management, which Cookiecutter does not.
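
As an illustration of the lifecycle-management difference, a minimal `copier.yml` for a hypothetical golden-path template repository might look like this (the `_subdirectory` key and question syntax are real Copier settings; the names and repo URL are made up):

```yaml
# copier.yml at the root of the template repo (hypothetical example).
# Scaffold:  copier copy gh:my-org/backend-golden-path my-new-service
# Later:     cd my-new-service && copier update   # re-applies template changes
_subdirectory: skeleton   # template files live under skeleton/

service_name:
  type: str
  help: Kebab-case service name (e.g. payments-processor)

team_slug:
  type: str
  help: GitHub team that owns the service
```

Because Copier records the answers and template version in the scaffolded repo, `copier update` can later merge template fixes (a patched base image, a new CI step) into services that have already diverged, which is exactly the lifecycle property Cookiecutter lacks.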

Default-Rich Starter Repos

The template itself must wire in non-negotiables at scaffold time. A backend service golden path template should produce a repository that already has:

  • Multi-stage Dockerfile (build + minimal runtime image)
  • GitHub Actions workflows for lint, test, build, push, and deploy
  • OpenTelemetry SDK configured with the org’s collector endpoint
  • External Secrets Operator manifest pointing at the org’s secret store (GCP Secret Manager, HashiCorp Vault)
  • ServiceMonitor for Prometheus scraping
  • Helm chart (or ArgoCD Application manifest) with resource limits, liveness/readiness probes, and Pod Disruption Budgets pre-populated
  • SLO definition file in the organisation’s agreed format
  • CODEOWNERS, dependabot.yml, and SECURITY.md

The quality bar: merge the PR that the template creates, and the first deployment runs green without any additional configuration.
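
To make the first bullet concrete, here is a sketch of the multi-stage Dockerfile such a template might emit (base images, paths, and the non-root user are illustrative assumptions, not a prescribed standard):

```dockerfile
# Hypothetical golden-path Dockerfile: build with full JDK + Maven,
# run on a minimal JRE-only image to shrink size and CVE surface.
FROM maven:3.9-eclipse-temurin-21 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn -q dependency:go-offline      # cache dependencies in their own layer
COPY src ./src
RUN mvn -q package -DskipTests

FROM eclipse-temurin:21-jre
RUN useradd --system app              # never run as root in production
USER app
COPY --from=build /app/target/*.jar /app/service.jar
ENTRYPOINT ["java", "-jar", "/app/service.jar"]
```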

Reusable Pipelines

After scaffold time, the paved road continues in the CI/CD layer. Platform teams publish reusable pipeline components that delivery teams call rather than write:

```yaml
# .github/workflows/deploy.yml — Golden path: call the platform's reusable workflow, don't re-implement
jobs:
  deploy:
    uses: platform-org/actions/.github/workflows/deploy-to-gke.yml@v2
    with:
      image: $
      environment: production
      cluster: prod-eu-west-1
    secrets: inherit
```

This pattern (GitHub Actions reusable workflows / composite actions) means the platform team can update the deployment logic — add a new security scan step, change the canary percentage — without requiring every service to update their own pipeline files. The services call the platform’s workflow; the platform owns the implementation.
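
For illustration, the callee side of such a reusable workflow might look like this (the file path, inputs, and steps are assumptions; the `workflow_call` trigger and `inputs` syntax are standard GitHub Actions):

```yaml
# platform-org/actions/.github/workflows/deploy-to-gke.yml (hypothetical)
# Owned by the platform team; every service calls it, nobody copies it.
name: deploy-to-gke
on:
  workflow_call:
    inputs:
      image:
        type: string
        required: true
      environment:
        type: string
        required: true
      cluster:
        type: string
        required: true
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      # Platform can insert a new scan step here and every caller gets it
      # on their next run, with no per-service pipeline edits.
      - name: Security scan
        run: echo "scanning ${{ inputs.image }}"
      - name: Deploy
        run: echo "deploying ${{ inputs.image }} to ${{ inputs.cluster }}"
```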

ArgoCD ApplicationSets serve the same role for GitOps delivery: the platform defines the ApplicationSet template that generates individual Application objects for each service, ensuring all services follow the same sync policy, health checks, and notification hooks.
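
A minimal ApplicationSet sketch under those assumptions (repo URL, paths, and namespaces are illustrative; the git directory generator and `{{path}}` templating are standard ApplicationSet features):

```yaml
# Hypothetical ApplicationSet: one template stamps out an ArgoCD Application
# per service directory, so sync policy and health checks are defined once.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: golden-path-services
spec:
  generators:
    - git:
        repoURL: https://github.com/my-org/gitops
        revision: main
        directories:
          - path: services/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/my-org/gitops
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```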

Library vs. Sidecar

At runtime, the paved road can be delivered either as a shared library (SDK) that services import, or as a sidecar container that the platform injects. The tradeoff:

| Approach | Pros | Cons | Use for |
| --- | --- | --- | --- |
| Shared library / SDK | Direct performance; rich API; works without service mesh | Language-specific; teams must keep dependency updated; version drift | Metrics clients (Spectator), circuit breakers, auth clients |
| Sidecar | Language-agnostic; platform controls rollout; no code change required | Extra latency hop; resource overhead; debugging is harder | Service mesh (Envoy), log shipping, distributed tracing (OTEL collector) |
| Init container | One-time setup; no runtime overhead | Not for ongoing concerns | Secret injection, certificate rotation bootstrapping |

Netflix’s Spectator is the canonical library example — a metrics client that every Netflix service imports, ensuring Atlas receives data in a consistent format. Envoy as a sidecar is the canonical sidecar example — injected by Istio, language-agnostic, platform-owned rollout.


Multiple Paved Roads — When to Fork

A single golden path works until language or runtime diversity makes it unworkable. A Java Spring Boot path cannot serve a Python FastAPI service without stretching into incoherence. At some point the platform team must decide: fork into multiple paths, or hold the line.

```mermaid
graph TD
    P[Platform Engineering\nCapacity] --> Q{Fork or hold?}
    Q -->|Fork| F1[Java Spring Boot path]
    Q -->|Fork| F2[Python FastAPI path]
    Q -->|Fork| F3[Node.js Express path]
    Q -->|Hold| H[Force polyglot services off-road\nOR restrict new services to\nsupported languages]
    F1 & F2 & F3 --> M[Maintenance cost scales\nwith number of paths]
```

The tradeoff is maintenance surface: each path must be independently updated when a dependency has a CVE, when the CI platform changes an API, when the org moves from Helm to Kustomize. Three paths means three update cycles. The rule of thumb from practitioners: start with one path, add a second when the off-road volume from a specific runtime exceeds 20% of new services. Below that threshold, the off-road teams own their maintenance; above it, the cost of not paving is higher than the cost of maintaining a second path.
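
The 20% heuristic is simple enough to sketch in a few lines (a toy illustration, not a standard formula; names and numbers are made up):

```python
# Toy sketch of the fork-or-hold heuristic described above.
def should_fork_path(new_services_last_quarter: int,
                     off_road_same_runtime: int,
                     threshold: float = 0.20) -> bool:
    """True when one runtime's off-road share of new services exceeds the threshold."""
    if new_services_last_quarter == 0:
        return False
    return off_road_same_runtime / new_services_last_quarter > threshold

# 11 of 40 new services went off-road for the same runtime: 27.5% > 20%
print(should_fork_path(40, 11))  # True -> pave a second path
print(should_fork_path(40, 6))   # False -> 15%, off-road teams own their maintenance
```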

Forking criteria that justify a second path:

  • Different runtime or language that cannot share the same Dockerfile, build toolchain, or test runner
  • Different criticality tier — a tier-1 payment service needs stronger controls (required SLO file, mandatory security scan gate, PagerDuty integration) than an internal tooling service
  • Different deployment target — a serverless Cloud Run service needs a different scaffolding path than a Kubernetes microservice, even in the same language

Never fork for organisational reasons alone (team prefers a different language). That is a path to unmaintainable proliferation.


Anti-Patterns

Platform teams reliably make the same mistakes when building golden paths. Each anti-pattern has a tell.

Paved road as gate. The path is mandatory and teams cannot deploy without passing through it. This converts a productivity tool into a bottleneck and breeds workarounds. The tell: a backlog of “golden path exceptions” requests. Fix: make the path the default, not the only option. Teams going off-road own their stack — do not review or approve their choices, just document the responsibility transfer.

Paved road that hasn’t been paved. The “golden path” is a Confluence page that says “for a new service, do steps 1 through 14.” This is documentation, not a path. The tell: following the path takes more than 30 minutes of manual work. A genuine paved road runs a command or clicks a button; a docs-only path is a to-do list. Fix: invest in automation before calling it a path.

Paving for hypothetical needs. The template includes integrations for capabilities that fewer than 10% of services use, on the theory that teams “might need them later.” The overhead from understanding and configuring those integrations slows down the majority. The tell: the template README has more than 15 configuration knobs. Fix: pave for the median service, not the complex outlier. Addons or follow-on templates can handle the advanced cases.

One-size-fits-all paths. A single Java path for services that range from internal batch jobs (no SLO needed) to tier-1 payment flows (SLO, PCI DSS scan, multi-region failover). The path either under-constrains the tier-1 service or over-constrains the batch job. Fix: explicitly distinguish service criticality tiers and have lightweight vs. heavyweight path variants.

Dogfooding gap. The platform team builds the golden path but doesn’t use it for their own services — they go off-road “because we need more control.” This destroys credibility. Delivery teams notice immediately and conclude the path isn’t production-ready. Fix: the platform team’s own services run on the golden path. If the path is too restrictive for the platform team, it is too restrictive for everyone — fix the path, don’t exempt yourself.

Stale template CVEs. The template hasn’t been updated in six months. The base Docker image has three high-severity vulnerabilities. Trust evaporates the first time a security scan flags the “golden path” itself. Fix: treat template maintenance as a recurring sprint commitment — at minimum, pin dependency bumps to a weekly automated PR (Dependabot / Renovate against the template repo).
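
As a sketch, the template repository's own `dependabot.yml` might look like this (the directory layout is an assumption; the `version: 2` schema and ecosystem names are standard Dependabot configuration):

```yaml
# Hypothetical dependabot.yml in the golden-path template repo itself,
# so the path's base images and pipeline actions never go stale.
version: 2
updates:
  - package-ecosystem: "docker"        # base image bumps in skeleton/Dockerfile
    directory: "/skeleton"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
```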


Measuring Success

A golden path without metrics is a guess. The platform team should track four things:

Adoption rate. The percentage of new services created via the golden path template in the past quarter. Target: above 80% for the first six months, trending toward 95%+ at maturity. Below 50% means teams are going off-road by default — diagnose why.

Time-to-first-deploy for new services. Median time from “developer runs the scaffolding command” to “first successful production deployment.” Target: under two hours for a new service. If it takes days, the template is incomplete or the supporting infrastructure (ArgoCD, cluster access, secret store) is not self-service.

Template version currency. The percentage of services running the current major version of the golden path template. Stale templates signal either that the update tooling doesn’t work (the Copier or Projen update flow is broken) or that teams have diverged so far from the template that applying updates manually is too painful. Target: above 85% within one major version of current.

Off-road support volume. The count of platform team tickets coming from teams that went off-road. This is the cost of off-road usage made visible. An increasing trend signals that off-road paths are becoming unsustainable — teams need either a new golden path or to migrate back. It also signals where the next path should be paved.
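
Once service metadata lives in a catalogue, the adoption and currency metrics above reduce to simple counting; a toy sketch (the `Service` record, field names, and version numbers are assumptions for illustration):

```python
# Illustrative computation of adoption rate and template version currency.
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    on_golden_path: bool
    template_major_version: int  # 0 when off-road

CURRENT_MAJOR = 4  # assumed current template major version

def adoption_rate(services: list[Service]) -> float:
    """% of services created via the golden path."""
    return 100 * sum(s.on_golden_path for s in services) / len(services)

def version_currency(services: list[Service]) -> float:
    """% of on-path services within one major version of current."""
    on_path = [s for s in services if s.on_golden_path]
    fresh = [s for s in on_path if s.template_major_version >= CURRENT_MAJOR - 1]
    return 100 * len(fresh) / len(on_path)

fleet = [
    Service("payments", True, 4),
    Service("search", True, 3),
    Service("catalog", True, 2),       # lagging: two majors behind
    Service("legacy-batch", False, 0), # off-road
]
print(adoption_rate(fleet))            # 75.0
print(round(version_currency(fleet)))  # 67
```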

The DORA four keys (deployment frequency, lead time, change failure rate, MTTR) measured separately for on-road vs. off-road services is the most persuasive proof point for golden path ROI: teams on the path consistently show higher deployment frequency and lower change failure rates.


Worked Example — Java Spring Boot on GKE Golden Path

This is the skeleton of a production golden path for a Java Spring Boot microservice deploying to GKE. The Backstage Software Template produces a repository with every component listed below pre-configured.

```python
"""
Backstage Software Template for Java Spring Boot on GKE.

What this scaffolds and why each component is included:

- Dockerfile (multi-stage): separates build (Maven + JDK) from runtime (JRE-only image),
  reducing final image size from ~700MB to ~120MB and shrinking the CVE attack surface.

- GitHub Actions workflow (ci.yml): lint → test → build → push to Artifact Registry →
  trigger ArgoCD sync. Calls platform composite action so pipeline logic is centrally owned.

- OpenTelemetry SDK config (otel.properties): pre-wired to the org's OTEL collector endpoint
  with service.name and service.version attributes populated from Maven POM at build time.
  Teams get traces/metrics/logs from first deploy without writing a line of instrumentation.

- External Secrets Operator manifest (external-secret.yaml): maps GCP Secret Manager paths
  to Kubernetes secrets using the org's standard naming convention. Teams declare which secrets
  they need; ESO handles rotation without pod restarts.

- Helm chart (helm/): includes Deployment, Service, HorizontalPodAutoscaler, PodDisruptionBudget,
  ServiceMonitor (for Prometheus scraping), and NetworkPolicy. Resource requests/limits set to
  org defaults (250m CPU / 256Mi memory request; 1 CPU / 1Gi limit) — teams override in values.

- ArgoCD Application manifest (gitops/app.yaml): registers the service with the org's ArgoCD
  instance, sets sync policy (automated with self-healing, prune enabled), and points at the
  Helm chart in the repo. First deploy is automatic on merge to main.

- SLO definition (slo.yaml): Prometheus-compatible SLO spec with placeholder targets
  (99.5% availability, 200ms P95 latency). Teams fill in actual targets; platform tooling
  generates error budget burn alerts automatically from this file.

- CODEOWNERS: set to the team's GitHub group (provided at scaffold time) so all PRs
  require review from the owning team by default.

- dependabot.yml: weekly Maven and GitHub Actions dependency updates, auto-assigned to
  the owning team. Keeps the service on the paved road's security posture without manual work.

Args:
    service_name: Kebab-case service identifier (e.g. "payments-processor")
    team_slug: GitHub team slug for ownership (e.g. "payments-team")
    gcp_project: Target GCP project for Artifact Registry and Secret Manager
    cluster: Target GKE cluster name
    initial_slo_availability: Starting SLO target (default 99.5%)

Returns:
    GitHub repository initialized with all components, registered in Backstage catalogue,
    ArgoCD application created, first pipeline run triggered.
"""

TEMPLATE_COMPONENTS = {
    "Dockerfile": "multi-stage build; JRE-only runtime image; non-root user",
    "ci.yml": "calls platform/actions/.github/workflows/java-build.yml@v3",
    "otel.properties": "OTEL_EXPORTER_OTLP_ENDPOINT, service.name, service.version",
    "external-secret.yaml": "maps GCP Secret Manager to K8s secrets via ESO",
    "helm/": "Deployment + HPA + PDB + ServiceMonitor + NetworkPolicy",
    "gitops/app.yaml": "ArgoCD Application with automated sync + self-healing",
    "slo.yaml": "availability + latency SLO targets; burn-alert generation hooked",
    "CODEOWNERS": "team GitHub group set at scaffold time",
    "dependabot.yml": "weekly Maven + Actions updates",
    "catalog-info.yaml": "Backstage catalogue registration with owner, system, lifecycle",
}
```

The template YAML that Backstage Scaffolder executes looks like this in abbreviated form:

```yaml
# backstage/templates/java-spring-gke/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: java-spring-gke
  title: Java Spring Boot Microservice on GKE
  description: Golden path for a production Java Spring Boot service on GKE
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Service details
      required: [service_name, team_slug, gcp_project, cluster]
      properties:
        service_name:
          type: string
          pattern: '^[a-z][a-z0-9-]{2,62}$'
          description: Kebab-case service name
        team_slug:
          type: string
          description: GitHub team for CODEOWNERS
        gcp_project:
          type: string
        cluster:
          type: string
          enum: [prod-eu-west-1, prod-us-east-1, staging]

  steps:
    - id: fetch
      name: Fetch golden path template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          service_name: $
          team_slug: $
          gcp_project: $
          cluster: $

    - id: publish
      name: Create GitHub repository
      action: publish:github
      input:
        repoUrl: github.com?owner=my-org&repo=$
        defaultBranch: main
        teams:
          - $

    - id: register
      name: Register in Backstage catalogue
      action: catalog:register
      input:
        repoContentsUrl: $
        catalogInfoPath: catalog-info.yaml
```

This is the complete scaffolding-time path. Within two minutes of filling in the form, the team has a repository, a CI pipeline running, an ArgoCD application created, and a service entry in the catalogue — none of which they configured themselves.


How Real Systems Use This

Spotify — Engineering Productivity Squad

Spotify has built golden paths since 2014, starting with a single backend engineering path created in a hack week. The scale by 2020: 300+ engineering teams, 2,000+ engineers, paths covering backend, frontend, data engineering, data science, ML, and audio processing. The decision criteria for paving a new path: the engineering productivity squad identifies patterns where teams are repeatedly solving the same problem, measures the total lost time across the org, and builds the path when the accumulated friction exceeds the path’s build cost.

The implementation mechanism is Backstage Scaffolder, open-sourced in 2020 and now a CNCF incubating project. Spotify’s internal templates generate repositories with CI pipelines, observability integrations, and catalogue registrations in a single click. The quality bar for a golden path is concrete: a developer new to Spotify should be able to follow the tutorial and have a production-ready service deployed without asking anyone for help. If that bar is not met, the path is not published.

The cultural contract at Spotify is also explicit: the golden path is the opinionated answer, and the platform team provides full support for services on it. Teams that diverge are not blocked, but the support model changes. This opt-in contract is what distinguishes Spotify’s golden paths from top-down mandates that create compliance culture.

Netflix — Paved Road and Full-Cycle Developers

Netflix’s paved road was designed for a specific operating model: full-cycle developers who own their services end-to-end, from design to production on-call. This is only viable at Netflix’s scale because the paved road absorbs the operational complexity that would otherwise overwhelm individual teams.

The five paved-road services cover the five things every microservice needs without differentiation: Spinnaker (deployment), Titus (container scheduling), Atlas (metrics), Spectator (metrics client), and Mantis (streaming analytics). Teams building on the paved road inherit battle-tested deployment pipelines with automatic canary analysis, a container scheduler that handles bin-packing and spot instance failover, and a metrics system that every SRE dashboard queries.

Netflix’s explicit framing is “freedom and responsibility”: teams have radical freedom in service design and business logic; the responsibility is fulfilled by staying on the paved road for operational concerns. A team that builds a custom deployment pipeline or a bespoke metrics system takes full responsibility for operating it — without platform team support and without the implicit reliability guarantees that paved-road services carry. Most teams don’t go off-road because the paved road is genuinely better, not because they are forbidden.

Source: Full Cycle Developers at Netflix (Greg Burrell, 2018) and Titus open-source announcement.

Funda — Golden Paths for Developer Time Recovery

Funda, the Dutch real estate platform, published a case study in 2024 on their golden path implementation. Their primary motivator was not consistency — it was time. Their analysis showed developers were spending significant hours per sprint on environment setup, CI pipeline debugging, and onboarding new services. The golden path investment goal was to return that time to feature development.

Their implementation: Backstage Software Templates for three paths (frontend, backend API, data pipeline), each backed by a template repository maintained by the platform team. After rollout, their measured result was a reduction in new service bootstrap time from approximately two days to under two hours. The secondary effect was observability consistency: because every service on the path ships with the same OpenTelemetry configuration, their platform team could write generic dashboards and alert rules that applied org-wide.

The lesson from Funda’s case: the golden path’s value case to leadership is not abstract (consistency, security posture) — it is concrete time recovered and given back to product teams.

Adevinta — Common Platform with Explicit SLOs

Adevinta’s Common Platform serves 50+ marketplace product teams across Europe (Spain, France, Germany, Norway). Their golden path strategy is notable for two things: explicit criticality tiers (each path variant has a defined SLO that the platform team is responsible for meeting), and a published API contract between the platform team and delivery teams.

The API contract distinguishes what the platform team guarantees (template correctness, CI pipeline reliability, cluster SLO) from what delivery teams own (service business logic, application-level SLOs, on-call rotation). This explicit contract prevents the common failure mode where delivery teams assume the platform is responsible for everything, or conversely where the platform team’s scope creep into application concerns reduces delivery team autonomy.

The criticality tier model solves the one-size-fits-all problem: internal tooling services use a lightweight path (no PagerDuty integration required, lower resource limits, relaxed security scan gates); consumer-facing services use a full-weight path with mandatory SLO files, strict vulnerability scan gates, and automatic canary deployment enforced by the pipeline.


This post is licensed under CC BY 4.0 by the author.