Backstage and Developer Portals

The developer portal is not a convenience layer on top of CI/CD — it is the front door, the ownership registry, and the golden-path launcher that makes everything else discoverable. Backstage established the pattern; the question is whether your organisation has the engineering capacity to operate it, or whether a commercial alternative gets you 80% of the value with 20% of the maintenance burden.


Core Properties

| Property | Value |
| --- | --- |
| Origin | Spotify internal tool (~2016), open-sourced March 2020 |
| CNCF status | Incubating (accepted Sept 2020, promoted March 2022) |
| Language | TypeScript / React (frontend) + Node.js (backend) |
| Primary database | PostgreSQL (SQLite for local dev only) |
| Plugin count | 230+ official + community plugins (as of 2024) |
| Public adopters | 3,400+ organisations, 270+ publicly listed |
| Core features | Software Catalog, Scaffolder, TechDocs, Search, Plugins |
| Hosting options | Self-hosted (K8s), Roadie (managed), Spotify Portal (commercial) |

When to Use / Avoid

Use When

  • You have 50+ engineers across multiple teams where service ownership is becoming opaque — “who owns this?” is a daily question.
  • You want a catalogue-first foundation before investing in golden-path templates and runbooks — start with just catalog-info.yaml files.
  • Your team has at least 2-3 TypeScript/React engineers willing to own the portal as a product, not a side project.
  • You need deep customisation through plugin development — org-specific workflows, custom scorecards, internal tooling embedded in the portal.
  • You are already operating in a CNCF/Kubernetes-native ecosystem and want integration with ArgoCD, Kubernetes dashboards, GitHub Actions, PagerDuty.

Avoid When

  • Your team is under 30 engineers — the catalogue won’t pay for its operational overhead yet.
  • You have no TypeScript capacity — Backstage is fundamentally a TypeScript/React monorepo that you fork and maintain. Without that skill, it becomes a perpetual project.
  • You need something live in 4 weeks — Backstage takes 6-12 months to reach genuine production quality with real coverage. Use Port or OpsLevel for fast time-to-value.
  • Your primary need is service maturity scoring / DORA metrics — Cortex or OpsLevel solve this more directly without the operational weight.
  • You want a no-code configuration model — Backstage’s entire surface area is code.

Backstage Architecture

Backstage is not an off-the-shelf application. It is a framework — a React + Node.js monorepo you fork, customise, and operate. This distinction matters enormously when planning adoption.

```mermaid
graph TD
    Browser[Browser / Developer] --> FE[Frontend — React SPA]
    FE --> BE[Backend — Node.js]
    BE --> DB[(PostgreSQL)]
    BE --> P1[Catalog Plugin]
    BE --> P2[Scaffolder Plugin]
    BE --> P3[TechDocs Plugin]
    BE --> P4[Search Plugin]
    BE --> P5[Custom Plugins]
    P1 --> EP[Entity Providers<br/>GitHub / GitLab / AWS / Custom]
```

The frontend is a React single-page application composed of plugin UI contributions — each plugin registers its own routes, sidebar items, and entity tabs. The backend is a Node.js application that hosts plugin backends via the New Backend System (post-v1.24, 2023): plugins are declared via backend.add() rather than being wired together manually. The Plugin Registry is the mechanism that makes this composable.

Crucially, Backstage does not manage infrastructure — it is a metadata layer. It reads catalog-info.yaml files from your SCM, surfaces relationships, and provides a UI for navigating and acting on them. The actual infrastructure runs elsewhere.

Source: Backstage Architecture Overview — frontend/backend plugin composition and the New Backend System module pattern.

The New Backend System (post-2023)

Pre-2023, wiring up Backstage backend plugins required significant boilerplate — manual service injection, bespoke registration code. The New Backend System unified this into a declarative pattern:

```typescript
import { createBackend } from '@backstage/backend-defaults';

/**
 * Minimal Backstage backend setup using the New Backend System.
 * Each backend.add() call registers a plugin or module.
 * No manual dependency injection needed — the framework resolves it.
 */
const backend = createBackend();

backend.add(import('@backstage/plugin-catalog-backend'));
backend.add(import('@backstage/plugin-catalog-backend-module-github'));
backend.add(import('@backstage/plugin-scaffolder-backend'));
backend.add(import('@backstage/plugin-techdocs-backend'));
backend.add(import('@backstage/plugin-search-backend'));
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-guest-provider'));

backend.start();
```

Extension points let plugins expose customisation hooks (custom Scaffolder actions, custom catalog processors) without requiring consumers to fork plugin internals.


The Software Catalog — The Actual Product

The Software Catalog is not just a feature — it is the control plane that everything else in Backstage depends on. Scaffolder templates, TechDocs, permissions, and third-party plugins all reason over catalog entities. Stale or incomplete catalog data silently undermines every other investment.

Entity Kinds

```mermaid
graph LR
    Domain --> System
    System --> Component
    System --> Resource
    Component --> API
    User --> Group
    Group -->|ownsAll| Component
    Group -->|ownsAll| API
```
| Kind | What it represents | Key spec fields |
| --- | --- | --- |
| Component | A deployable unit — service, website, library, ML model | type, lifecycle, owner, system |
| API | An interface contract — OpenAPI, gRPC, GraphQL, AsyncAPI | type, lifecycle, owner, definition |
| Resource | Infrastructure — databases, S3 buckets, GKE clusters | type, owner, system |
| System | A collection of related components + resources | owner, domain |
| Domain | A business domain grouping systems | owner |
| User | An individual person | profile, memberOf |
| Group | A team or org unit | type, parent, children, members |

The catalog-info.yaml Model in Depth

Every entity is declared in a catalog-info.yaml file, co-located with the source code it describes. The format deliberately mirrors Kubernetes manifests.

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Handles checkout payment processing via Stripe and Adyen
  annotations:
    # Links Backstage to the GitHub repo for auto-discovery
    github.com/project-slug: acme/payment-service
    # Links TechDocs build to this entity
    backstage.io/techdocs-ref: dir:.
    # Links to PagerDuty service for on-call info
    pagerduty.com/service-id: P1234AB
  tags:
    - payments
    - critical-path
  links:
    - url: https://grafana.acme.com/d/payments
      title: Payments Dashboard
      icon: dashboard
spec:
  type: service
  lifecycle: production        # sandbox | development | production | deprecated
  owner: group:payments-team
  system: checkout-system
  dependsOn:
    - component:order-service
    - resource:payments-postgres-db
  consumesApis:
    - stripe-payment-api
    - adyen-payment-api
  providesApis:
    - internal-payments-api
```

After the catalog has processed this entity, its relations section looks like this:

  • ownedBy — payments-team owns this component
  • partOf — this component is part of checkout-system
  • consumesApi — this component calls stripe-payment-api
  • dependsOn — runtime dependency on order-service

Why the model is the actual product: Once catalog-info.yaml files exist across all services with accurate owner, lifecycle, system, and dependency data, the following become possible automatically: blast-radius analysis for incidents, change-owner notifications, deprecated-dependency detection, and permission policies scoped to ownership. Without catalog quality, none of these work.
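As a toy illustration of what the relations graph enables, blast-radius analysis is just a reverse traversal of dependsOn edges. The sketch below assumes the relations have already been fetched from the catalog API; the entity names and edges are hypothetical:

```typescript
// Hypothetical dependency edges: entity -> what it depends on,
// as derived from each entity's spec.dependsOn relations.
const dependsOn: Record<string, string[]> = {
  'payment-service': ['order-service', 'payments-postgres-db'],
  'checkout-frontend': ['payment-service'],
  'invoicing-job': ['payment-service'],
  'order-service': [],
};

// Blast radius of X = every entity that transitively depends on X.
function blastRadius(target: string): Set<string> {
  // Invert the edges once: dependency -> direct dependents.
  const dependents = new Map<string, string[]>();
  for (const [entity, deps] of Object.entries(dependsOn)) {
    for (const dep of deps) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), entity]);
    }
  }
  // Breadth-first walk from the target along dependent edges.
  const affected = new Set<string>();
  const queue = [target];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of dependents.get(current) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}

console.log([...blastRadius('order-service')].sort());
// order-service going down affects payment-service and everything built on it:
// checkout-frontend, invoicing-job, payment-service
```

With accurate owner fields on each affected entity, the same traversal yields the notification list for an incident.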

Source: Backstage Descriptor Format — full entity kinds and relations specification.

Entity Providers: Keeping the Catalog Fresh

Static catalog-info.yaml registration works for bootstrapping but does not scale. Entity providers run on a schedule and pull entities automatically:

  • GitHub Discovery — crawls a GitHub org for all repos containing catalog-info.yaml, registers them automatically. Supports filtering by visibility and archival status.
  • GitLab Discovery — same pattern; supports GitLab groups and subgroups.
  • AWS S3 — discovers entities from S3 bucket prefixes.
  • Custom providers — for internal registries, CMDB systems, cloud asset inventories.

The key operational question is: where does the truth live? The answer should always be “in the repo, in catalog-info.yaml, owned by the team that owns the service.” Providers that pull from external CMDBs or spreadsheets create a second source of truth and tend to drift.
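Conceptually, each scheduled provider run applies a full mutation: the set of entities it discovers replaces whatever that provider emitted last time. Reduced to a self-contained sketch (entity names hypothetical), the reconciliation looks like this:

```typescript
// The core of a discovery provider's run: a set difference between
// what it finds in the SCM and what the catalog already holds for it.
interface Reconciliation {
  toRegister: string[];   // discovered but not yet in the catalog
  toUnregister: string[]; // in the catalog but no longer discoverable
}

function reconcile(discovered: string[], registered: string[]): Reconciliation {
  const discoveredSet = new Set(discovered);
  const registeredSet = new Set(registered);
  return {
    toRegister: discovered.filter((e) => !registeredSet.has(e)).sort(),
    toUnregister: registered.filter((e) => !discoveredSet.has(e)).sort(),
  };
}

const result = reconcile(
  ['payment-service', 'order-service', 'new-ml-service'],
  ['payment-service', 'order-service', 'deleted-legacy-app'],
);
console.log(result);
// toRegister: ['new-ml-service'], toUnregister: ['deleted-legacy-app']
```

The unregister half is what keeps the catalog honest: archived or deleted repos disappear from the portal instead of lingering as stale entries.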


Core Features

Software Templates (Scaffolder)

The Scaffolder is Backstage’s golden-path materialiser. A template defines the steps to create a new service: scaffold a repo from a skeleton, create a catalog-info.yaml, set up CI, configure secrets, register the entity in the catalog. Templates are YAML with embedded actions:

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: python-service-template
  title: Python Microservice
  description: Scaffolds a new Python service with FastAPI, CI, and observability defaults
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      required: [name, owner]
      properties:
        name:
          title: Service Name
          type: string
          pattern: '^[a-z][a-z0-9-]*$'
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker
  steps:
    - id: fetch-template
      name: Fetch Skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
    - id: publish
      name: Create GitHub Repo
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```

The Scaffolder enforces catalog discipline at creation time — every new service starts with a valid catalog-info.yaml including owner, lifecycle, and system placement. This is the single highest-leverage thing you can do to improve catalog quality.

TechDocs

TechDocs implements docs-as-code: Markdown files live in the service repo alongside code, an MkDocs pipeline generates static HTML, and Backstage serves it in context alongside the catalog entity. Documentation is versioned with the service, not floating in a wiki that drifts.

The pipeline: markdown in repo → TechDocs builder (MkDocs) → static files published to cloud storage (GCS / S3) → TechDocs plugin serves them in the portal.
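A minimal mkdocs.yml at the repo root is enough to opt a service into that pipeline; the site name and nav entries below are illustrative:

```yaml
# mkdocs.yml — lives at the repo root, next to catalog-info.yaml
site_name: payment-service
plugins:
  - techdocs-core   # Backstage's MkDocs plugin bundle (mkdocs-techdocs-core)
nav:
  - Overview: index.md
  - Runbook: runbook.md
```

Combined with the backstage.io/techdocs-ref annotation shown earlier, this is all a team needs to publish docs into the portal.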

Search

Backstage’s pluggable search indexes catalog entities, TechDocs pages, and any plugin that registers a search collator. The default backend uses Lunr (in-memory, no external dependency) for development; production deployments should switch to Elasticsearch or OpenSearch for adequate performance above ~10K entities.

Plugins

The plugin ecosystem is where Backstage’s value compounds. A plugin is a frontend + backend pair — frontend registers UI contributions (entity tabs, sidebar items, cards), backend provides API endpoints. Notable plugins with direct production value:

| Plugin | What it adds |
| --- | --- |
| Kubernetes | Live pod status, rollout progress on entity pages |
| ArgoCD | Deployment sync status, application health per service |
| GitHub Actions | Recent workflow runs surfaced on entity page |
| PagerDuty | On-call schedule, recent incidents per service |
| Datadog / Grafana | Service dashboards embedded in entity view |
| Lighthouse | Automated accessibility audits linked to website entities |
| Cost Insights | Cloud cost surfaced per team / service |
| Tech Insights | Service maturity scorecards (Roadie-pioneered, now OSS) |

Backstage vs Commercial Alternatives

```mermaid
flowchart TD
    Q1{Do you have TypeScript<br/>engineers to own it?} -->|Yes| Q2{Need deep custom<br/>plugins or white-label?}
    Q1 -->|No| COM[Commercial portal<br/>Port / OpsLevel / Cortex]
    Q2 -->|Yes| BS[Self-hosted Backstage]
    Q2 -->|No — but want OSS base| RD[Roadie — hosted Backstage]
    COM --> Q3{Primary need?}
    Q3 -->|Fast onboarding<br/>no-code data model| PORT[Port]
    Q3 -->|Service maturity<br/>scorecards| CX[Cortex]
    Q3 -->|Lightweight catalog<br/>+ quick DX| OL[OpsLevel]
```
| Dimension | Backstage (self-hosted) | Roadie | Port | Cortex | OpsLevel |
| --- | --- | --- | --- | --- | --- |
| Setup time | 6–12 months to production quality | 2–4 weeks | 3–6 months | 4–8 weeks | 4–6 weeks |
| Engineering cost | 4+ FTE engineers ongoing | ~0.5 FTE | 1–2 FTE | ~0.5 FTE | ~0.5 FTE |
| Customisation | Unlimited (plugin code) | High (plugin UI, no infra) | High (Blueprints, no-code) | Medium | Medium |
| Data model | Fixed entity kinds + custom extensions | Same as Backstage | Fully custom Blueprints | Service-centric | Service-centric |
| Scorecards | Via Tech Insights plugin | Yes (Tech Insights) | Yes | First-class feature | Yes |
| Scaffolding | Yes (Scaffolder) | Yes | Yes (self-service actions) | No | Yes (Actions) |
| Price (200 devs) | ~$150K/yr engineering time | ~$52K/yr ($22/dev/mo) | ~$72K/yr ($30+/dev/mo) | ~$156K/yr ($65/dev/mo) | Lower |
| Lock-in risk | None (OSS) | Low (Backstage-based) | High (proprietary) | High (proprietary) | High (proprietary) |
| Best fit | Large eng orgs with platform team | Mid-size, want OSS without ops | Fast onboarding, no-code | Maturity scoring focus | Simple catalog + quick wins |

Source: Tasrie IT: Port vs Backstage vs Cortex comparison (2026) and Roadie: 7 Best Developer Portals — cost estimates, setup timelines, and capability comparison.


Adoption Maturity Curve

The single most common Backstage failure mode is attempting to deploy it with a full plugin suite and custom development on day one before any catalog quality exists. The model that works:

```mermaid
graph LR
    P1[Phase 1<br/>Catalog only] --> P2[Phase 2<br/>Scaffolder]
    P2 --> P3[Phase 3<br/>TechDocs + Plugins]
    P3 --> P4[Phase 4<br/>Custom plugins]

    P1 --- N1["catalog-info.yaml in every repo<br/>Entity providers for auto-discovery<br/>Owner + lifecycle + system fields"]
    P2 --- N2["Golden-path templates for new services<br/>Scaffolder enforces catalog standards<br/>Templates create repo + CI + catalog entry"]
    P3 --- N3["TechDocs co-located with code<br/>Kubernetes / ArgoCD / PagerDuty plugins<br/>Tech Insights scorecards"]
    P4 --- N4["Org-specific workflows as plugins<br/>Internal tooling embedded in portal<br/>Cost Insights, custom dashboards"]
```

Phase 1 is the foundation and the hardest. Getting catalog-info.yaml into every active repo with accurate owner, lifecycle, system, and at least one dependency relation requires a sustained campaign — incentives, automation, PR templates, linting. Without this, Phase 2 and beyond are built on sand.
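That linting can be a small CI check that fails pull requests whose catalog files are missing the Phase 1 fields. A sketch, operating on an already-parsed entity object (YAML parsing is assumed to happen upstream; the field list follows the catalog-info.yaml example earlier in this post):

```typescript
// Minimal Phase 1 completeness check for a parsed catalog-info.yaml.
interface CatalogEntity {
  kind?: string;
  metadata?: { name?: string };
  spec?: { owner?: string; lifecycle?: string; system?: string };
}

function lintEntity(entity: CatalogEntity): string[] {
  const errors: string[] = [];
  if (!entity.metadata?.name) errors.push('metadata.name is required');
  if (entity.kind === 'Component') {
    if (!entity.spec?.owner) errors.push('spec.owner is required');
    if (!entity.spec?.lifecycle) errors.push('spec.lifecycle is required');
    if (!entity.spec?.system) errors.push('spec.system is required');
  }
  return errors;
}

const incomplete: CatalogEntity = {
  kind: 'Component',
  metadata: { name: 'payment-service' },
  spec: { owner: 'group:payments-team' }, // lifecycle and system missing
};
console.log(lintEntity(incomplete));
// reports: spec.lifecycle is required, spec.system is required
```

Wired into the repository's PR checks, a validator like this turns catalog completeness from a campaign into a default.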

Phase 2 pays for Phase 1. Once Scaffolder templates exist, every new service starts with a correct catalog-info.yaml. Organic growth fills in the catalog without manual campaigns.

Phase 3 compounds the value. Plugins that surface Kubernetes state, ArgoCD sync status, and PagerDuty incidents in the context of a catalog entity are genuinely useful — developers stop switching between six dashboards. TechDocs co-located with code reduces documentation drift.

Phase 4 is where Backstage outcompetes commercial portals. Custom plugins for org-specific workflows — deployment approvals, internal certificate management, cost attribution, feature flag management — cannot be bought off-the-shelf. This is the moat.


Production Operations

Infrastructure Requirements

A production Backstage instance needs:

  • PostgreSQL — required for all production deployments. SQLite is development-only. Each plugin gets its own schema within a shared Postgres instance (pluginDivisionMode: schema) or separate databases. The Catalog plugin alone can grow to millions of rows at scale.
  • Authentication — OAuth / OIDC backed by your identity provider (Google Workspace, Okta, Azure AD, GitHub). Guest mode is for local dev only.
  • Object storage — required for TechDocs static file hosting (GCS or S3). Without it, TechDocs pages are re-generated on every request.
  • Search backend — Lunr is memory-resident and unsuitable above ~5K indexed documents. Switch to Elasticsearch or OpenSearch early.
  • Container deployment — Backstage runs as a Docker container (frontend served by backend), deployed to Kubernetes. Plan for at least 2 replicas behind a load balancer.
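The database portion of a production app-config.yaml then looks roughly like this; the environment variable names are placeholders:

```yaml
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
    pluginDivisionMode: schema   # one schema per plugin within the shared instance
```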

Scaling Characteristics

A Backstage instance becomes performance-sensitive above roughly 15K entities in the catalog. Known scaling failure modes:

  • Provider contention — multiple entity providers running simultaneously cause catalog processing queue saturation. Stagger provider schedules.
  • Identity auth overhead — at 14K+ user entities, IdentityAuthInjectorFetchMiddleware can time out on slow identity provider responses.
  • Multi-region read/write split — at global scale (multiple regions), teams split catalog read and write paths: primary writes to Postgres in one region, read replicas serve catalog queries in others.
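Staggering happens in each provider's schedule block in app-config.yaml. The provider name and frequencies below are illustrative; offset the initialDelay per provider so they do not all fire at startup:

```yaml
catalog:
  providers:
    github:
      acmeOrg:
        organization: acme
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 5 }
          initialDelay: { seconds: 120 }  # stagger against other providers
```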

Operational Cost Honesty

Running a mature Backstage instance with real adoption is a full-time product. Spotify’s own deployment has a dedicated team. External adopters consistently report needing 2-4 engineers minimum for: plugin upgrades (Backstage releases weekly), catalog quality campaigns, auth configuration, custom plugin development, and user support. The perpetual “we’re rolling out Backstage” syndrome is real — organisations that don’t staff it properly stall at Phase 1 indefinitely.


How Real Systems Use Backstage

Spotify — Origin and Scale

Backstage originated at Spotify around 2016 as an internal tool to solve a specific problem: ~2,000 engineers across ~2,000 microservices, and nobody could answer “who owns this?” or “what depends on this?” in under 5 minutes. The original system unified infrastructure tooling, services, documentation, and team pages into a single coherent frontend. Open-sourced in March 2020 via Stefan Ålund’s blog post “What the Heck Is Backstage Anyway?” — donated to CNCF in September 2020. Spotify reports 99% internal adoption. The internal instance serves as the canonical demonstration of what Phase 4 looks like: custom plugins for internal tooling, golden-path templates that create fully instrumented services, and a catalog that is the authoritative record of every service in production.

Source: Stefan Ålund, What the Heck Is Backstage Anyway?, Spotify Engineering, March 2020

American Airlines — “Runway” in 6 Minutes

American Airlines built their internal developer portal “Runway” on top of Backstage, starting development in May 2020 — almost immediately after Backstage’s open-source release. The headline metric: teams can deploy applications with public ingress in under 6 minutes using Runway’s golden-path templates. The portal covers the airline’s polyglot service estate. This is the case study that demonstrates the Scaffolder’s real-world value — not just discoverability, but accelerated time-to-production via templates that already encode the compliance, networking, and CI requirements.

Expedia Group — 5,000 Developers, 20,000 Services

Expedia Group adopted Backstage in 2020 as the foundation of their Developer Experience platform, scaling it to 5,000+ developers across 15+ brands (Expedia, Hotels.com, Vrbo, etc.) managing approximately 20,000 microservices. Their approach used GitHub Discovery for automatic catalog population across the entire organisation’s repos. Expedia’s scale validates the entity provider approach — manually registering 20,000 entities would be operationally impossible; the GitHub provider crawls and registers continuously. The case study also shows Phase 3 adoption: embedded Kubernetes plugin for pod health, CI/CD status surfaces directly on service pages.

Source: Roadie: Expedia Case Study — GitHub Discovery at 20K+ service scale.

Mercedes-Benz — Enterprise IDP at Automotive Scale

Mercedes-Benz Tech Innovation uses Backstage as the developer portal for their internal platform serving thousands of engineers across automotive software and cloud services adjacent to the MB.OS in-car operating system. Their implementation covers Software Catalog, TechDocs, Scaffolding, and custom plugins specific to the automotive software lifecycle — components that map to vehicle software domains alongside cloud services. The Mercedes-Benz case study is significant for a non-web-native, traditional enterprise context: platform engineering applied where the software estate includes both cloud-native services and automotive embedded software, demonstrating that the catalog model generalises beyond pure web services.

Source: CNCF Case Study: Mercedes-Benz

Netflix — Targeted Plugin Integration

Netflix did not adopt Backstage as its primary developer portal — their internal platform predates it. However, Netflix engineering built an internal Backstage plugin to surface real-time canary analysis results from their Atlas metrics system directly in entity pages. This illustrates the integration pattern for organisations with existing internal platforms: Backstage as the UI aggregation layer, not a wholesale replacement. Netflix’s canary analysis plugin pulls Atlas time-series data and renders it in context of a deployment entity — a workflow that would require significant custom development in any commercial portal.

LinkedIn — Catalog at Professional Network Scale

LinkedIn is a publicly listed Backstage adopter operating at a scale that stresses every component. LinkedIn’s engineering org spans thousands of services across their data-intensive platform (feeds, job matching, search). Their Backstage adoption focuses on the catalog as an ownership and dependency registry — understanding blast radius for their massive service graph, where a single foundational service can have hundreds of downstream dependents. At this scale, the entity graph becomes a critical operational tool during incidents, not just a discovery interface during development.


Anti-Patterns

Vanity portal syndrome. Deploying Backstage because the industry expects it, without a catalog quality campaign. The portal goes live with 40 entities, a handful of plugins, and 5% adoption. Nobody uses it; the platform team claims success by pointing at the deployment.

Stale catalog. Registering services manually without entity providers. Repos get created and never registered; registered services go stale as teams change and ownership shifts. A catalog where 30% of entries have incorrect owners is worse than no catalog — it actively misleads incident responders.

Plugin sprawl. Installing 40 plugins at launch because they are available. Each plugin adds frontend bundle weight and backend processing overhead. Start with 3-5 plugins that solve real daily pain. Earn the right to add more.

Treating Backstage as a CMS. Teams use TechDocs to create elaborate portals full of PDFs and embedded SharePoint links, rather than co-located Markdown. The result is indistinguishable from Confluence — a documentation graveyard, not a developer interface.

Perpetual rollout. The most common failure mode at scale. The platform team announces “we’re rolling out Backstage” and spends 18 months never finishing Phase 1. Root cause: no dedicated team, no adoption incentives, no scorecard measuring catalog completeness, no exec sponsor who cares. The fix is treating catalog completeness as an engineering metric with a named owner and a quarterly target.



This post is licensed under CC BY 4.0 by the author.