What Skills a Platform Team Actually Needs
The skill surface of platform engineering — cloud foundation, Kubernetes internals, GitOps, supply chain, observability, developer portals — is too wide for any team to cover well by treating everyone as a generalist. The teams that deliver are the ones that build deliberate specialisation with overlapping breadth: every engineer owns a deep area and can hold the line in a second area when someone is on leave or dealing with an incident.
The Cognitive Surface Is Too Wide
Pick any random Monday morning in a mature platform team and you will find engineers simultaneously reasoning about GCP org policy inheritance, Kubernetes control-plane upgrade rings, Argo CD sync waves, Kyverno admission policy, cosign image verification, OTel pipeline cardinality, Terraform module APIs, and Backstage plugin architecture. That is eight distinct discipline areas, each with its own vocabulary, failure modes, and operational depth.
A small platform team that arrived here organically — backend engineers who started picking up infrastructure work because no one else would — cannot cover that surface adequately. What happens when they try:
- The engineers who know cloud best get pulled into every IAM or networking incident, and their cluster knowledge atrophies.
- Nobody owns the observability stack end-to-end, so it remains partially adopted indefinitely — the most visible platform value prop stays undelivered.
- Golden-path work never gets finished because the engineer who started it is constantly context-switching to production fires in their broader area.
- Security posture is everyone’s problem and nobody’s ownership. Admission policy, supply chain signing, and cloud-side IAM get patched reactively rather than designed proactively.
According to the CNCF Platform Engineering Maturity Model, Level 2 organisations (“Operationalized”) typically allocate “nearly all technical generalists” to a centralized team that handles diverse domains through constant context-switching. The jump to Level 3 (“Scalable”) is precisely the move from generalists to deliberate specialisation — including adding roles not traditionally found in technical teams, such as product management and user experience.
The Puppet 2024 State of DevOps report found that 52% of respondents consider a product manager “crucial” to the success of a platform team, yet only 21.6% of the surveyed teams have a dedicated Platform Product Manager. The gap between recognising what is needed and actually staffing for it is real — and the technical-skills gap mirrors it.
The solution is not to hire ten specialists. It is to decompose the skill surface into a small number of coherent archetypes — each owning a domain that is internally cohesive, has a natural boundary, and can be staffed by a single engineer going deep — and then use a T-shape coverage model so that no archetype becomes a single point of failure.
The Three Archetypes of a Platform Engineer
Three archetypes carve the platform skill surface cleanly. The boundaries are not arbitrary: each maps to a natural ownership boundary in the infrastructure, a distinct set of tools and vendors, and a distinct failure mode when the expertise is absent.
```mermaid
graph LR
CF[Cloud Foundation\nEngineer]
KE[Kubernetes / Cluster\nEngineer]
DD[Delivery & DevEx\nEngineer]
CF -->|IAM + networking\ninto clusters| KE
KE -->|runtime plane\nfor delivery| DD
CF -->|IaC modules\nas products| DD
```
Archetype 1 — Cloud Foundation Engineer
Owns: the resource hierarchy, landing zones, IAM at scale, networking, FinOps, and IaC at the organisation level — everything that is true before a single workload is deployed.
| In scope | Out of scope |
|---|---|
| Resource hierarchy (Organisation → Folder → Project), landing zones, project factory | Application-level data modelling |
| Networking: VPC, Shared VPC, Cloud Load Balancing, Private Service Connect, peering, Cloud Armor | Workload service mesh policy (Cluster archetype) |
| Identity & security: IAM, Service Accounts, Workload Identity Federation, Secret Manager, KMS, Binary Authorization | Cluster-side admission policy (Delivery archetype) |
| IaC at org scale: project factory, org policy as code, Terraform module catalogue | Terraform-as-product for delivery teams (Delivery archetype) |
| FinOps: billing, budgets, cost attribution labels, anomaly detection | |
| GKE-as-a-service: cluster creation patterns, Workload Identity on nodes, cluster version lifecycle at the Google Cloud level | Deep cluster internals (Cluster archetype) |
Technology landscape on GCP: Organisation and folder hierarchy, Shared VPC host/service project model, Workload Identity Federation with GitHub OIDC, Cloud Foundation Fabric (FAST stages), hierarchical firewall policies, VPC Service Controls, Binary Authorization, Cloud Billing detailed export, GKE Autopilot provisioning patterns.
AWS equivalent: AWS Control Tower, Organizations, IAM Identity Center (SSO), Landing Zone Accelerator, Service Control Policies, AWS Config, Cost Explorer with tag-based attribution.
Azure equivalent: Management Groups, Azure Landing Zones (ALZ reference architecture), Azure Policy, Microsoft Entra ID, Defender for Cloud.
What expert looks like: The Cloud Foundation Engineer can design a multi-project landing zone from scratch — org structure, Shared VPC topology, firewall policy hierarchy, IAM group-to-role mapping, Workload Identity for service accounts — and encode all of it in Terraform modules that a project factory calls idempotently. They can debug a GCP networking issue by reading VPC flow logs and firewall logs without needing to try random changes. They understand the implications of constraints/iam.disableServiceAccountKeyCreation at the org level and can migrate a service from key-based auth to Workload Identity Federation with zero downtime.
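How far that zero-downtime claim can go is easier to see in code. A minimal sketch, assuming the workload uses Application Default Credentials: the same Go call path works whether the environment supplies a legacy service-account key file or a Workload Identity Federation external_account configuration, so the cutover is a credential-config and IAM-binding change rather than a code release. The scope and messages below are illustrative.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"golang.org/x/oauth2/google"
)

func main() {
	ctx := context.Background()

	// Application Default Credentials resolve whatever credential source the
	// environment provides: a legacy service-account key file, a Workload Identity
	// Federation external_account config (e.g. generated for GitHub OIDC), or the
	// GKE/GCE metadata server. The application code is identical in all three cases.
	creds, err := google.FindDefaultCredentials(ctx, "https://www.googleapis.com/auth/cloud-platform")
	if err != nil {
		log.Fatalf("no default credentials available: %v", err)
	}

	tok, err := creds.TokenSource.Token()
	if err != nil {
		log.Fatalf("token exchange failed: %v", err)
	}
	fmt.Println("obtained access token, expires at:", tok.Expiry)
}
```

One common migration pattern is to grant the federated identity the same IAM roles as the existing service account, switch the credential configuration, verify token exchange, and only then revoke and delete the key.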
Failure mode without this archetype: IAM sprawl. Every team creates their own service accounts with overly broad roles, keys rotate inconsistently, there is no landing-zone baseline, and a security audit finds 40 separate billing accounts with no attribution. Networking becomes a collection of ad-hoc peerings that nobody fully understands. Cloud spend is unattributed and unowned.
Archetype 2 — Kubernetes / Cluster Engineer
Owns: everything inside and between clusters — control plane, data plane, networking, storage, autoscaling, fleet operations, and cluster-level security. This archetype goes deep on what Kubernetes actually is, not just how to deploy things onto it.
| In scope | Out of scope |
|---|---|
| Control plane internals: kube-apiserver, etcd, controller-manager, scheduler | App-level CRD usage by delivery teams |
| Data plane: kubelet, kube-proxy (or eBPF replacement), containerd | Cross-project cloud networking (Cloud Foundation archetype) |
| CNI: Cilium (preferred for new platforms) or Calico; NetworkPolicy; GKE Dataplane V2 | DBaaS provisioning and migrations (Cloud Foundation archetype) |
| Service mesh: Istio (full sidecar or ambient mode) or Linkerd | Image signing and supply chain (Delivery archetype) |
| Autoscaling: HPA, VPA, KEDA, Cluster Autoscaler, Karpenter (on EKS) | |
| Operators & CRDs: writing and operating controllers | |
| Fleet operations: multi-cluster Services, GKE Fleet/Hub, Cluster API (CAPI), cluster upgrade rings | |
| Cluster-level security: Pod Security Admission, RBAC, NetworkPolicies, Kyverno/OPA Gatekeeper at admission | |
| Storage: CSI drivers, PV/PVC lifecycle, storage classes, GKE Filestore/Persistent Disk CSI | |
Technology landscape on GCP: GKE Standard vs Autopilot tradeoffs, GKE Hub Fleet for multi-cluster inventory and policy, GKE Dataplane V2 (Cilium-based eBPF), node pool lifecycle, GKE Config Sync for fleet-wide config, Kyverno for admission policy. Cilium 1.17 is the current leading eBPF CNI across cloud providers — GKE Dataplane V2 is Cilium-based, and Azure CNI Powered by Cilium confirms it as the cross-cloud standard for high-performance networking.
What expert looks like: The Cluster Engineer can diagnose a kubelet CrashLoop by reading control-plane audit logs and etcd metrics, not by guessing. They understand the difference between iptables and eBPF dataplanes in terms of latency, observability, and connection tracking at scale. They can write a Kubernetes operator in Go using controller-runtime that reconciles a custom CRD to enforce a fleet policy — not because they’ve done a tutorial, but because they understand the informer/reconciler/work queue pattern cold. They can design an upgrade ring system for a fleet of clusters that gates on SLO burn rather than calendar time.
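To make the informer/reconciler/work-queue point concrete, here is a deliberately tiny controller-runtime sketch — not a production controller — that enforces a hypothetical fleet policy: every Namespace must carry a cost-attribution label. The label key and default value are invented for the example; a real policy would more likely reconcile a custom CRD as described above.

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// namespaceLabelReconciler enforces a hypothetical fleet policy:
// every Namespace must carry a cost-attribution label.
type namespaceLabelReconciler struct {
	client.Client
}

func (r *namespaceLabelReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var ns corev1.Namespace
	if err := r.Get(ctx, req.NamespacedName, &ns); err != nil {
		// A deleted namespace is not an error; anything else is retried via the work queue.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if ns.Labels["cost-centre"] != "" {
		return ctrl.Result{}, nil // already compliant, nothing to do
	}
	if ns.Labels == nil {
		ns.Labels = map[string]string{}
	}
	ns.Labels["cost-centre"] = "unassigned" // placeholder; a real policy would look this up

	// Conflicts and transient API errors are returned and requeued by controller-runtime.
	return ctrl.Result{}, r.Update(ctx, &ns)
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	// The manager wires the shared informer cache and work queue; For() registers the watch.
	err = ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Namespace{}).
		Complete(&namespaceLabelReconciler{Client: mgr.GetClient()})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```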
Failure mode without this archetype: A security patch comes out for a cluster-level vulnerability. Without a Cluster Engineer, each cluster upgrade becomes a separate coordination project across product teams. Network policies are aspirational rather than enforced — the “default deny” policy exists but has so many exceptions nobody knows if it still means anything. Autoscaling is tuned by guesswork and the cluster regularly over-provisions by 40% because nobody owns the VPA recommender output.
```mermaid
graph TD
Fleet[Fleet / Multi-Cluster Layer]
Fleet --> CP[Control Plane\nkube-apiserver · etcd · scheduler]
Fleet --> DP[Data Plane\nkubelet · containerd · CNI]
CP --> Auto[Autoscaling\nHPA · VPA · KEDA · CA]
CP --> Ops[Operator / CRD layer]
DP --> Net[Networking\nCilium / Calico · NetworkPolicy · Service Mesh]
DP --> Sec[Cluster Security\nPSA · RBAC · Kyverno]
```
Source: GKE Dataplane V2 docs — GKE Autopilot uses Cilium-based eBPF as the default CNI; GKE Standard clusters should enable Dataplane V2 for production workloads requiring L7 network policy and Hubble flow observability.
Archetype 3 — Delivery & Developer Experience Engineer
Owns: the full path from git push to running workload, plus the self-service interface delivery teams use to interact with the platform. This archetype is responsible for the platform’s customer-facing value — golden paths, developer portal, GitOps, supply chain, observability as a managed service, and platform SLOs.
| In scope | Out of scope |
|---|---|
| GitOps: Argo CD (ApplicationSets, Kustomize, Helm, sync waves), Flux as alternative | Deep cluster internals (Cluster archetype) |
| CI/CD: reusable GitHub Actions workflows, Tekton if needed; build/test/sign pipelines | Cloud-side IAM bindings (Cloud Foundation archetype) |
| Scaffolding & golden paths: Backstage Software Templates, Cookiecutter, Copier, Projen | App-level instrumentation choices (delivery teams own this) |
| Developer portal: Backstage (Software Catalog, TechDocs, scaffolder, plugins), Port, Cortex, OpsLevel | |
| Supply chain: Sigstore/cosign (keyless signing), SLSA levels 1–3, SBOM generation (SPDX/CycloneDX), Binary Authorization integration | |
| Policy at admission: Kyverno or OPA/Gatekeeper — operated in conjunction with Cluster archetype | |
| Observability stack as platform service: OpenTelemetry collector pipelines, Prometheus/VictoriaMetrics, Grafana/Loki/Tempo, alerting rules | |
| Platform SLOs: pipeline success rate, deploy time, golden-path TTFC (time-to-first-commit), portal uptime, golden-path adoption % | |
| IaC as product: Terraform module library, versioned releases with READMEs and integration tests | |
Technology landscape: Argo CD is the current default for fleet-scale fan-out (ApplicationSet + cluster generator pattern); Flux remains a valid alternative with a stronger GitOps-purity ethos. Backstage remains the open-source default for developer portals; Port, Cortex, and OpsLevel lead the commercial space. For admission policy, Kyverno has the better Kubernetes-native ergonomics (YAML policies, resource generation, image verification via Sigstore integration) and is preferred for new platforms — Gatekeeper remains appropriate when a team already runs OPA across the broader stack. Score (CNCF Sandbox) is the emerging open workload specification standard — worth monitoring, not yet production-mainstream.
What expert looks like: The Delivery & DevEx Engineer can write a Backstage Software Template that scaffolds a new service in the team’s primary stack — with GitHub Actions workflows, an Argo CD Application manifest, Helm chart, and OpenTelemetry auto-instrumentation pre-wired — so that a single command generates something a team can deploy on day one. They can set up cosign keyless signing via GitHub OIDC → Fulcio and enforce verification through a Kyverno ClusterPolicy and Binary Authorization policy simultaneously. They understand the difference between a platform metric (pipeline success rate) and an application metric (error rate) and can define, wire, and publish platform SLOs in tools like Sloth or OpenSLO.
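The platform-metric arithmetic is small enough to sketch. A minimal example, assuming a pipeline-success-rate SLO; the 99% objective, 28-day window, and run counts are illustrative, and in practice the same calculation is what Sloth or an OpenSLO-based generator turns into Prometheus recording and alerting rules.

```go
package main

import "fmt"

// burnRate expresses how fast an error budget is being consumed:
// 1.0 means the budget lasts exactly the SLO window; 2.0 means it is gone in half the window.
func burnRate(failed, total, objective float64) float64 {
	if total == 0 {
		return 0
	}
	errorRate := failed / total
	budget := 1 - objective // e.g. 1% of runs may fail under a 99% objective
	return errorRate / budget
}

func main() {
	// Hypothetical numbers: 1200 pipeline runs in the last hour, 30 of them failed,
	// measured against a 99% pipeline-success-rate SLO.
	br := burnRate(30, 1200, 0.99)
	fmt.Printf("burn rate: %.1fx the sustainable rate\n", br) // 2.5x with these numbers

	// At a constant 2.5x burn, a 28-day error budget is exhausted in 28/2.5 ≈ 11.2 days.
	fmt.Printf("budget exhausted in ≈ %.1f days of a 28-day window\n", 28/br)
}
```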
Failure mode without this archetype: The platform’s value is invisible. Cloud infrastructure is harmonised and the clusters are well-operated, but delivery teams still manage their own CI pipelines, cannot self-serve a new service, and have no shared observability stack. Platform adoption stays near zero because there is no “front door.” Observability tooling exists but remains half-onboarded because nobody owns the full pipeline from app instrumentation contract → OTel collector → storage → dashboard template.
Source: CNCF Platforms White Paper — reducing cognitive load on product teams requires packaging platform capabilities into discoverable, self-service interfaces. The portal, golden path, and GitOps delivery layer are that packaging.
The T-Shape Coverage Model
Each engineer picks one deep area and one breadth area across the three archetypes. The deep area is where they are accountable for production outcomes; the breadth area is real working knowledge — enough to hold the line during an incident or while the deep owner is on leave.
```mermaid
graph LR
Eng[Each platform engineer]
Eng -->|deep: production-grade ownership| Deep[1 archetype]
Eng -->|breadth: working knowledge| Breadth[1 other archetype]
Deep --> Cov[Coverage goal:\nevery archetype has\nat least 1 deep + 1 breadth]
Breadth --> Cov
```
A team has enough archetype coverage when these three rules hold:
| Rule | Why it matters |
|---|---|
| Every archetype has at least one deep owner | The archetype’s roadmap, architecture, and incident response have a single accountable engineer. |
| Every archetype has at least one breadth-level backup | No single point of failure when the deep owner is on holiday, sick, or absorbed by a multi-week incident. |
| The widest archetype carries more depth than the others | Delivery & DevEx is typically widest — GitOps + portal + supply chain + observability + platform SLOs + IaC — and tends to need a second deep engineer to keep all of those capabilities production-grade. |
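These rules are mechanical enough to check against a roster when planning hiring or leave cover. A small sketch with an invented five-person team; the archetype names match the three above, and the third rule is simplified to "Delivery & DevEx needs two deep owners".

```go
package main

import "fmt"

type engineer struct {
	name    string
	deep    string // archetype with production-grade ownership
	breadth string // archetype with working knowledge
}

var archetypes = []string{"cloud-foundation", "cluster", "delivery-devex"}

// checkCoverage applies the three coverage rules and returns any gaps found.
func checkCoverage(team []engineer) []string {
	deep := map[string]int{}
	breadth := map[string]int{}
	for _, e := range team {
		deep[e.deep]++
		breadth[e.breadth]++
	}

	var gaps []string
	for _, a := range archetypes {
		if deep[a] == 0 {
			gaps = append(gaps, a+": no deep owner")
		}
		if breadth[a] == 0 {
			gaps = append(gaps, a+": no breadth-level backup")
		}
	}
	// Simplified rule 3: the widest archetype carries extra depth.
	if deep["delivery-devex"] < 2 {
		gaps = append(gaps, "delivery-devex: widest archetype, needs a second deep owner")
	}
	return gaps
}

func main() {
	// Hypothetical roster.
	team := []engineer{
		{"A", "cloud-foundation", "cluster"},
		{"B", "cluster", "delivery-devex"},
		{"C", "delivery-devex", "cloud-foundation"},
		{"D", "delivery-devex", "cluster"},
		{"E", "cluster", "cloud-foundation"},
	}
	gaps := checkCoverage(team)
	if len(gaps) == 0 {
		fmt.Println("coverage OK: every archetype has a deep owner and a breadth-level backup")
		return
	}
	for _, g := range gaps {
		fmt.Println("coverage gap:", g)
	}
}
```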
Distribution rules
No engineer is siloed. A breadth area means real working knowledge — the goal is to be able to debug a Sev-2 in your breadth area without paging the deep expert. A breadth-level Kubernetes engineer should be able to identify whether a pod crash is caused by an OOM event, an admission webhook denial, or a node-pressure eviction — even if they cannot write a custom controller (see the triage sketch after these rules).
Deep area ownership means production-grade accountability. The deep engineer is the person whose phone rings first for incidents in that area, who owns the architecture decisions, and who drives the roadmap for that capability.
Every area has redundancy. No single person carries an area alone. If the Cluster Engineer is on holiday, the breadth-level engineer holds incidents and escalates only for novel failures.
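What breadth-level triage looks like in practice is easy to sketch with client-go; the namespace and pod name below are placeholders. OOM kills and node-pressure evictions are visible on the Pod object itself, while an admission-webhook denial never produces a Pod at all — it surfaces as a FailedCreate event on the owning ReplicaSet — so the sketch flags the first two and points the on-call engineer at the right place for the third.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholders — point these at the crashing workload under investigation.
	namespace, podName := "payments", "payments-api-7c9d6f5b4-x2k8p"

	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	pod, err := cs.CoreV1().Pods(namespace).Get(context.Background(), podName, metav1.GetOptions{})
	if err != nil {
		// No Pod at all often means the create was rejected at admission:
		// check the owning ReplicaSet's events for a FailedCreate webhook denial.
		log.Fatalf("pod not found (%v) — check the owning ReplicaSet's events for webhook denials", err)
	}

	// Node-pressure evictions mark the whole Pod as Failed with Reason=Evicted.
	if pod.Status.Reason == "Evicted" {
		fmt.Println("node-pressure eviction:", pod.Status.Message)
		return
	}

	// OOM kills show up in the last termination state of the offending container.
	for _, cst := range pod.Status.ContainerStatuses {
		if t := cst.LastTerminationState.Terminated; t != nil && t.Reason == "OOMKilled" {
			fmt.Printf("container %s was OOMKilled (exit code %d)\n", cst.Name, t.ExitCode)
			return
		}
	}
	fmt.Println("no OOM or eviction signature on this pod — widen the search to events and node conditions")
}
```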
What Is Deliberately NOT a Separate Archetype
Four capability areas that frequently appear in platform-team job descriptions do not warrant their own archetype. Treating them as separate roles fragments ownership and creates unnecessary headcount pressure.
| Capability | Why it is not a dedicated archetype | Where it lives |
|---|---|---|
| Databases / caching / streaming | Schema, queries, and hot-path tuning belong to application teams. The platform can provide Terraform modules for Cloud SQL, Memorystore, or Pub/Sub provisioning — but that is a thin slice of the Cloud Foundation archetype, not a discipline requiring deep specialisation. | Cloud Foundation archetype (provisioning modules) + delivery teams (runtime ownership) |
| Security beyond admission | Security naturally splits along the existing boundaries: cloud-side in Cloud Foundation (IAM, KMS, VPC Service Controls), cluster-side in Kubernetes archetype (PSA, RBAC, NetworkPolicy), supply chain in Delivery archetype (cosign, SLSA, SBOM). A separate “security archetype” creates a team within a team and generates a coordination overhead that slows everything down. | All three archetypes, distributed by layer |
| Observability | At platform scale, observability is a product delivered to teams, not a discipline distinct from delivery engineering. The OTel collector pipeline, Prometheus/Grafana stack, and alerting templates are platform capabilities — they live in the Delivery archetype alongside the portal and GitOps. Spinning out a separate “observability engineer” role dilutes the product ownership of the Delivery archetype and makes it harder to ship observability as part of the golden path. | Delivery & DevEx archetype |
| On-call / incident response | Incident response is cross-cutting: a cloud networking incident involves the Cloud Foundation engineer; a cluster crash involves the Kubernetes engineer; a broken pipeline involves the Delivery engineer. Dedicated incident-response roles make sense at much larger scale — for a small platform team, rotating on-call with clear ownership boundaries is the right model. | Soft-skills track; rotating on-call by archetype ownership |
```mermaid
graph TD
Security[Security]
Security --> CF2[Cloud side\nIAM · KMS · VPC SC]
Security --> K2[Cluster side\nPSA · RBAC · NetworkPolicy]
Security --> DD2[Supply chain\ncosign · SLSA · SBOM]
CF2 --> A1[Cloud Foundation\narchetype]
K2 --> A2[Kubernetes\narchetype]
DD2 --> A3[Delivery & DevEx\narchetype]
```
Cross-Cutting Non-Technical Skills
Every platform-engineering article covers the tools. Almost none covers the skills that determine whether a technically competent platform team is actually used. These are not soft skills in the pejorative sense — they are the difference between a platform that gets mandated and a platform that gets adopted.
Camille Fournier and Ian Nowland’s Platform Engineering (O’Reilly, 2024) argues that the central failure mode of platform teams is not technical incompetence — it is the failure to treat the platform as a product with internal customers. The skills below are what “product thinking” means in practice for an engineering team without a PM.
Product mindset
The platform engineer with a product mindset tracks adoption metrics rather than tickets-closed. They run interviews with delivery teams before building a feature (“we asked three teams what they need from a secret management interface before designing it”) and write a public roadmap that delivery teams can see and comment on. They know how to say no — not because they are gatekeeping, but because they understand the tradeoff between building bespoke solutions and maintaining a coherent platform.
Concretely: a platform team with this skill will notice that golden-path adoption is stagnating at 30% and investigate why, rather than assuming delivery teams are wrong for not using the path. They will discover that the service template generates a Helm chart with hardcoded values that conflict with the staging environment configuration and fix it in two days rather than six months.
Customer collaboration
Running office hours, reviewing delivery-team PRs before they become production incidents, and holding joint planning sessions where the platform roadmap is shaped by delivery-team bottlenecks rather than platform-team preferences. The skill here is listening before solving — which is harder for engineers than it sounds.
Manuel Pais’s framing from Team Topologies and his Platform as a Product talks is precise: the interaction mode should be X-as-a-Service in steady state, but moving there requires a Collaboration phase where the platform team genuinely understands what delivery teams need. You cannot design the right self-service interface without that knowledge.
Source: Manuel Pais, Platform as a Product (PlatformCon 2022) — “The platform team’s job is to understand the needs of the platform audience deeply enough to build things they will actually use without asking.”
On-call discipline
A platform team that does not own incidents on the platform is not a platform team — it is an advisory group. Each archetype owns first-response for incidents in their area. The Cluster Engineer picks up a cluster OOMKill cascade at 2 AM, not the delivery team whose pod happened to be the trigger. The Delivery Engineer owns a broken Argo CD sync, not the team trying to release.
Blameless post-mortems are a technical writing practice as much as a cultural one. A good post-mortem identifies the systemic cause, the timeline with no names blamed, and a concrete action list with owners and deadlines. A team that writes good post-mortems builds trust with delivery teams faster than any feature announcement.
Technical writing
Golden-path documentation, Architecture Decision Records (ADRs), and runbooks are not administrative overhead — they are the product’s user interface. A golden path with no documentation is a path nobody finds. An ADR that explains why the team chose Argo CD over Flux eliminates three months of future re-debate. A runbook that explains how to recover from a broken OTel collector pipeline means the on-call engineer at 3 AM is not improvising.
Good technical writing in a platform context means: write for someone who is smart but new to your platform, be concrete about commands and expected outputs, and never write “see the docs” without linking to them.
Public speaking and evangelism
Internal showcases where platform engineers present what they shipped — not just the feature, but the design decision, the tradeoff, and the customer outcome — build the internal credibility that converts mandatory adoption into willing adoption. Delivery teams that understand why the golden path makes certain opinionated choices are far more likely to work within it rather than around it.
Conference talk submissions (PlatformCon, KubeCon’s co-located platform events) serve a dual purpose: they force the engineer to structure their thinking well enough to explain it to strangers, and they make the team externally visible in a way that attracts good future hires. An internal blog post that becomes a conference talk is a higher-value artifact than another Terraform module that duplicates something upstream already provides.
The 2024 platform engineering maturity data is stark: 25.4% of teams have no product mindset at all. The gap between “technically solid” and “actually adopted” is almost always a soft-skills gap, not a technical one.
Preconditions and Adjacent Concerns Out of Scope
The three archetypes above describe a specific shape of platform — the configuration that has become the modern default. This section names the assumptions that shape requires, and the adjacent platform-engineering concerns that belong to other teams in a mature engineering organisation.
What this article assumes
- Cloud-based infrastructure. Examples are written for Google Cloud Platform (GCP); equivalents in AWS and Azure are called out where the underlying concept differs meaningfully. The skill split itself applies across all three major clouds.
- Kubernetes as the primary workload runtime. Specifically GKE as the example, but the Cluster archetype generalises to EKS, AKS, or self-hosted Kubernetes. If the platform is not Kubernetes-based — serverless-first, VM-based, or mainframe-modernisation focused — the Cluster archetype shrinks dramatically and the Cloud Foundation archetype expands to absorb most of the runtime concerns.
- An IDP that is built, not bought. If the organisation has decided to buy a managed platform (Humanitec, Mia-Platform, hosted Backstage via Roadie), the Delivery archetype shifts from “build the portal and golden paths” to “configure and operate the vendor product” — the skill emphasis changes but the archetype boundary still holds.
What is deliberately out of scope
The concerns below are real platform-engineering work, but they sit next to the three archetypes rather than inside them. In a mature engineering organisation they typically belong to a different team with its own lifecycle, customer model, and skill profile. Folding them into the same platform team that owns Cloud Foundation, Cluster, and Delivery either overloads the team or fragments its focus.
| Adjacent concern | Why it is out of scope here | Who typically owns it |
|---|---|---|
| API platform & API management | API gateways (Kong, Apigee, AWS API Gateway, MuleSoft, WSO2), the catalogue of external-facing APIs, API versioning standards, rate limiting, partner onboarding, and monetisation form a distinct discipline. The customer is external developers and consuming systems, not internal delivery teams — a fundamentally different product model. | API Platform / Integration Platform team |
| Engineering standardisation | Language and framework standards (Java versions, Spring Boot starter libraries, internal SDKs), code review and PR standards, secure SDLC requirements, dependency management at organisation scale, internal coding guidelines. These define what code looks like across the org. The platform team defines what runtime, delivery, and infrastructure look like. The two are complementary; conflating them creates a team that does neither well. | Engineering Excellence / Engineering Standards team |
| Data & ML platform | Data lake, data warehouse, BI tooling, feature store, model training infrastructure, model serving, MLOps pipelines, AI agent runtime. The compute substrate may overlap with the Kubernetes fleet, but the data lifecycle, governance, and consumer model are different enough to warrant a dedicated team once the workload exists. | Data Platform / ML Platform team |
| Identity provider operations | Day-to-day SSO administration, identity federation (Okta, Microsoft Entra ID), SCIM provisioning to SaaS apps, certificate lifecycle for human authentication, joiner-mover-leaver flows. The platform team consumes identity as a service through Workload Identity Federation; it does not run the IdP. | IT / Identity & Access Management team |
| Developer environment tooling | Local developer machines, IDE plugins, devbox / devcontainer setups, language runtimes on laptops, mandatory pre-commit hooks. Sometimes folded into a Developer Productivity team. | Developer Productivity / IT |
When one of these adjacent teams does not exist in the organisation, the platform team often inherits the function temporarily. That is a pragmatic short-term answer, but the skill set required is meaningfully different, and trying to staff for it from within the three-archetype model almost always means one archetype’s quality suffers. The honest answer is either to advocate for a separate team, or to formally expand the platform team’s mandate and headcount with that adjacent function as an explicit fourth area — not to pretend it fits inside the existing three.
References
- 📖 Platform Engineering: A Guide for Technical, Product, and People Leaders — Fournier & Nowland (O’Reilly, 2024) — The canonical argument that platform teams fail not for lack of technical skill but for lack of product thinking. Essential reading for the soft-skills section of any platform engineering role.
- 📖 Team Topologies, 2nd ed. — Skelton & Pais (IT Revolution, 2025) — Source of the Thinnest Viable Platform concept, X-as-a-Service interaction model, and the Collaboration → Facilitating → X-as-a-Service trajectory that underpins the T-shape model.
- 📖 Production Kubernetes — Rosso / Lander / Brand / Harris (O’Reilly, 2021) — Decision frameworks for multi-tenancy, fleet operations, and addon lifecycle that directly inform the Cluster archetype’s scope.
- 📖 Observability Engineering, 2nd ed. — Majors / Fong-Jones / Miranda (O’Reilly, 2026 early release) — Charity Majors on why observability is a culture and practice problem as much as a tooling problem. Reinforces why it belongs inside the Delivery archetype, not as a separate discipline.
- 🔗 CNCF Platform Engineering Maturity Model — Defines the Provisional → Operationalized → Scalable → Optimizing levels. The Level 2→3 transition is the move to deliberate specialisation this document describes.
- 🔗 CNCF Platforms White Paper — Defines what platform capabilities are, how to measure their value, and why reducing cognitive load on product teams requires packaging capabilities into discoverable self-service interfaces.
- 🎥 Manuel Pais — Platform as a Product (PlatformCon 2022) — The definitive 30-minute articulation of why platform teams need product thinking, user research, and adoption metrics rather than just technical excellence.
- 🔗 GKE Dataplane V2 (Cilium-based eBPF) — GCP’s confirmation that Cilium is the production CNI default for Autopilot and the recommended choice for Standard clusters.
- 🔗 Nirmata — Kyverno vs OPA Gatekeeper comparison (2025) — Why Kyverno’s Kubernetes-native YAML policy model wins for new platforms; when Gatekeeper’s Rego model is worth the complexity.
- 🔗 Puppet / Perforce — State of DevOps: The Evolution of Platform Engineering (2024) — 470+ practitioner survey. Key numbers: 52% consider a product PM crucial, only 21.6% have one; 25.4% of teams have no product mindset at all.
- 🔗 Platform Engineering Maturity in 2026 — platformengineering.org — 2026 state-of-the-industry data: 45.5% of teams are dedicated but reactive; 13.1% have reached optimised cross-functional ecosystems.
- 🔗 Cloud Foundation Fabric (GitHub) — Google’s reference Terraform implementation for GCP landing zones (FAST stages). The canonical starting point for any Cloud Foundation archetype building a GCP org structure.