Performance Management
The system by which you create clarity about what good looks like, track progress toward it, and handle the full spectrum from high performers to underperformers. Done right, nobody is ever surprised by their review.
Key Dimensions
| Dimension | What Good Looks Like | Common Failure |
|---|---|---|
| Clarity | Every person knows what’s expected at their level | Vague expectations, “you’ll know it when you see it” |
| Frequency | Continuous feedback, formal check-ins quarterly | Feedback only during annual review cycle |
| Fairness | Same rubric, calibrated across teams | Manager’s favorites get inflated, quiet performers overlooked |
| Actionability | Feedback includes specific behaviors and growth path | “Needs to be more senior” with no concrete guidance |
| Courage | Manager addresses underperformance early | Avoiding hard conversations until it’s a crisis |
| Documentation | Written record of expectations, feedback, outcomes | Nothing written until the PIP |
Goal Setting: OKRs vs. KPIs vs. Goals
OKRs (Objectives and Key Results)
Created by Andy Grove at Intel, brought to Google by John Doerr, and scaled from there. The mechanism: set an ambitious Objective (qualitative, inspirational), then define 3-5 Key Results (quantitative, measurable) that would prove you achieved it.
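A minimal sketch of that mechanism, assuming Google-style 0.0–1.0 grading of each Key Result and a plain average for the Objective score (all names and numbers are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class KeyResult:
    description: str    # quantitative, measurable
    score: float = 0.0  # graded 0.0-1.0 at quarter end

@dataclass
class Objective:
    title: str  # qualitative, inspirational
    key_results: list[KeyResult] = field(default_factory=list)

    def score(self) -> float:
        # Google-style: the objective's score is the average of its KR
        # scores; landing around 0.7 on a stretch objective is a success.
        if not self.key_results:
            return 0.0
        return sum(kr.score for kr in self.key_results) / len(self.key_results)

checkout = Objective(
    "Make checkout effortless",
    [KeyResult("Reduce checkout abandonment from 68% to 55%", score=0.8),
     KeyResult("Cut median checkout time from 90s to 45s", score=0.6)],
)
print(f"{checkout.title}: {checkout.score():.2f}")  # 0.70 -> healthy stretch result
```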
When OKRs work well:
- Product and platform teams where outcomes matter more than outputs
- When you want to stretch ambition — Google’s “0.7 is success” model encourages moonshots
- Cross-functional alignment — OKRs cascade and interlock across teams
When OKRs fail:
- OKRs become task lists — “Ship feature X” is not a Key Result, it’s an output. Key Results measure impact: “Reduce checkout abandonment from 68% to 55%”
- Too many OKRs — more than 3 objectives per team per quarter means nothing is prioritized
- No one checks mid-quarter — OKRs set in January, forgotten until March review
- Punishing misses — if missing a stretch OKR hurts performance reviews, people sandbag. Google explicitly decouples OKRs from compensation.
KPIs (Key Performance Indicators)
Better for operational/reliability work where the job is to maintain and improve steady-state metrics.
Use KPIs when:
- The team owns an ongoing service (SRE, platform, support)
- Success = keeping metrics in a healthy range, not achieving a one-time outcome
- You need to track operational health continuously
The hybrid approach (what most mature orgs actually do): OKRs for strategic initiatives (what are we changing this quarter?) + KPIs for operational health (what must we maintain?). Don’t force OKR format on operational work.
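To make the KPI half concrete, here is a small sketch assuming each operational metric has an agreed healthy range; the metric names and thresholds are invented:

```python
# Hypothetical operational KPIs with agreed healthy ranges (min, max).
KPIS = {
    "availability_pct": (99.9, 100.0),
    "p95_latency_ms":   (0.0, 250.0),
    "ticket_backlog":   (0.0, 40.0),
}

def unhealthy(current: dict[str, float]) -> list[str]:
    """Return the KPIs currently outside their healthy range."""
    out = []
    for name, (lo, hi) in KPIS.items():
        value = current.get(name)
        if value is None or not (lo <= value <= hi):
            out.append(f"{name}={value} (healthy: {lo}-{hi})")
    return out

print(unhealthy({"availability_pct": 99.95, "p95_latency_ms": 310.0, "ticket_backlog": 12}))
# ['p95_latency_ms=310.0 (healthy: 0.0-250.0)']
```

The point of the range framing: a KPI review asks “are we still inside the band?”, not “did we hit a one-time target?”, which is exactly why forcing it into OKR format fails.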
Individual goals vs. team goals:
This is the tricky part for engineering managers. Individual goals feel fair but incentivize local optimization; team goals encourage collaboration but let underperformers hide.
Recommendation: Team OKRs for what the team delivers + individual growth goals for how each person develops. Amazon does this well: business goals are team-level, but development goals are individual and tracked in 1:1s.
Performance Reviews — Getting Calibration Right
The calibration problem:
Left uncalibrated, manager ratings drift upward until they cluster around “above expectations”: everyone is above average, which makes the scale meaningless. Calibration is the process of cross-manager alignment so that “exceeds expectations” means the same thing across the org.
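One way to surface the problem before a calibration session is to compare each manager’s rating histogram against the org’s expected distribution. A sketch, with an invented target distribution for a 5-point scale:

```python
from collections import Counter

# Invented target for a 5-point scale: most people meet expectations.
EXPECTED = {1: 0.05, 2: 0.10, 3: 0.55, 4: 0.25, 5: 0.05}

def rating_skew(ratings: list[int]) -> dict[int, float]:
    """Actual minus expected share for each rating; positive = inflated."""
    counts = Counter(ratings)
    n = len(ratings)
    return {r: counts.get(r, 0) / n - share for r, share in EXPECTED.items()}

# A manager whose reports are all "above average":
print(rating_skew([4, 4, 5, 4, 3, 5, 4, 4]))
# Rating 4 is ~37 points over target -> worth discussing in calibration.
```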
How calibration actually works at top companies:
Google: Managers write initial reviews, then calibration committees (skip-level manager + peer managers) review all ratings in a session. Managers must justify ratings with specific examples. Committees redistribute ratings to roughly match the expected distribution. This is uncomfortable but produces fairness.
Netflix: No formal performance ratings. Instead, the “keeper test” — would you fight to keep this person? If not, give them a generous severance. This is radical and works for Netflix’s specific culture (high talent density, high compensation, low job security tolerance). Most companies cannot replicate this because they don’t pay top-of-market to compensate for the insecurity.
Amazon: OLR (Organization and Leadership Review) is a calibration process where managers rank their org from top to bottom and discuss with peers. Stack ranking by another name, but with nuance — the focus is on “who’s in the wrong role” more than “who’s worst.”
The rating scale debate:
| Scale Type | Pros | Cons | Used By |
|---|---|---|---|
| 5-point (1-5) | Granular, familiar | Central tendency bias (everyone’s a 3) | Google, Microsoft |
| 3-point (below/meets/exceeds) | Forces differentiation | Not enough resolution for comp decisions | Some startups |
| No ratings | Reduces politics, focus on growth | Hard to make comp decisions, recency bias | Netflix, Deloitte (tried, partially reverted) |
| 4-point (no middle) | Eliminates “meets expectations” default | Can feel forced | Used in some military-origin systems |
My take: 5-point with forced calibration is the least-bad option. You need enough resolution to differentiate for compensation, but you must calibrate or the scale is theater.
The recency bias problem:
Reviews cover 6 or 12 months, but humans remember the last 6 weeks. Countermeasures:
- Brag documents — ask each person to maintain a running list of accomplishments (Julia Evans popularized this)
- Manager journal — write brief notes after each 1:1 about impact and growth signals
- Quarterly check-ins — mini-reviews that create a paper trail for the annual review
- Peer feedback collected at multiple points — not just at review time
Managing Underperformance
This is where most managers fail — not because they can’t identify underperformance, but because they avoid the conversation until it’s a crisis.
The underperformance spectrum:
| Level | Signal | Response | Timeline |
|---|---|---|---|
| Early drift | Missing deadlines, lower quality, disengaged in meetings | Direct feedback in 1:1, explore root cause | 2-4 weeks to see improvement |
| Consistent underperformance | Pattern over 6-8 weeks, feedback not sticking | Explicit expectations reset, written plan, weekly check-ins | 30-60 days |
| Formal PIP | Written plan failed or issue is severe | HR-involved PIP with clear criteria and timeline | 30-90 days |
| Exit | PIP failed or pattern is irrecoverable | Managed exit with dignity, severance if warranted | Immediate to 2 weeks |
Before the PIP — the conversation most managers skip:
The formal PIP should never be the first time someone hears they’re underperforming. The sequence should be:
1. Verbal feedback (1:1): “I’ve noticed X pattern. Here’s what I need to see instead. How can I help?”
2. Written expectations reset (email/doc after the 1:1): “Following up on our conversation. Here’s what success looks like in the next 30 days: [specific, measurable criteria]”
3. Weekly check-ins on progress, with explicit acknowledgment of improvement or continued concern
4. Formal PIP only if steps 1-3 didn’t resolve it
Common causes of underperformance (and the right response):
| Root Cause | Signal | Right Response | Wrong Response |
|---|---|---|---|
| Wrong role | Strong in some areas, failing in others | Explore role change, different team | PIP on their weaknesses |
| Personal crisis | Sudden drop from previously strong performer | Compassion, temporary load reduction, EAP referral | “Your performance is slipping” (without asking why) |
| Skill gap | Willing but unable | Training, pairing, mentoring | Waiting for them to figure it out |
| Motivation loss | Capable but checked out | Explore what’s changed — boredom? conflict? comp? | Assuming laziness |
| Bad fit | Cultural mismatch, values conflict | Honest conversation about fit, managed exit | Trying to “fix” them |
| Manager failure | Unclear expectations, no feedback, no support | Fix your management first | Blaming the report |
The PIP document:
A good PIP is compassionate in intent and ruthless in clarity (a sketch of tracking criteria like these follows the list):
- Specific deficiencies — “In the last 60 days, you missed 3 of 5 sprint commitments and delivered code that required significant rework on PRs #142, #156, and #171” (not “your performance is below expectations”)
- Clear success criteria — “Over the next 30 days, you will: (a) complete assigned sprint items at 80%+ rate, (b) have no more than 1 PR require rework for quality issues, (c) proactively communicate blockers within 24 hours”
- Support provided — “I will: pair you with [senior engineer] for daily 30-min pairing sessions, review your PRs within 4 hours, meet weekly to discuss progress”
- Consequences — “If these criteria are not met by [date], we will proceed with separation”
- Timeline — 30 days for performance, 60 days for behavioral issues, 90 days only if the person is showing genuine improvement and needs more time
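As a sketch of what “ruthless in clarity” means in practice, success criteria like the ones above can be written as explicit thresholds and checked week by week; everything here is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str
    target: float  # threshold the measure must meet
    actual: float  # measured during the PIP window

    def met(self) -> bool:
        return self.actual >= self.target

# Hypothetical criteria mirroring the examples above.
criteria = [
    Criterion("Sprint commitment completion rate", target=0.80, actual=0.85),
    Criterion("PRs free of quality rework (share)", target=0.90, actual=0.70),
    Criterion("Blockers raised within 24h (share)", target=1.00, actual=1.00),
]

for c in criteria:
    status = "MET" if c.met() else "NOT MET"
    print(f"[{status}] {c.description}: {c.actual:.0%} vs target {c.target:.0%}")
```

If a criterion can’t be expressed this concretely, it isn’t clear enough to put in a PIP.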
The ethical dimension:
A PIP should be a genuine attempt to help someone succeed, not a paper trail for termination. If you’ve already decided to fire someone, don’t waste their time with a fake PIP — have the exit conversation directly. Using a PIP as legal cover when you have no intention of keeping someone is dishonest, and people can always tell.
Stack Ranking — The Debate
Stack ranking (forced distribution of performance ratings) was popularized by Jack Welch at GE (“rank and yank” — bottom 10% managed out annually). Microsoft famously used a version of it for over a decade before abandoning it in 2013.
Why it fails at scale:
- Destroys collaboration — if my success requires your failure, why would I help you?
- Punishes great teams — a team of 10 strong performers must still label 1-2 as “underperformers”
- Gaming — managers hoard low performers to sacrifice, or hire someone specifically to fill the bottom slot
- Retention inversion — strong performers who see teammates unfairly labeled leave; weak performers protected by the curve stay
What to do instead:
- Calibrate without forced distribution — discuss all employees in a group, but don’t require a bell curve
- Focus on growth trajectory — is this person growing, plateaued, or declining? More useful than a ranking
- Separate evaluation from development — evaluation is backward-looking (what happened), development is forward-looking (what’s next). Don’t try to do both in one conversation
- Use relative assessment sparingly — when making promotion decisions, relative comparison is useful. For routine reviews, absolute assessment against level expectations is better
High Performers — The Neglected Risk
Most managers pour nearly all of their performance management energy into underperformers and neglect their top performers. This is backwards.
What high performers actually need:
| Need | What It Looks Like | What Happens If You Ignore It |
|---|---|---|
| Challenge | Stretch assignments, new problem domains | They get bored and leave |
| Recognition | Specific, public acknowledgment of impact | They feel invisible and leave |
| Autonomy | Trust to make decisions, less oversight | They feel micromanaged and leave |
| Growth path | Clear next role, skill development plan | They see no future and leave |
| Compensation | At or above market, equity refresh, spot bonuses | They get poached and leave |
| Shielding | Protection from organizational noise | They burn out on politics and leave |
The “quiet high performer” problem:
In a team of 16, you likely have 2-3 people who consistently deliver great work without drama. They don’t ask for recognition, they don’t complain, they just execute. These are your highest retention risks, because you’ll take them for granted until they hand in their resignation.
Countermeasure: Proactively schedule career conversations with your top performers every quarter. Don’t wait for them to bring it up. Ask: “What would make you start looking elsewhere?” and “What’s the most exciting thing you could be working on?”
Performance Conversations — The Mechanics
The SBI-I framework for performance feedback:
- Situation: “In last Tuesday’s design review…”
- Behavior: “…you interrupted the junior engineer three times when they were presenting their approach…”
- Impact: “…which made them visibly uncomfortable and less likely to share ideas in the future.”
- Intent/Inquiry: “I don’t think that was your intent. What was going on for you in that moment?”
The “no surprises” principle:
If someone is surprised by their performance review, you failed as a manager — not them. Every piece of feedback in the formal review should have been discussed in 1:1s already. The review is a summary, not a reveal.
Annual review writing tips:
- Lead with impact, not activity — “Led the migration of 3 services to K8s, reducing deployment time from 45 min to 8 min” not “Worked on Kubernetes migration”
- Be specific about growth — “Improved significantly in stakeholder communication, specifically in how they present technical tradeoffs to product — visible in the Q3 roadmap discussion” not “Improved communication skills”
- Address development areas with growth framing — “Next growth edge is learning to delegate more effectively — currently tends to take on too much personally, which limits their team’s development”
- Calibrate your language — words like “adequate,” “satisfactory,” and “acceptable” all read as negative, even if you mean them neutrally
References
Books
- High Output Management — Andy Grove (performance reviews as a manager’s primary output)
- Radical Candor — Kim Scott (the 2x2 of caring personally / challenging directly)
- An Elegant Puzzle — Will Larson (systems for performance management at scale)
- Measure What Matters — John Doerr (OKRs — the canonical reference)
- Nine Lies About Work — Marcus Buckingham (challenges rating scales and annual reviews)
- The Hard Thing About Hard Things — Ben Horowitz (on firing, PIPs, and difficult conversations)
Research & Articles
- “Reinventing Performance Management” — Deloitte/HBR (2015) — the case for replacing annual reviews
- Google re:Work — open-source calibration and review guides
- Julia Evans — “Brag Documents” (blog post, practical advice for self-evaluation)
- “The Keeper Test” — Netflix culture memo (radical approach to performance)
Talks
- Patty McCord — “Powerful: Building a Culture of Freedom and Responsibility” (Netflix HR philosophy)
- Kim Scott — Radical Candor talks (multiple versions, all worth watching)