Performance Management

The system by which you create clarity about what good looks like, track progress toward it, and handle the full spectrum from high performers to underperformers. Done right, nobody is ever surprised by their review.

Key Dimensions

| Dimension | What Good Looks Like | Common Failure |
| --- | --- | --- |
| Clarity | Every person knows what’s expected at their level | Vague expectations, “you’ll know it when you see it” |
| Frequency | Continuous feedback, formal check-ins quarterly | Feedback only during annual review cycle |
| Fairness | Same rubric, calibrated across teams | Manager’s favorites get inflated, quiet performers overlooked |
| Actionability | Feedback includes specific behaviors and growth path | “Needs to be more senior” with no concrete guidance |
| Courage | Manager addresses underperformance early | Avoiding hard conversations until it’s a crisis |
| Documentation | Written record of expectations, feedback, outcomes | Nothing written until the PIP |

Goal Setting: OKRs vs. KPIs vs. Goals

OKRs (Objectives and Key Results)

Created by Andy Grove at Intel, brought to Google by John Doerr and scaled there. The mechanism: set an ambitious Objective (qualitative, inspirational), then define 3-5 Key Results (quantitative, measurable) that would prove you achieved it.

When OKRs work well:

  • Product and platform teams where outcomes matter more than outputs
  • When you want to stretch ambition — Google’s “0.7 is success” model encourages moonshots
  • Cross-functional alignment — OKRs cascade and interlock across teams

When OKRs fail:

  • OKRs become task lists — “Ship feature X” is not a Key Result, it’s an output. Key Results measure impact: “Reduce checkout abandonment from 68% to 55%”
  • Too many OKRs — more than 3 objectives per team per quarter means nothing is prioritized
  • No one checks mid-quarter — OKRs set in January, forgotten until March review
  • Punishing misses — if missing a stretch OKR hurts performance reviews, people sandbag. Google explicitly decouples OKRs from compensation.
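The Objective/Key Result shape described above can be sketched in a few lines of Python. The class names, the linear scoring rule, and the example numbers are illustrative assumptions, not any company’s actual grading tool:

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    """A Key Result is a measurable impact metric, not a task."""
    description: str
    start: float    # baseline when the quarter began
    target: float   # where you committed to move the metric
    current: float  # where the metric is now

    def score(self) -> float:
        """Linear progress from start toward target, clamped to [0, 1]."""
        if self.target == self.start:
            return 1.0
        progress = (self.current - self.start) / (self.target - self.start)
        return max(0.0, min(1.0, progress))


@dataclass
class Objective:
    """Qualitative and ambitious; graded as the mean of its Key Results."""
    description: str
    key_results: list[KeyResult]

    def score(self) -> float:
        return sum(kr.score() for kr in self.key_results) / len(self.key_results)


# The checkout example from the text: abandonment moved from 68% to 60%,
# against a committed target of 55%.
okr = Objective("Make checkout frictionless", [
    KeyResult("Checkout abandonment rate (%)", start=68.0, target=55.0, current=60.0),
])
print(f"{okr.score():.2f}")  # 0.62 -- close to Google's "0.7 is success" bar
```

Note that the Key Result here is a metric with a baseline and a target; “Ship feature X” would have no way to score partial progress, which is exactly why outputs make poor Key Results.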

KPIs (Key Performance Indicators)

Better for operational/reliability work where the job is to maintain and improve steady-state metrics.

Use KPIs when:

  • The team owns an ongoing service (SRE, platform, support)
  • Success = keeping metrics in a healthy range, not achieving a one-time outcome
  • You need to track operational health continuously

The hybrid approach (what most mature orgs actually do): OKRs for strategic initiatives (what are we changing this quarter?) + KPIs for operational health (what must we maintain?). Don’t force OKR format on operational work.

Individual goals vs. team goals:

The tricky part for engineering managers. Individual goals feel fair but incentivize local optimization. Team goals encourage collaboration but let underperformers hide.

Recommendation: Team OKRs for what the team delivers + individual growth goals for how each person develops. Amazon does this well: business goals are team-level, but development goals are individual and tracked in 1:1s.

Performance Reviews — Getting Calibration Right

The calibration problem:

Left uncalibrated, manager ratings cluster around “above expectations”: everyone is above average, which means the scale is meaningless. Calibration is the process of cross-manager alignment so that “exceeds expectations” means the same thing across the org.

How calibration actually works at top companies:

Google: Managers write initial reviews, then calibration committees (skip-level manager + peer managers) review all ratings in a session. Managers must justify ratings with specific examples. Committees redistribute ratings to roughly match the expected distribution. This is uncomfortable but produces fairness.

Netflix: No formal performance ratings. Instead, the “keeper test” — would you fight to keep this person? If not, give them a generous severance. This is radical and works for Netflix’s specific culture (high talent density, high compensation, low job security tolerance). Most companies cannot replicate this because they don’t pay top-of-market to compensate for the insecurity.

Amazon: OLR (Organization and Leadership Review) is a calibration process where managers rank their org from top to bottom and discuss with peers. Stack ranking by another name, but with nuance — the focus is on “who’s in the wrong role” more than “who’s worst.”

The rating scale debate:

| Scale Type | Pros | Cons | Used By |
| --- | --- | --- | --- |
| 5-point (1-5) | Granular, familiar | Central tendency bias (everyone’s a 3) | Google, Microsoft |
| 3-point (below/meets/exceeds) | Forces differentiation | Not enough resolution for comp decisions | Some startups |
| No ratings | Reduces politics, focus on growth | Hard to make comp decisions, recency bias | Netflix; Deloitte (tried, partially reverted) |
| 4-point (no middle) | Eliminates “meets expectations” default | Can feel forced | Some military-origin systems |

My take: 5-point with forced calibration is the least-bad option. You need enough resolution to differentiate for compensation, but you must calibrate or the scale is theater.
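One concrete way to prepare a calibration session is to compare each manager’s rating distribution against the org-wide expectation before the meeting. The sketch below assumes a 5-point scale; the target shares and tolerance are made-up illustrations, and flagged buckets are discussion prompts, not automatic redistributions:

```python
from collections import Counter

# Illustrative target distribution for a 5-point scale -- not any company's real curve.
EXPECTED = {1: 0.05, 2: 0.10, 3: 0.55, 4: 0.25, 5: 0.05}


def calibration_flags(ratings, expected=EXPECTED, tolerance=0.10):
    """Flag rating buckets whose actual share drifts beyond `tolerance`
    from the expected share.

    Returns (rating, actual_share, expected_share) tuples for outliers --
    a starting point for the calibration discussion, nothing more.
    """
    n = len(ratings)
    counts = Counter(ratings)
    flags = []
    for rating, exp_share in expected.items():
        actual = counts.get(rating, 0) / n
        if abs(actual - exp_share) > tolerance:
            flags.append((rating, actual, exp_share))
    return flags


# A manager who rates everyone a 4 or 5 gets flagged on buckets 3, 4, and 5.
ratings = [4, 4, 5, 4, 5, 4, 4, 5]
print(calibration_flags(ratings))
```

The point of keeping this mechanical step separate from the meeting itself is the “calibrate without forced distribution” idea below: the numbers surface where to ask for specific examples, while humans decide whether the ratings actually change.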

The recency bias problem:

Reviews cover 6 or 12 months, but humans remember the last 6 weeks. Countermeasures:

  • Brag documents — ask each person to maintain a running list of accomplishments (Julia Evans popularized this)
  • Manager journal — write brief notes after each 1:1 about impact and growth signals
  • Quarterly check-ins — mini-reviews that create a paper trail for the annual review
  • Peer feedback collected at multiple points — not just at review time

Managing Underperformance

This is where most managers fail — not because they can’t identify underperformance, but because they avoid the conversation until it’s a crisis.

The underperformance spectrum:

| Level | Signal | Response | Timeline |
| --- | --- | --- | --- |
| Early drift | Missing deadlines, lower quality, disengaged in meetings | Direct feedback in 1:1, explore root cause | 2-4 weeks to see improvement |
| Consistent underperformance | Pattern over 6-8 weeks, feedback not sticking | Explicit expectations reset, written plan, weekly check-ins | 30-60 days |
| Formal PIP | Written plan failed or issue is severe | HR-involved PIP with clear criteria and timeline | 30-90 days |
| Exit | PIP failed or pattern is irrecoverable | Managed exit with dignity, severance if warranted | Immediate to 2 weeks |

Before the PIP — the conversation most managers skip:

The formal PIP should never be the first time someone hears they’re underperforming. The sequence should be:

  1. Verbal feedback (1:1): “I’ve noticed X pattern. Here’s what I need to see instead. How can I help?”
  2. Written expectations reset (email/doc after 1:1): “Following up on our conversation. Here’s what success looks like in the next 30 days: [specific, measurable criteria]”
  3. Weekly check-ins on progress with explicit acknowledgment of improvement or continued concern
  4. Formal PIP only if steps 1-3 didn’t resolve it

Common causes of underperformance (and the right response):

| Root Cause | Signal | Right Response | Wrong Response |
| --- | --- | --- | --- |
| Wrong role | Strong in some areas, failing in others | Explore role change, different team | PIP on their weaknesses |
| Personal crisis | Sudden drop from previously strong performer | Compassion, temporary load reduction, EAP referral | “Your performance is slipping” (without asking why) |
| Skill gap | Willing but unable | Training, pairing, mentoring | Waiting for them to figure it out |
| Motivation loss | Capable but checked out | Explore what’s changed: boredom? conflict? comp? | Assuming laziness |
| Bad fit | Cultural mismatch, values conflict | Honest conversation about fit, managed exit | Trying to “fix” them |
| Manager failure | Unclear expectations, no feedback, no support | Fix your management first | Blaming the report |

The PIP document:

A good PIP is compassionate in intent and ruthless in clarity:

  1. Specific deficiencies — “In the last 60 days, you missed 3 of 5 sprint commitments and delivered code that required significant rework on PRs #142, #156, and #171” (not “your performance is below expectations”)
  2. Clear success criteria — “Over the next 30 days, you will: (a) complete assigned sprint items at 80%+ rate, (b) have no more than 1 PR require rework for quality issues, (c) proactively communicate blockers within 24 hours”
  3. Support provided — “I will: pair you with [senior engineer] for daily 30-min pairing sessions, review your PRs within 4 hours, meet weekly to discuss progress”
  4. Consequences — “If these criteria are not met by [date], we will proceed with separation”
  5. Timeline — 30 days for performance, 60 days for behavioral issues, 90 days only if the person is showing genuine improvement and needs more time

The ethical dimension:

A PIP should be a genuine attempt to help someone succeed, not a paper trail for termination. If you’ve already decided to fire someone, don’t waste their time with a fake PIP — have the exit conversation directly. Using a PIP as legal cover while having no intention of keeping them is dishonest, and they always know.

Stack Ranking — The Debate

Stack ranking (forced distribution of performance ratings) was popularized by Jack Welch at GE (“rank and yank” — bottom 10% managed out annually). Microsoft famously used it for decades before abandoning it in 2013.

Why it fails at scale:

  • Destroys collaboration — if my success requires your failure, why would I help you?
  • Punishes great teams — a team of 10 strong performers must still label 1-2 as “underperformers”
  • Gaming — managers hoard low performers to sacrifice, or hire someone specifically to fill the bottom slot
  • Retention inversion — strong performers who see teammates unfairly labeled leave; weak performers protected by the curve stay

What to do instead:

  • Calibrate without forced distribution — discuss all employees in a group, but don’t require a bell curve
  • Focus on growth trajectory — is this person growing, plateaued, or declining? More useful than a ranking
  • Separate evaluation from development — evaluation is backward-looking (what happened), development is forward-looking (what’s next). Don’t try to do both in one conversation
  • Use relative assessment sparingly — when making promotion decisions, relative comparison is useful. For routine reviews, absolute assessment against level expectations is better

High Performers — The Neglected Risk

Most managers spend 80% of their performance management energy on underperformers and neglect their top performers. This is backwards.

What high performers actually need:

| Need | What It Looks Like | What Happens If You Ignore It |
| --- | --- | --- |
| Challenge | Stretch assignments, new problem domains | They get bored and leave |
| Recognition | Specific, public acknowledgment of impact | They feel invisible and leave |
| Autonomy | Trust to make decisions, less oversight | They feel micromanaged and leave |
| Growth path | Clear next role, skill development plan | They see no future and leave |
| Compensation | At or above market, equity refresh, spot bonuses | They get poached and leave |
| Shielding | Protection from organizational noise | They burn out on politics and leave |

The “quiet high performer” problem:

In a team of 16, you likely have 2-3 people who consistently deliver great work without drama. They don’t ask for recognition, they don’t complain, they just execute. These are your highest-risk retention targets because you’ll take them for granted until they hand in their resignation.

Countermeasure: Proactively schedule career conversations with your top performers every quarter. Don’t wait for them to bring it up. Ask: “What would make you start looking elsewhere?” and “What’s the most exciting thing you could be working on?”

Performance Conversations — The Mechanics

The SBI-I framework for performance feedback:

  • Situation: “In last Tuesday’s design review…”
  • Behavior: “…you interrupted the junior engineer three times when they were presenting their approach…”
  • Impact: “…which made them visibly uncomfortable and less likely to share ideas in the future.”
  • Intent/Inquiry: “I don’t think that was your intent. What was going on for you in that moment?”

The “no surprises” principle:

If someone is surprised by their performance review, you failed as a manager — not them. Every piece of feedback in the formal review should have been discussed in 1:1s already. The review is a summary, not a reveal.

Annual review writing tips:

  • Lead with impact, not activity — “Led the migration of 3 services to K8s, reducing deployment time from 45 min to 8 min” not “Worked on Kubernetes migration”
  • Be specific about growth — “Improved significantly in stakeholder communication, specifically in how they present technical tradeoffs to product — visible in the Q3 roadmap discussion” not “Improved communication skills”
  • Address development areas with growth framing — “Next growth edge is learning to delegate more effectively — currently tends to take on too much personally, which limits their team’s development”
  • Calibrate your language — words like “adequate,” “satisfactory,” and “acceptable” all read as negative, even if you mean them neutrally

This post is licensed under CC BY 4.0 by the author.