
Delivery & Execution

Process exists to reduce coordination cost, not to create ceremony. The best delivery system is the one your team actually follows — and the one that surfaces problems early enough to act on them.

Methodology Comparison

There is no universally correct methodology. The right choice depends on the nature of the work, team maturity, and organizational constraints.

Comparison Matrix

| Dimension | Scrum | Kanban | Shape Up |
|---|---|---|---|
| Cadence | Fixed sprints (1-4 weeks) | Continuous flow | 6-week cycles + 2-week cooldown |
| Batch size | Sprint backlog (committed set) | Single item (one-piece flow) | Bet (shaped pitch) |
| Roles | PO, SM, Dev Team | No prescribed roles | Shapers, builders, the betting table |
| Planning | Sprint planning ceremony | Continuous replenishment | Betting table (leadership decides) |
| Scope | Fixed time, negotiable scope | No fixed scope | Fixed time, variable scope |
| Estimation | Story points or t-shirt sizes | None (use lead time data) | Appetite (how much time is this worth?) |
| Best for | Teams needing structure, stakeholders needing predictability | Support/ops, continuous delivery, mature teams | Product teams with shaped problems, R&D |
| Failure mode | Ceremony theater, estimation arguments, velocity gaming | Invisible WIP explosion, no sense of urgency | Requires strong shaping skill; bad pitches waste 6 weeks |

When Each Works for a 16-Person Org

Scrum works when:

  • Your stakeholders (PMs, business) need regular demo points and predictable delivery
  • Team members are mixed seniority and benefit from structured ceremonies
  • You have clear product backlogs with well-defined stories

Kanban works when:

  • You run production support alongside feature work (common in platform teams)
  • The team is senior enough to self-organize without sprint boundaries
  • Work items vary wildly in size (from 2-hour bug fixes to 2-week features)

Shape Up works when:

  • You have a strong product/shaping function that can define problems well
  • Teams are frustrated by sprint churn and want longer focus periods
  • You can tolerate 6 weeks between course corrections

Hybrid (most common in practice): Scrum ceremonies with Kanban flow — sprint planning sets priorities, but work flows continuously. WIP limits prevent overload. Retrospectives drive improvement. This is what most mature teams actually do regardless of what they call it.


Sprint Planning That Actually Works

The Failure Mode

Most sprint planning fails because it becomes a negotiation ritual: PO pushes more in, team pushes back, everyone compromises on something nobody is confident about. The sprint starts overloaded and ends with carryover.

What Good Looks Like

  1. Pre-planning (async, before the meeting): PO has refined stories with acceptance criteria. Engineers have done technical investigation on anything uncertain. The meeting is for commitment, not discovery.

  2. Capacity-based planning: Start with actual capacity:
    • Days available = (team members) x (sprint days) - (PTO + meetings + on-call)
    • Historical throughput = average stories completed in last 3 sprints
    • Plan to 70-80% of capacity (leave room for unplanned work)
  3. Commitment conversation: For each story: “Do we understand it? Can we finish it this sprint? What could block us?” If any answer is uncertain, the story needs more refinement or should be split.

  4. Sprint goal (singular): One sentence describing what success looks like. Not a list of stories. “Users can complete checkout with the new payment provider” — not “finish stories 1-7.”
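The capacity arithmetic in step 2 is worth making concrete. A minimal sketch, with hypothetical numbers (plug in your own team's figures):

```python
# Capacity-based sprint planning, as described in step 2.
# All inputs are hypothetical examples.

def sprint_capacity(team_size, sprint_days, pto_days, overhead_days,
                    planning_factor=0.75):
    """Return person-days to plan against.

    planning_factor implements the 70-80% rule: leave headroom
    for unplanned work.
    """
    gross = team_size * sprint_days
    net = gross - pto_days - overhead_days
    return net * planning_factor

# Example: 6 engineers, 10-day sprint, 4 days of PTO,
# roughly 1 day per person lost to meetings and on-call.
capacity = sprint_capacity(team_size=6, sprint_days=10,
                           pto_days=4, overhead_days=6)
print(f"Plan against {capacity} person-days")  # 37.5
```

The `planning_factor` default of 0.75 sits in the middle of the 70-80% range; tune it to your team's interrupt load.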

Sprint Length

| Length | Context | Trade-off |
|---|---|---|
| 1 week | Rapid iteration, startup pace | High ceremony overhead (~20% of time), hard to fit meaningful work |
| 2 weeks | Most common, good default | Balances feedback frequency with focus time |
| 3 weeks | Complex features, needs research time | Awkward calendar alignment, easy to lose urgency |
| 4 weeks | Regulated/enterprise environments | Feedback loop too slow for most software teams |

Recommendation for a 16-person org: 2-week sprints. Long enough to deliver something meaningful, short enough to course-correct. If your squads work on very different problem types, it is fine for each to run a different sprint length.


Estimation — The Eternal Debate

Story Points vs No Estimates vs T-Shirt Sizes

| Approach | How It Works | Strength | Weakness |
|---|---|---|---|
| Story Points | Relative sizing (1, 2, 3, 5, 8, 13) using Fibonacci | Separates effort from time, enables velocity tracking | Easily gamed, becomes political, arguments over "is this a 3 or a 5?" |
| No Estimates (#NoEstimates) | Track throughput (stories/sprint), all stories roughly same size | Eliminates waste of estimation meetings, focuses on slicing | Requires disciplined story slicing, stakeholders may resist |
| T-Shirt Sizes | S/M/L/XL for rough bucketing | Fast, low-conflict, good for roadmap planning | Too coarse for sprint-level planning |
| Appetite (Shape Up) | "How much time is this worth to us?" | Focuses on value, not effort | Requires mature product thinking |

The Pragmatic Position

Estimation is useful for two things:

  1. Surfacing disagreement — if one engineer says “2” and another says “8,” you have found a knowledge gap or scope ambiguity. This is the real value of planning poker.
  2. Rough capacity planning — stakeholders need to know “Q3 or Q4?” Story points are fine for this level of granularity.

Estimation is harmful when:

  • It becomes a performance metric (“your velocity dropped”)
  • Teams spend more time estimating than building
  • Estimates are treated as commitments rather than forecasts

For a 16-person org: Use story points or t-shirt sizes for roadmap-level planning. At the sprint level, focus on throughput (number of stories completed per sprint) and ensure stories are sliced to roughly similar size (1-3 days of work each).
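One way to use throughput data for the roadmap-level "Q3 or Q4?" question is a simple Monte Carlo forecast over historical per-sprint throughput. A sketch, assuming stories are sliced to roughly similar size as recommended above; the throughput history is hypothetical:

```python
# Monte Carlo throughput forecast: resample past per-sprint
# throughput to estimate how many sprints a backlog will take.
import random

def sprints_to_finish(backlog_size, throughput_history,
                      trials=10_000, seed=0):
    """Return the 85th-percentile sprint count (a conservative forecast)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        remaining, sprints = backlog_size, 0
        while remaining > 0:
            remaining -= rng.choice(throughput_history)
            sprints += 1
        results.append(sprints)
    results.sort()
    return results[int(0.85 * trials)]

# Last six sprints the squad finished between 4 and 8 stories per sprint.
print(sprints_to_finish(30, [5, 7, 4, 6, 8, 6]), "sprints")
```

Because it samples real history, the forecast reflects the team's actual variability instead of a single average, which is what makes it more honest than dividing backlog size by mean velocity.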


Flow Metrics — What Actually Matters

The Four Key Metrics

| Metric | Definition | Why It Matters | Target |
|---|---|---|---|
| Cycle Time | Time from work started to work done | Measures how long things take once you begin | < 3 days for a typical story |
| Throughput | Number of items completed per time period | Measures capacity without the abstraction of story points | Stable or increasing trend |
| WIP (Work in Progress) | Number of items actively being worked on | High WIP = context switching = slower everything | WIP limit ≈ team size - 1 |
| Lead Time | Time from request to delivery (includes queue time) | What the customer/stakeholder actually experiences | Depends on your SLA |
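Given per-item timestamps from your tracker, cycle time and lead time fall out directly. A small sketch with hypothetical dates:

```python
# Cycle time vs lead time from per-item timestamps.
# Dates are hypothetical; in practice they come from your tracker.
from datetime import date

items = [
    # (requested, started, finished)
    (date(2024, 5, 1), date(2024, 5, 6), date(2024, 5, 8)),
    (date(2024, 5, 2), date(2024, 5, 7), date(2024, 5, 10)),
    (date(2024, 5, 3), date(2024, 5, 9), date(2024, 5, 11)),
]

# Cycle time: started -> finished. Lead time: requested -> finished.
cycle_times = [(done - started).days for _, started, done in items]
lead_times = [(done - requested).days for requested, _, done in items]

print("avg cycle time:", round(sum(cycle_times) / len(cycle_times), 1))  # 2.3
print("avg lead time:", round(sum(lead_times) / len(lead_times), 1))     # 7.7
```

Note how the gap between the two averages is pure queue time: the work itself takes about two days, but items wait nearly a week before anyone starts.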

Little’s Law

Lead Time = WIP / Throughput

This is not a guideline — it is a mathematical law. If you want to reduce lead time, you have exactly two levers:

  1. Reduce WIP (start fewer things)
  2. Increase throughput (finish things faster)

Most teams try to increase throughput by adding people. But adding people increases coordination cost and often reduces throughput in the short term (Brooks’s Law). Reducing WIP is almost always the higher-leverage move.
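A tiny numeric illustration of the two levers, using the formula above with hypothetical numbers:

```python
# Little's Law: Lead Time = WIP / Throughput.
def lead_time(wip, throughput_per_day):
    return wip / throughput_per_day

# A squad finishing 2 items per day:
print(lead_time(wip=12, throughput_per_day=2))  # 6.0 days
print(lead_time(wip=6, throughput_per_day=2))   # 3.0 days: halving WIP halves lead time
```

The second call needs no hiring, no heroics, and no process change beyond starting fewer things, which is why reducing WIP is the higher-leverage move.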

WIP Limits in Practice

For a squad of 6-7 engineers:

  • Development WIP limit: 4-5 items (not one per person — pairing and reviewing should happen)
  • Review WIP limit: 2-3 items (reviews should not queue for days)
  • Testing WIP limit: 2 items (if QA is a separate phase)

The conversation WIP limits force: When an engineer finishes something and the WIP limit is full, they must help finish existing work (review, test, unblock) before starting something new. This is the behavior change that makes WIP limits valuable.
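The pull policy behind that conversation is simple enough to sketch. Column names and limits below are illustrative, not from any particular tool:

```python
# A WIP-limited pull policy: refuse new starts when a column is full.
# Limits follow the squad-of-6-7 guidance above (illustrative).
WIP_LIMITS = {"development": 5, "review": 3, "testing": 2}

def can_start(board, column):
    """True if pulling a new item into `column` respects its WIP limit."""
    return len(board.get(column, [])) < WIP_LIMITS[column]

board = {"development": ["A", "B", "C", "D", "E"], "review": ["F"]}
print(can_start(board, "development"))  # False: go help finish something
print(can_start(board, "review"))       # True
```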


Managing Dependencies

Dependencies are the number one delivery risk in any multi-team organization. With 16 engineers across 2-3 squads, cross-team dependencies are inevitable.

Dependency Types

| Type | Example | Mitigation |
|---|---|---|
| Technical | Squad A needs an API from Squad B | Define API contract early, build against a mock, integrate late |
| Knowledge | Only one person knows the payment system | Pair programming, documentation, rotate ownership |
| Sequential | Feature X must ship before Feature Y can start | Identify early, plan for it, or redesign to remove the dependency |
| External | Waiting on a vendor API or a legal review | Buffer time, parallel workstreams, escalation path |

Dependency Management Practices

  1. Dependency board: Visible wall/board showing cross-team dependencies with status. Review weekly.
  2. Scrum of Scrums (or equivalent): 15-minute weekly sync between squad leads to surface blockers. Keep it focused on blockers, not status.
  3. API-first design: Define contracts before implementation. OpenAPI specs, shared schema repos. Teams can work in parallel against contracts.
  4. Spike and prototype: When a dependency is uncertain, invest a day in a spike to derisk it before committing the team.
  5. Accept some duplication: Sometimes the fastest path is for Squad A to build their own thin version rather than waiting for Squad B’s “proper” solution. Technical debt is sometimes cheaper than delay.

Velocity and Predictability

Why Velocity Is Dangerous

Velocity (story points completed per sprint) is useful as a team-internal planning tool and harmful as a management metric:

  • Goodhart’s Law: When velocity becomes a target, it ceases to be a useful measure. Teams inflate estimates to show “improvement.”
  • Cross-team comparison is meaningless: Team A’s “5-point story” is not the same as Team B’s. Points are relative within a team, not across teams.
  • Velocity measures output, not outcome: A team can have high velocity while building the wrong thing.

Better Alternatives

  • Throughput: Stories completed per sprint (requires consistent story sizing)
  • Cycle time distribution: P50 and P95 cycle times — shows both typical and worst-case delivery
  • Sprint goal completion rate: Did the team achieve its sprint goal? Binary yes/no is more useful than points.
  • DORA metrics: Deployment frequency, lead time, change failure rate, MTTR — these measure actual delivery capability
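P50 and P95 cycle times are cheap to compute from raw data. A minimal nearest-rank sketch, with hypothetical cycle times in days:

```python
# P50/P95 cycle times via the nearest-rank percentile method.
import math

def percentile(values, p):
    """Nearest-rank percentile: small and dependency-free."""
    ordered = sorted(values)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(0, min(k, len(ordered) - 1))]

cycle_times = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13]  # days per story (hypothetical)
print("P50:", percentile(cycle_times, 50))  # 3  -> typical story
print("P95:", percentile(cycle_times, 95))  # 13 -> worst-case outlier
```

The gap between P50 and P95 is the point: the median looks healthy while the tail shows which stories blow up, and shrinking that tail (usually through better slicing) is what improves predictability.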

Retrospectives That Drive Change

Why Most Retros Fail

  • Same complaints every sprint, no action taken
  • “Safe space” that is not actually safe — people self-censor
  • Actions are vague (“improve communication”) with no owner or deadline
  • Too frequent for the team to see results between retros

Format That Works

  1. Set the stage (5 min): Safety check. Anonymous 1-5 score on “how safe do you feel being honest?” If average is below 3, address that first.
  2. Gather data (10 min): What happened? Use timeline, 4Ls (Liked, Learned, Lacked, Longed For), or Start/Stop/Continue.
  3. Generate insights (15 min): Why did it happen? 5-whys on the top 2-3 items. Go deeper than symptoms.
  4. Decide what to do (10 min): Maximum 2 action items. Each has: owner, deadline, and definition of done.
  5. Follow up (next retro): First agenda item is always: did we complete last retro’s actions?

Cadence

  • Sprint retros: Every 2 weeks (standard with Scrum)
  • Team health checks: Monthly — broader assessment of team dynamics, tooling, processes
  • Quarterly retro: Bigger picture — are we building the right things? Is our architecture serving us?

Anti-Patterns

| Anti-Pattern | Symptom | Fix |
|---|---|---|
| Estimation theater | 2-hour planning poker sessions with no better accuracy than t-shirt sizing | Switch to t-shirt sizes or #NoEstimates; use estimation only to surface disagreement |
| Velocity as KPI | Manager reports velocity to leadership; teams inflate estimates | Track throughput and cycle time instead; velocity is a team-internal tool only |
| Carry-over culture | Every sprint ends with 30%+ unfinished work | Reduce sprint scope, improve story slicing, address the root cause (usually interrupts or unclear requirements) |
| Process without purpose | Daily standups where people give status reports to the manager | Standup is for the team: "What is blocked? Who needs help?" Manager observes, does not run it. |
| No-meeting "Agile" | Team dropped all ceremonies to "move fast" | You need feedback loops. Cut the ceremony, keep the function. |
| Dependency denial | "We will figure it out during the sprint" | Map dependencies during planning. If you cannot name the dependency owner and their timeline, you have a risk. |

Real-World Application

Shopify’s Move to Shape Up

Shopify adopted Shape Up after finding that Scrum sprints created too much overhead for their product teams. Key results:

  • 6-week cycles gave teams enough time to tackle meaningful problems
  • “Cooldown” periods (2 weeks between cycles) let teams address tech debt, experiment, and recover
  • The “betting table” forced leadership to make explicit priority decisions rather than overloading backlogs

Basecamp (Origin of Shape Up)

Shape Up emerged from Basecamp’s internal practices. The key insight: appetite over estimates. Instead of asking “how long will this take?” ask “how much time is this worth?” If the answer is “2 weeks max,” the team shapes the solution to fit 2 weeks, cutting scope aggressively.

Google’s Engineering Productivity

Google measures engineering teams not by velocity but by:

  • Quarterly OKR completion rate — outcome-focused
  • Code review turnaround time — process health
  • Developer satisfaction surveys — qualitative team health

Amazon’s Working Backwards

Amazon’s planning process starts with a press release for the finished product (the “PR/FAQ”). This forces clarity on outcomes before anyone estimates effort. The estimation question becomes: “Given this outcome, what is the minimum we need to build?”


References

  • Schwaber, K. & Sutherland, J. (2020). The Scrum Guide — scrumguides.org
  • Singer, R. (2019). Shape Up: Stop Running in Circles and Ship Work that Matters — basecamp.com/shapeup
  • Anderson, D. (2010). Kanban: Successful Evolutionary Change for Your Technology Business
  • Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate — DORA metrics and delivery performance
  • DeMarco, T. & Lister, T. (2013). Peopleware — why human factors dominate process
  • Brooks, F. (1975/1995). The Mythical Man-Month — Brooks's Law on adding people to late projects
  • Allen Holub — "#NoEstimates" talks (YouTube)
  • Spotify Engineering Culture videos (2014) — squad-based delivery model
  • Little's Law explanation — kanbanize.com/lean-management/pull/little-s-law
  • Vacanti, D. (2015). Actionable Agile Metrics for Predictability — flow metrics deep dive

This post is licensed under CC BY 4.0 by the author.