
Delivery & Execution

Process exists to reduce coordination cost, not to create ceremony. The best delivery system is the one your team actually follows — and the one that surfaces problems early enough to act on them.

Methodology Comparison

There is no universally correct methodology. The right choice depends on the nature of the work, team maturity, and organizational constraints.

Comparison Matrix

| Dimension | Scrum | Kanban | Shape Up |
|---|---|---|---|
| Cadence | Fixed sprints (1-4 weeks) | Continuous flow | 6-week cycles + 2-week cooldown |
| Batch size | Sprint backlog (committed set) | Single item (one-piece flow) | Bet (shaped pitch) |
| Roles | PO, SM, Dev Team | No prescribed roles | Shapers, builders, the betting table |
| Planning | Sprint planning ceremony | Continuous replenishment | Betting table (leadership decides) |
| Scope | Fixed time, negotiable scope | No fixed scope | Fixed time, variable scope |
| Estimation | Story points or t-shirt sizes | None (use lead time data) | Appetite (how much time is this worth?) |
| Best for | Teams needing structure, stakeholders needing predictability | Support/ops, continuous delivery, mature teams | Product teams with shaped problems, R&D |
| Failure mode | Ceremony theater, estimation arguments, velocity gaming | Invisible WIP explosion, no sense of urgency | Requires strong shaping skill; bad pitches waste 6 weeks |

When Each Works for a 16-Person Org

Scrum works when:

  • Your stakeholders (PMs, business) need regular demo points and predictable delivery
  • Team members are mixed seniority and benefit from structured ceremonies
  • You have clear product backlogs with well-defined stories

Kanban works when:

  • You run production support alongside feature work (common in platform teams)
  • The team is senior enough to self-organize without sprint boundaries
  • Work items vary wildly in size (from 2-hour bug fixes to 2-week features)

Shape Up works when:

  • You have a strong product/shaping function that can define problems well
  • Teams are frustrated by sprint churn and want longer focus periods
  • You can tolerate 6 weeks between course corrections

Hybrid (most common in practice): Scrum ceremonies with Kanban flow — sprint planning sets priorities, but work flows continuously. WIP limits prevent overload. Retrospectives drive improvement. This is what most mature teams actually do regardless of what they call it.


Sprint Planning That Actually Works

The Failure Mode

Most sprint planning fails because it becomes a negotiation ritual: PO pushes more in, team pushes back, everyone compromises on something nobody is confident about. The sprint starts overloaded and ends with carryover.

What Good Looks Like

  1. Pre-planning (async, before the meeting): PO has refined stories with acceptance criteria. Engineers have done technical investigation on anything uncertain. The meeting is for commitment, not discovery.

  2. Capacity-based planning: Start with actual capacity:
    • Days available = (team members) x (sprint days) - (PTO + meetings + on-call)
    • Historical throughput = average stories completed in last 3 sprints
    • Plan to 70-80% of capacity (leave room for unplanned work)
  3. Commitment conversation: For each story: “Do we understand it? Can we finish it this sprint? What could block us?” If any answer is uncertain, the story needs more refinement or should be split.

  4. Sprint goal (singular): One sentence describing what success looks like. Not a list of stories. “Users can complete checkout with the new payment provider” — not “finish stories 1-7.”
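The capacity arithmetic in step 2 is worth making concrete. A minimal sketch, with hypothetical numbers (plug in your own team's figures):

```python
# Capacity-based sprint planning, as described in step 2.
# All inputs are hypothetical examples.

def sprint_capacity(team_size, sprint_days, pto_days, overhead_days,
                    planning_factor=0.75):
    """Return person-days to plan against.

    planning_factor implements the 70-80% rule: leave headroom
    for unplanned work.
    """
    gross = team_size * sprint_days
    net = gross - pto_days - overhead_days
    return net * planning_factor

# Example: 6 engineers, 10-day sprint, 4 days of PTO,
# roughly 1 day per person lost to meetings and on-call.
capacity = sprint_capacity(team_size=6, sprint_days=10,
                           pto_days=4, overhead_days=6)
print(f"Plan against {capacity} person-days")  # 37.5
```

The `planning_factor` default of 0.75 sits in the middle of the 70-80% range; tune it to your team's interrupt load.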

Sprint Length

| Length | Context | Trade-off |
|---|---|---|
| 1 week | Rapid iteration, startup pace | High ceremony overhead (~20% of time), hard to fit meaningful work |
| 2 weeks | Most common, good default | Balances feedback frequency with focus time |
| 3 weeks | Complex features, needs research time | Awkward calendar alignment, easy to lose urgency |
| 4 weeks | Regulated/enterprise environments | Feedback loop too slow for most software teams |

Recommendation for a 16-person org: 2-week sprints. Long enough to deliver something meaningful, short enough to course-correct. If your squads work on very different problem types, it is fine for each to run a different sprint length.


Estimation — The Eternal Debate

Story Points vs No Estimates vs T-Shirt Sizes

| Approach | How It Works | Strength | Weakness |
|---|---|---|---|
| Story Points | Relative sizing (1, 2, 3, 5, 8, 13) using Fibonacci | Separates effort from time, enables velocity tracking | Easily gamed, becomes political, arguments over "is this a 3 or a 5?" |
| No Estimates (#NoEstimates) | Track throughput (stories/sprint), all stories roughly same size | Eliminates waste of estimation meetings, focuses on slicing | Requires disciplined story slicing, stakeholders may resist |
| T-Shirt Sizes | S/M/L/XL for rough bucketing | Fast, low-conflict, good for roadmap planning | Too coarse for sprint-level planning |
| Appetite (Shape Up) | "How much time is this worth to us?" | Focuses on value, not effort | Requires mature product thinking |

The Pragmatic Position

Estimation is useful for two things:

  1. Surfacing disagreement — if one engineer says “2” and another says “8,” you have found a knowledge gap or scope ambiguity. This is the real value of planning poker.
  2. Rough capacity planning — stakeholders need to know “Q3 or Q4?” Story points are fine for this level of granularity.

Estimation is harmful when:

  • It becomes a performance metric (“your velocity dropped”)
  • Teams spend more time estimating than building
  • Estimates are treated as commitments rather than forecasts

For a 16-person org: Use story points or t-shirt sizes for roadmap-level planning. At the sprint level, focus on throughput (number of stories completed per sprint) and ensure stories are sliced to roughly similar size (1-3 days of work each).
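One way to use throughput data for the roadmap-level "Q3 or Q4?" question is a simple Monte Carlo forecast over historical per-sprint throughput. A sketch, assuming stories are sliced to roughly similar size as recommended above; the throughput history is hypothetical:

```python
# Monte Carlo throughput forecast: resample past per-sprint
# throughput to estimate how many sprints a backlog will take.
import random

def sprints_to_finish(backlog_size, throughput_history,
                      trials=10_000, seed=0):
    """Return the 85th-percentile sprint count (a conservative forecast)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        remaining, sprints = backlog_size, 0
        while remaining > 0:
            remaining -= rng.choice(throughput_history)
            sprints += 1
        results.append(sprints)
    results.sort()
    return results[int(0.85 * trials)]

# Last six sprints the squad finished between 4 and 8 stories per sprint.
print(sprints_to_finish(30, [5, 7, 4, 6, 8, 6]), "sprints")
```

Because it samples real history, the forecast reflects the team's actual variability instead of a single average, which is what makes it more honest than dividing backlog size by mean velocity.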


Flow Metrics — What Actually Matters

The Four Key Metrics

| Metric | Definition | Why It Matters | Target |
|---|---|---|---|
| Cycle Time | Time from work started to work done | Measures how long things take once you begin | < 3 days for a typical story |
| Throughput | Number of items completed per time period | Measures capacity without the abstraction of story points | Stable or increasing trend |
| WIP (Work in Progress) | Number of items actively being worked on | High WIP = context switching = slower everything | WIP limit ≈ team size - 1 |
| Lead Time | Time from request to delivery (includes queue time) | What the customer/stakeholder actually experiences | Depends on your SLA |
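Given per-item timestamps from your tracker, cycle time and lead time fall out directly. A small sketch with hypothetical dates:

```python
# Cycle time vs lead time from per-item timestamps.
# Dates are hypothetical; in practice they come from your tracker.
from datetime import date

items = [
    # (requested, started, finished)
    (date(2024, 5, 1), date(2024, 5, 6), date(2024, 5, 8)),
    (date(2024, 5, 2), date(2024, 5, 7), date(2024, 5, 10)),
    (date(2024, 5, 3), date(2024, 5, 9), date(2024, 5, 11)),
]

# Cycle time: started -> finished. Lead time: requested -> finished.
cycle_times = [(done - started).days for _, started, done in items]
lead_times = [(done - requested).days for requested, _, done in items]

print("avg cycle time:", round(sum(cycle_times) / len(cycle_times), 1))  # 2.3
print("avg lead time:", round(sum(lead_times) / len(lead_times), 1))     # 7.7
```

Note how the gap between the two averages is pure queue time: the work itself takes about two days, but items wait nearly a week before anyone starts.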

Little’s Law

Lead Time = WIP / Throughput

This is not a guideline — it is a mathematical law. If you want to reduce lead time, you have exactly two levers:

  1. Reduce WIP (start fewer things)
  2. Increase throughput (finish things faster)

Most teams try to increase throughput by adding people. But adding people increases coordination cost and often reduces throughput in the short term (Brooks’s Law). Reducing WIP is almost always the higher-leverage move.
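A tiny numeric illustration of the two levers, using the formula above with hypothetical numbers:

```python
# Little's Law: Lead Time = WIP / Throughput.
def lead_time(wip, throughput_per_day):
    return wip / throughput_per_day

# A squad finishing 2 items per day:
print(lead_time(wip=12, throughput_per_day=2))  # 6.0 days
print(lead_time(wip=6, throughput_per_day=2))   # 3.0 days: halving WIP halves lead time
```

The second call needs no hiring, no heroics, and no process change beyond starting fewer things, which is why reducing WIP is the higher-leverage move.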

WIP Limits in Practice

For a squad of 6-7 engineers:

  • Development WIP limit: 4-5 items (not one per person — pairing and reviewing should happen)
  • Review WIP limit: 2-3 items (reviews should not queue for days)
  • Testing WIP limit: 2 items (if QA is a separate phase)

The conversation WIP limits force: When an engineer finishes something and the WIP limit is full, they must help finish existing work (review, test, unblock) before starting something new. This is the behavior change that makes WIP limits valuable.
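The pull policy behind that conversation is simple enough to sketch. Column names and limits below are illustrative, not from any particular tool:

```python
# A WIP-limited pull policy: refuse new starts when a column is full.
# Limits follow the squad-of-6-7 guidance above (illustrative).
WIP_LIMITS = {"development": 5, "review": 3, "testing": 2}

def can_start(board, column):
    """True if pulling a new item into `column` respects its WIP limit."""
    return len(board.get(column, [])) < WIP_LIMITS[column]

board = {"development": ["A", "B", "C", "D", "E"], "review": ["F"]}
print(can_start(board, "development"))  # False: go help finish something
print(can_start(board, "review"))       # True
```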


Managing Dependencies

Dependencies are the number one delivery risk in any multi-team organization. With 16 engineers across 2-3 squads, cross-team dependencies are inevitable.

Dependency Types

| Type | Example | Mitigation |
|---|---|---|
| Technical | Squad A needs an API from Squad B | Define API contract early, build against a mock, integrate late |
| Knowledge | Only one person knows the payment system | Pair programming, documentation, rotate ownership |
| Sequential | Feature X must ship before Feature Y can start | Identify early, plan for it, or redesign to remove the dependency |
| External | Waiting on a vendor API or a legal review | Buffer time, parallel workstreams, escalation path |

Dependency Management Practices

  1. Dependency board: Visible wall/board showing cross-team dependencies with status. Review weekly.
  2. Scrum of Scrums (or equivalent): 15-minute weekly sync between squad leads to surface blockers. Keep it focused on blockers, not status.
  3. API-first design: Define contracts before implementation. OpenAPI specs, shared schema repos. Teams can work in parallel against contracts.
  4. Spike and prototype: When a dependency is uncertain, invest a day in a spike to derisk it before committing the team.
  5. Accept some duplication: Sometimes the fastest path is for Squad A to build their own thin version rather than waiting for Squad B’s “proper” solution. Technical debt is sometimes cheaper than delay.

Velocity and Predictability

Why Velocity Is Dangerous

Velocity (story points completed per sprint) is useful as a team-internal planning tool and harmful as a management metric:

  • Goodhart’s Law: When velocity becomes a target, it ceases to be a useful measure. Teams inflate estimates to show “improvement.”
  • Cross-team comparison is meaningless: Team A’s “5-point story” is not the same as Team B’s. Points are relative within a team, not across teams.
  • Velocity measures output, not outcome: A team can have high velocity while building the wrong thing.

Better Alternatives

  • Throughput: Stories completed per sprint (requires consistent story sizing)
  • Cycle time distribution: P50 and P95 cycle times — shows both typical and worst-case delivery
  • Sprint goal completion rate: Did the team achieve its sprint goal? Binary yes/no is more useful than points.
  • DORA metrics: Deployment frequency, lead time, change failure rate, MTTR — these measure actual delivery capability
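P50 and P95 cycle times are cheap to compute from raw data. A minimal nearest-rank sketch, with hypothetical cycle times in days:

```python
# P50/P95 cycle times via the nearest-rank percentile method.
import math

def percentile(values, p):
    """Nearest-rank percentile: small and dependency-free."""
    ordered = sorted(values)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(0, min(k, len(ordered) - 1))]

cycle_times = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13]  # days per story (hypothetical)
print("P50:", percentile(cycle_times, 50))  # 3  -> typical story
print("P95:", percentile(cycle_times, 95))  # 13 -> worst-case outlier
```

The gap between P50 and P95 is the point: the median looks healthy while the tail shows which stories blow up, and shrinking that tail (usually through better slicing) is what improves predictability.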

Retrospectives That Drive Change

Why Most Retros Fail

  • Same complaints every sprint, no action taken
  • “Safe space” that is not actually safe — people self-censor
  • Actions are vague (“improve communication”) with no owner or deadline
  • Too frequent for the team to see results between retros

Format That Works

  1. Set the stage (5 min): Safety check. Anonymous 1-5 score on “how safe do you feel being honest?” If average is below 3, address that first.
  2. Gather data (10 min): What happened? Use timeline, 4Ls (Liked, Learned, Lacked, Longed For), or Start/Stop/Continue.
  3. Generate insights (15 min): Why did it happen? 5-whys on the top 2-3 items. Go deeper than symptoms.
  4. Decide what to do (10 min): Maximum 2 action items. Each has: owner, deadline, and definition of done.
  5. Follow up (next retro): First agenda item is always: did we complete last retro’s actions?

Cadence

  • Sprint retros: Every 2 weeks (standard with Scrum)
  • Team health checks: Monthly — broader assessment of team dynamics, tooling, processes
  • Quarterly retro: Bigger picture — are we building the right things? Is our architecture serving us?

Anti-Patterns

| Anti-Pattern | Symptom | Fix |
|---|---|---|
| Estimation theater | 2-hour planning poker sessions with no better accuracy than t-shirt sizing | Switch to t-shirt sizes or #NoEstimates; use estimation only to surface disagreement |
| Velocity as KPI | Manager reports velocity to leadership; teams inflate estimates | Track throughput and cycle time instead; velocity is a team-internal tool only |
| Carry-over culture | Every sprint ends with 30%+ unfinished work | Reduce sprint scope, improve story slicing, address the root cause (usually interrupts or unclear requirements) |
| Process without purpose | Daily standups where people give status reports to the manager | Standup is for the team: "What is blocked? Who needs help?" Manager observes, does not run it. |
| No-meeting "Agile" | Team dropped all ceremonies to "move fast" | You need feedback loops. Cut the ceremony, keep the function. |
| Dependency denial | "We will figure it out during the sprint" | Map dependencies during planning. If you cannot name the dependency owner and their timeline, you have a risk. |

Real-World Application

Shopify’s Move to Shape Up

Shopify adopted Shape Up after finding that Scrum sprints created too much overhead for their product teams. Key results:

  • 6-week cycles gave teams enough time to tackle meaningful problems
  • “Cooldown” periods (2 weeks between cycles) let teams address tech debt, experiment, and recover
  • The “betting table” forced leadership to make explicit priority decisions rather than overloading backlogs

Basecamp (Origin of Shape Up)

Shape Up emerged from Basecamp’s internal practices. The key insight: appetite over estimates. Instead of asking “how long will this take?” ask “how much time is this worth?” If the answer is “2 weeks max,” the team shapes the solution to fit 2 weeks, cutting scope aggressively.

Google’s Engineering Productivity

Google measures engineering teams not by velocity but by:

  • Quarterly OKR completion rate — outcome-focused
  • Code review turnaround time — process health
  • Developer satisfaction surveys — qualitative team health

Amazon’s Working Backwards

Amazon’s planning process starts with a press release for the finished product (the “PR/FAQ”). This forces clarity on outcomes before anyone estimates effort. The estimation question becomes: “Given this outcome, what is the minimum we need to build?”


References

  • Schwaber, K. & Sutherland, J. (2020). The Scrum Guide — scrumguides.org
  • Singer, R. (2019). Shape Up: Stop Running in Circles and Ship Work that Matters — basecamp.com/shapeup
  • Anderson, D. (2010). Kanban: Successful Evolutionary Change for Your Technology Business
  • Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate — DORA metrics and delivery performance
  • DeMarco, T. & Lister, T. (2013). Peopleware — why human factors dominate process
  • Brooks, F. (1975/1995). The Mythical Man-Month — Brooks's Law on adding people to late projects
  • Allen Holub — "#NoEstimates" talks (YouTube)
  • Spotify Engineering Culture videos (2014) — squad-based delivery model
  • Little's Law explanation — kanbanize.com/lean-management/pull/little-s-law
  • Vacanti, D. (2015). Actionable Agile Metrics for Predictability — flow metrics deep dive

This post is licensed under CC BY 4.0 by the author.