Delivery & Execution
Process exists to reduce coordination cost, not to create ceremony. The best delivery system is the one your team actually follows — and the one that surfaces problems early enough to act on them.
Methodology Comparison
There is no universally correct methodology. The right choice depends on the nature of the work, team maturity, and organizational constraints.
Comparison Matrix
| Dimension | Scrum | Kanban | Shape Up |
|---|---|---|---|
| Cadence | Fixed sprints (1-4 weeks) | Continuous flow | 6-week cycles + 2-week cooldown |
| Batch size | Sprint backlog (committed set) | Single item (one-piece flow) | Bet (shaped pitch) |
| Roles | PO, SM, Dev Team | No prescribed roles | Shapers and builders; leadership bets at the betting table |
| Planning | Sprint planning ceremony | Continuous replenishment | Betting table (leadership decides) |
| Scope | Fixed time, negotiable scope | No fixed scope | Fixed time, variable scope |
| Estimation | Story points or t-shirt sizes | None (use lead time data) | Appetite (how much time is this worth?) |
| Best for | Teams needing structure, stakeholders needing predictability | Support/ops, continuous delivery, mature teams | Product teams with shaped problems, R&D |
| Failure mode | Ceremony theater, estimation arguments, velocity gaming | Invisible WIP explosion, no sense of urgency | Requires strong shaping skill; bad pitches waste 6 weeks |
When Each Works for a 16-Person Org
Scrum works when:
- Your stakeholders (PMs, business) need regular demo points and predictable delivery
- Team members are mixed seniority and benefit from structured ceremonies
- You have clear product backlogs with well-defined stories
Kanban works when:
- You run production support alongside feature work (common in platform teams)
- The team is senior enough to self-organize without sprint boundaries
- Work items vary wildly in size (from 2-hour bug fixes to 2-week features)
Shape Up works when:
- You have a strong product/shaping function that can define problems well
- Teams are frustrated by sprint churn and want longer focus periods
- You can tolerate 6 weeks between course corrections
Hybrid (most common in practice): Scrum ceremonies with Kanban flow — sprint planning sets priorities, but work flows continuously. WIP limits prevent overload. Retrospectives drive improvement. This is what most mature teams actually do regardless of what they call it.
Sprint Planning That Actually Works
The Failure Mode
Most sprint planning fails because it becomes a negotiation ritual: PO pushes more in, team pushes back, everyone compromises on something nobody is confident about. The sprint starts overloaded and ends with carryover.
What Good Looks Like
- Pre-planning (async, before the meeting): PO has refined stories with acceptance criteria. Engineers have done technical investigation on anything uncertain. The meeting is for commitment, not discovery.
- Capacity-based planning: Start with actual capacity:
- Days available = (team members) x (sprint days) - (PTO + meetings + on-call)
- Historical throughput = average stories completed in last 3 sprints
- Plan to 70-80% of capacity (leave room for unplanned work)
- Commitment conversation: For each story: “Do we understand it? Can we finish it this sprint? What could block us?” If any answer is uncertain, the story needs more refinement or should be split.
- Sprint goal (singular): One sentence describing what success looks like. Not a list of stories. “Users can complete checkout with the new payment provider” — not “finish stories 1-7.”
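The capacity arithmetic above can be sketched as a small helper. This is an illustrative sketch, not a prescription: the function name, the example squad numbers, and the 0.75 buffer (implementing the “plan to 70-80% of capacity” guideline) are all assumptions.

```python
# Hypothetical sketch of capacity-based sprint planning.
# buffer=0.75 encodes the "plan to 70-80% of capacity" guideline above.

def plan_capacity(team_size: int, sprint_days: int,
                  pto_days: float, overhead_days: float,
                  buffer: float = 0.75) -> float:
    """Person-days the team should actually commit to this sprint."""
    raw = team_size * sprint_days - (pto_days + overhead_days)
    return raw * buffer

# Example: a 6-person squad, 10-day sprint, 4 days of PTO,
# and ~1 day per person lost to meetings and on-call:
committed = plan_capacity(team_size=6, sprint_days=10,
                          pto_days=4, overhead_days=6)
print(committed)  # 37.5 person-days to plan against, not 60
```

The point of writing it down is the gap it exposes: raw capacity (60 person-days here) is far from plannable capacity (37.5), and sprints planned against the former are overloaded by construction.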
Sprint Length
| Length | Context | Trade-off |
|---|---|---|
| 1 week | Rapid iteration, startup pace | High ceremony overhead (planning, review, and retro can consume ~20% of the week), hard to fit meaningful work |
| 2 weeks | Most common, good default | Balances feedback frequency with focus time |
| 3 weeks | Complex features, needs research time | Awkward calendar alignment, easy to lose urgency |
| 4 weeks | Regulated/enterprise environments | Feedback loop too slow for most software teams |
Recommendation for a 16-person org: 2-week sprints. Long enough to deliver something meaningful, short enough to course-correct. If your squads work on very different problem types, it is fine for each to run a different sprint length.
Estimation — The Eternal Debate
Story Points vs No Estimates vs T-Shirt Sizes
| Approach | How It Works | Strength | Weakness |
|---|---|---|---|
| Story Points | Relative sizing (1, 2, 3, 5, 8, 13) using Fibonacci | Separates effort from time, enables velocity tracking | Easily gamed, becomes political, arguments over “is this a 3 or 5?” |
| No Estimates (#NoEstimates) | Track throughput (stories/sprint), all stories roughly same size | Eliminates waste of estimation meetings, focuses on slicing | Requires disciplined story slicing, stakeholders may resist |
| T-Shirt Sizes | S/M/L/XL for rough bucketing | Fast, low-conflict, good for roadmap planning | Too coarse for sprint-level planning |
| Appetite (Shape Up) | “How much time is this worth to us?” | Focuses on value, not effort | Requires mature product thinking |
The Pragmatic Position
Estimation is useful for two things:
- Surfacing disagreement — if one engineer says “2” and another says “8,” you have found a knowledge gap or scope ambiguity. This is the real value of planning poker.
- Rough capacity planning — stakeholders need to know “Q3 or Q4?” Story points are fine for this level of granularity.
Estimation is harmful when:
- It becomes a performance metric (“your velocity dropped”)
- Teams spend more time estimating than building
- Estimates are treated as commitments rather than forecasts
For a 16-person org: Use story points or t-shirt sizes for roadmap-level planning. At the sprint level, focus on throughput (number of stories completed per sprint) and ensure stories are sliced to roughly similar size (1-3 days of work each).
Flow Metrics — What Actually Matters
The Four Key Metrics
| Metric | Definition | Why It Matters | Target |
|---|---|---|---|
| Cycle Time | Time from work started to work done | Measures how long things take once you begin | < 3 days for a typical story |
| Throughput | Number of items completed per time period | Measures capacity without the abstraction of story points | Stable or increasing trend |
| WIP (Work in Progress) | Number of items actively being worked on | High WIP = context switching = slower everything | WIP limit = team size - 1 (approximately) |
| Lead Time | Time from request to delivery (includes queue time) | What the customer/stakeholder actually experiences | Depends on your SLA |
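All four metrics fall out of three timestamps per work item. A minimal sketch, using made-up ticket data, of how cycle time, lead time, and throughput are derived (the distinction being that lead time starts at the request, cycle time at the first commit of effort):

```python
# Computing flow metrics from ticket timestamps (illustrative data).
from datetime import datetime
from statistics import median

# Hypothetical ticket log: (requested, started, finished)
tickets = [
    (datetime(2024, 5, 1), datetime(2024, 5, 3), datetime(2024, 5, 6)),
    (datetime(2024, 5, 2), datetime(2024, 5, 4), datetime(2024, 5, 5)),
    (datetime(2024, 5, 3), datetime(2024, 5, 6), datetime(2024, 5, 9)),
]

cycle_times = [(done - started).days for _, started, done in tickets]
lead_times = [(done - requested).days for requested, _, done in tickets]
throughput = len(tickets)  # items completed in the observed window

print("median cycle time:", median(cycle_times), "days")  # 3 days
print("median lead time:", median(lead_times), "days")    # 5 days
print("throughput:", throughput, "items")
```

Most trackers (Jira, Linear, GitHub Projects) expose these timestamps via API, so this needs no manual bookkeeping.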
Little’s Law
Lead Time = WIP / Throughput
This is not a guideline — it is a mathematical law. If you want to reduce lead time, you have exactly two levers:
- Reduce WIP (start fewer things)
- Increase throughput (finish things faster)
Most teams try to increase throughput by adding people. But adding people increases coordination cost and often reduces throughput in the short term (Brooks’s Law). Reducing WIP is almost always the higher-leverage move.
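The two levers are easy to see numerically. A trivial sketch (the squad numbers are invented) showing why cutting WIP is the cheaper lever:

```python
# Little's Law as code: average lead time = WIP / throughput.
def lead_time(wip: float, throughput_per_day: float) -> float:
    return wip / throughput_per_day

# A squad with 12 items in flight, finishing 2 items per day:
print(lead_time(12, 2))  # 6.0 days average lead time

# Halving WIP halves lead time without anyone working faster:
print(lead_time(6, 2))   # 3.0 days
```

Getting the same improvement from the other lever would require doubling throughput, which usually means hiring, and Brooks’s Law says that makes things slower first.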
WIP Limits in Practice
For a squad of 6-7 engineers:
- Development WIP limit: 4-5 items (not one per person — pairing and reviewing should happen)
- Review WIP limit: 2-3 items (reviews should not queue for days)
- Testing WIP limit: 2 items (if QA is a separate phase)
The conversation WIP limits force: When an engineer finishes something and the WIP limit is full, they must help finish existing work (review, test, unblock) before starting something new. This is the behavior change that makes WIP limits valuable.
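The enforcement mechanic is simple enough to model. A toy sketch (class and column names are invented) of a board that refuses new work when a column’s limit is hit, which is exactly the moment the “go help finish something” conversation happens:

```python
# A toy Kanban board that enforces per-column WIP limits.
class Board:
    def __init__(self, limits: dict[str, int]):
        self.limits = limits
        self.columns: dict[str, list[str]] = {name: [] for name in limits}

    def pull(self, column: str, item: str) -> bool:
        """Try to pull an item into a column; refuse if the WIP limit is hit."""
        if len(self.columns[column]) >= self.limits[column]:
            return False  # signal: help finish existing work instead
        self.columns[column].append(item)
        return True

board = Board({"dev": 4, "review": 2})
for story in ["A", "B", "C"]:
    board.pull("review", story)
print(board.columns["review"])  # ['A', 'B'] -- "C" blocked; its owner reviews instead
```

The `False` return is the whole point: the tool does not let the limit be quietly exceeded, so the team has to make the trade-off explicitly.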
Managing Dependencies
Dependencies are the number one delivery risk in any multi-team organization. With 16 engineers across 2-3 squads, cross-team dependencies are inevitable.
Dependency Types
| Type | Example | Mitigation |
|---|---|---|
| Technical | Squad A needs an API from Squad B | Define API contract early, build against a mock, integrate late |
| Knowledge | Only one person knows the payment system | Pair programming, documentation, rotate ownership |
| Sequential | Feature X must ship before Feature Y can start | Identify early, plan for it, or redesign to remove the dependency |
| External | Waiting on a vendor API or a legal review | Buffer time, parallel workstreams, escalation path |
Dependency Management Practices
- Dependency board: Visible wall/board showing cross-team dependencies with status. Review weekly.
- Scrum of Scrums (or equivalent): 15-minute weekly sync between squad leads to surface blockers. Keep it focused on blockers, not status.
- API-first design: Define contracts before implementation. OpenAPI specs, shared schema repos. Teams can work in parallel against contracts.
- Spike and prototype: When a dependency is uncertain, invest a day in a spike to derisk it before committing the team.
- Accept some duplication: Sometimes the fastest path is for Squad A to build their own thin version rather than waiting for Squad B’s “proper” solution. Technical debt is sometimes cheaper than delay.
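The “build against a mock, integrate late” pattern amounts to coding against an interface instead of an implementation. A hedged sketch under invented names (there is no real `PaymentAPI` here): Squad A ships `checkout` against the agreed contract while Squad B is still building the real service.

```python
# Contract-first development across squads (all names are hypothetical).
from typing import Protocol

class PaymentAPI(Protocol):
    """The contract both squads agree on before implementation starts."""
    def charge(self, user_id: str, cents: int) -> str: ...

class MockPaymentAPI:
    """Squad A's stand-in until Squad B's real service ships."""
    def charge(self, user_id: str, cents: int) -> str:
        return f"mock-txn-{user_id}-{cents}"

def checkout(api: PaymentAPI, user_id: str, cents: int) -> str:
    # Squad A's feature code depends only on the contract.
    return api.charge(user_id, cents)

print(checkout(MockPaymentAPI(), "u1", 999))  # works before the real API exists
```

At integration time the mock is swapped for the real client; if both sides held to the contract (enforced, ideally, by shared OpenAPI specs and contract tests), the swap is a one-line change.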
Velocity and Predictability
Why Velocity Is Dangerous
Velocity (story points completed per sprint) is useful as a team-internal planning tool and harmful as a management metric:
- Goodhart’s Law: When velocity becomes a target, it ceases to be a useful measure. Teams inflate estimates to show “improvement.”
- Cross-team comparison is meaningless: Team A’s “5-point story” is not the same as Team B’s. Points are relative within a team, not across teams.
- Velocity measures output, not outcome: A team can have high velocity while building the wrong thing.
Better Alternatives
- Throughput: Stories completed per sprint (requires consistent story sizing)
- Cycle time distribution: P50 and P95 cycle times — shows both typical and worst-case delivery
- Sprint goal completion rate: Did the team achieve its sprint goal? Binary yes/no is more useful than points.
- DORA metrics: Deployment frequency, lead time, change failure rate, MTTR — these measure actual delivery capability
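Two of the DORA metrics fall straight out of a deploy log. An illustrative computation on made-up data (the log format and observation window are assumptions):

```python
# Deployment frequency and change failure rate from a deploy log.
deploys = [  # (day_number, deploy_caused_a_failure)
    (1, False), (2, False), (4, True), (5, False), (8, False),
]
days_observed = 10

deploy_frequency = len(deploys) / days_observed
change_failure_rate = sum(1 for _, failed in deploys if failed) / len(deploys)

print(f"{deploy_frequency:.1f} deploys/day, "
      f"{change_failure_rate:.0%} change failure rate")
# -> 0.5 deploys/day, 20% change failure rate
```

Unlike velocity, neither number can be gamed by re-sizing stories, and both are comparable across teams.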
Retrospectives That Drive Change
Why Most Retros Fail
- Same complaints every sprint, no action taken
- “Safe space” that is not actually safe — people self-censor
- Actions are vague (“improve communication”) with no owner or deadline
- Too frequent for the team to see results between retros
Format That Works
- Set the stage (5 min): Safety check. Anonymous 1-5 score on “how safe do you feel being honest?” If average is below 3, address that first.
- Gather data (10 min): What happened? Use timeline, 4Ls (Liked, Learned, Lacked, Longed For), or Start/Stop/Continue.
- Generate insights (15 min): Why did it happen? 5-whys on the top 2-3 items. Go deeper than symptoms.
- Decide what to do (10 min): Maximum 2 action items. Each has: owner, deadline, and definition of done.
- Follow up (next retro): First agenda item is always: did we complete last retro’s actions?
Cadence
- Sprint retros: Every 2 weeks (standard with Scrum)
- Team health checks: Monthly — broader assessment of team dynamics, tooling, processes
- Quarterly retro: Bigger picture — are we building the right things? Is our architecture serving us?
Anti-Patterns
| Anti-Pattern | Symptom | Fix |
|---|---|---|
| Estimation theater | 2-hour planning poker sessions with no better accuracy than t-shirt sizing | Switch to t-shirt sizes or #NoEstimates; use estimation only to surface disagreement |
| Velocity as KPI | Manager reports velocity to leadership; teams inflate estimates | Track throughput and cycle time instead; velocity is a team-internal tool only |
| Carry-over culture | Every sprint ends with 30%+ unfinished work | Reduce sprint scope, improve story slicing, address root cause (usually interrupts or unclear requirements) |
| Process without purpose | Daily standups where people give status reports to the manager | Standup is for the team: “What is blocked? Who needs help?” Manager observes, does not run. |
| No-meeting “Agile” | Team dropped all ceremonies to “move fast” | You need feedback loops. Cut ceremony, keep the function. |
| Dependency denial | “We will figure it out during the sprint” | Map dependencies during planning. If you cannot name the dependency owner and their timeline, you have a risk. |
Real-World Application
Shopify’s Move to Shape Up
Shopify adopted Shape Up after finding that Scrum sprints created too much overhead for their product teams. Key results:
- 6-week cycles gave teams enough time to tackle meaningful problems
- “Cooldown” periods (2 weeks between cycles) let teams address tech debt, experiment, and recover
- The “betting table” forced leadership to make explicit priority decisions rather than overloading backlogs
Basecamp (Origin of Shape Up)
Shape Up emerged from Basecamp’s internal practices. The key insight: appetite over estimates. Instead of asking “how long will this take?” ask “how much time is this worth?” If the answer is “2 weeks max,” the team shapes the solution to fit 2 weeks, cutting scope aggressively.
Google’s Engineering Productivity
Google measures engineering teams not by velocity but by:
- Quarterly OKR completion rate — outcome-focused
- Code review turnaround time — process health
- Developer satisfaction surveys — qualitative team health
Amazon’s Working Backwards
Amazon’s planning process starts with a press release for the finished product (the “PR/FAQ”). This forces clarity on outcomes before anyone estimates effort. The estimation question becomes: “Given this outcome, what is the minimum we need to build?”
References
- Schwaber, K. & Sutherland, J. (2020). The Scrum Guide — scrumguides.org
- Singer, R. (2019). Shape Up: Stop Running in Circles and Ship Work that Matters — basecamp.com/shapeup
- Anderson, D. (2010). Kanban: Successful Evolutionary Change for Your Technology Business
- Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate — DORA metrics and delivery performance
- DeMarco, T. & Lister, T. (2013). Peopleware — why human factors dominate process
- Brooks, F. (1975/1995). The Mythical Man-Month — Brooks’s Law on adding people to late projects
- Holub, A. — “#NoEstimates” talks (YouTube)
- Spotify Engineering Culture videos (2014) — squad-based delivery model
- Little’s Law explanation — kanbanize.com/lean-management/pull/little-s-law
- Vacanti, D. (2015). Actionable Agile Metrics for Predictability — flow metrics deep dive