Story Points Are Theater: A Saner Way to Estimate When You're Small
Story points solved a real problem for big teams, but for solo devs and tiny crews they turn into poker games and velocity charts that produce false precision and zero shipped code. Here's a lighter system.
You spent forty-five minutes arguing whether a feature was a 3 or a 5, and at the end of it you had shipped exactly nothing. If you've ever sat in a planning poker session — or worse, run one for a team of two — you know the feeling. The cards come out, the numbers get debated, someone invokes the Fibonacci sequence with the gravity of a physicist, and then everyone goes back to their desks no more certain than before.
Estimation is not the enemy. Pretending you can measure uncertainty to two significant figures is. For a solo developer or a team of three, story points are usually theater: an elaborate ritual borrowed from a context that no longer applies to you.
A short, honest history of story points
Story points were invented to solve a specific organizational problem. In the early days of Extreme Programming and Scrum, teams wanted to plan how much work fit into an iteration without committing to hours. Hours were dangerous: managers treated them as promises, and "8 hours" became a stick to beat developers with when the work took 12.
So the community reached for abstraction. Instead of "this will take three days," you'd say "this is a 3" — a relative measure of size, complexity, and uncertainty rolled into one dimensionless number. Planning poker, popularized by Mike Cohn and others, added a clever wrinkle: everyone reveals their estimate simultaneously so nobody anchors on the loudest voice in the room. When the senior engineer says "2" and the junior says "8," that gap is a conversation starter — it surfaces hidden assumptions before code gets written.
That's the genuinely good part. The discussion is the value. The number is almost incidental.
The trouble is that points only work because of statistics you can't see. Over many tasks and many iterations, individual estimation errors cancel out. A team of eight makes dozens of guesses per sprint; some are high, some are low, and the law of large numbers smooths them into a stable velocity. That smoothing is the entire engine. Take it away and the machine seizes.
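The smoothing can be written down as a rough sketch. Assume, purely for illustration, that each task's actual effort $X_i$ is an independent draw with the same mean $\mu$ and standard deviation $\sigma$; real tasks only approximate this. The relative spread of a cycle's total then shrinks with the number of tasks $n$:

```latex
\frac{\operatorname{sd}\!\left(\sum_{i=1}^{n} X_i\right)}{\mathbb{E}\!\left[\sum_{i=1}^{n} X_i\right]}
  = \frac{\sigma\sqrt{n}}{n\mu}
  = \frac{\sigma}{\mu\sqrt{n}}
```

Under those assumptions, a cycle of 5 tasks is about $\sqrt{50/5} = \sqrt{10} \approx 3.2$ times noisier, in relative terms, than a cycle of 50.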
Why points break down when you're small
Here's the uncomfortable math: as a solo dev, you do not have a team of eight. You have a team of one. The statistical smoothing that makes velocity meaningful needs a high volume of tasks and a diversity of estimators. You have neither.
You are the team — so there's no anchor to break
Planning poker's superpower is surfacing disagreement between people. When you're estimating alone, there is no second card to reveal. You're just... guessing, and then writing the guess down in a more elaborate format. The "wisdom of the crowd" requires a crowd. A crowd of one is a person talking to themselves.
Small samples don't smooth — they swing
With five or six tasks in a cycle instead of fifty, one nasty surprise doesn't average out. It dominates. Your "velocity" of 18 points last week and 7 points this week isn't signal; it's noise wearing a lab coat. You can't plan capacity off a number that swings 60 percent week to week, and you'll waste real hours trying to explain the variance in a retro that didn't need to happen.
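A toy simulation makes the swing visible. It is a sketch, not a model of any real team: the lognormal task sizes, the `0.75` spread parameter, and the function name `velocity_spread` are all assumptions chosen for illustration. It compares how much a cycle's total effort varies when the cycle holds 5 tasks versus 50.

```python
import random
import statistics


def velocity_spread(tasks_per_cycle, cycles=2000, seed=0):
    """Return the coefficient of variation (stdev / mean) of per-cycle totals.

    Assumption: each task's actual effort is an independent lognormal draw.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(cycles):
        total = sum(rng.lognormvariate(0.0, 0.75) for _ in range(tasks_per_cycle))
        totals.append(total)
    return statistics.stdev(totals) / statistics.mean(totals)


small_team_spread = velocity_spread(5)    # a solo dev's handful of tasks
large_team_spread = velocity_spread(50)   # a bigger team's sprint

print(f"5 tasks per cycle:  relative spread {small_team_spread:.0%}")
print(f"50 tasks per cycle: relative spread {large_team_spread:.0%}")
```

The exact percentages depend entirely on the assumed distribution. The point is only the ordering: the five-task cycle comes out several times noisier than the fifty-task one.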
You will game your own points
This is the quiet killer. Once you start tracking velocity, you start, consciously or not, estimating to hit the number. Tasks mysteriously inflate so the burndown looks productive. A two-minute config change becomes a "1, but really a 2 because context-switching." You are both the worker and the auditor, and the auditor is easily bribed. This is Goodhart's Law in miniature: the moment a measure becomes a target, it stops being a good measure.
Points let you estimate without committing to hours. For a team, that's protection. For a solo dev, it's just hours wearing a disguise — and you still have to ship by Friday.
The false comfort of velocity charts
Velocity charts feel like control. A clean upward line, points completed per cycle, a trend you can point a stakeholder at. The problem is that for a tiny team the chart measures the wrong thing, and measures it badly.
A velocity chart tells you how many points you closed. It says nothing about whether those points mattered, whether the work moved the product forward, or whether you spent the week polishing a settings screen nobody asked for. You can have a beautiful, rising velocity and a dying product. I've written before about why you should forget velocity and track cycle time instead — because cycle time, the wall-clock time from "started" to "done," is a real number you can't fudge. It correlates with the thing you actually care about: how fast ideas become shipped reality.
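Cycle time is cheap to compute if you record when a task started and when it shipped. The sketch below assumes you have those two timestamps as ISO strings; the task names, dates, and the helper `cycle_time_hours` are made up for illustration.

```python
from datetime import datetime
from statistics import median


def cycle_time_hours(started, done):
    """Wall-clock hours between 'started' and 'done' (ISO 8601 strings)."""
    delta = datetime.fromisoformat(done) - datetime.fromisoformat(started)
    return delta.total_seconds() / 3600


# Hypothetical log: (title, started, done)
shipped = [
    ("Fix copy", "2024-03-04T09:00", "2024-03-04T11:30"),
    ("Export endpoint", "2024-03-04T13:00", "2024-03-06T13:00"),
    ("Add index", "2024-03-07T10:00", "2024-03-07T10:45"),
]

hours = [cycle_time_hours(start, end) for _, start, end in shipped]
median_hours = median(hours)
print(f"median cycle time: {median_hours} h")
```

The median is used here rather than the mean so that one long-running task does not dominate the summary; that choice is a suggestion, not something the method requires.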
- ~23 min: time to refocus after a context switch (Gloria Mark's research)
- 45 min: typical planning-poker session for a feature (time not spent shipping)
- 1 of 1: estimators in a solo poker game (the crowd is just you)
Velocity is a vanity metric in disguise. It optimizes for the appearance of throughput. And every minute spent grooming, scoring, and re-scoring the backlog is a minute not spent in the editor — which, given that it takes roughly 23 minutes to fully refocus after a context switch, is a worse trade than it looks.
A saner alternative: three buckets
You don't need to abandon estimation. You need to right-size it. Replace the Fibonacci cards with three plain-language buckets that map to time horizons you actually feel:
- Small = today. A few minutes to a few hours. You can knock it out before lunch without rearranging your day. Fix the copy, add the index, wire up the button.
- Medium = this week. A chunk of focused work that spans a day or two but is still one coherent thing you can hold in your head. Build the export endpoint, add the calendar view.
- Large = break it down first. Anything that won't fit comfortably in a week, or that you can't picture finishing because the shape is still fuzzy.
That's the whole system. Three words. No reference task, no Fibonacci, no poker. You can assign a bucket in two seconds because you're not pretending to be precise; you're sorting work into "now," "soon," and "not yet ready."
This is essentially T-shirt sizing stripped to its core, and it works because it's honest about its own resolution. Nobody argues whether a shirt is a 5 or an 8. It's M or it's L, and that distinction is enough to plan around.
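If you want the buckets in code, for a script or a board integration, a minimal sketch might look like this. The hour cutoffs (`SMALL_MAX_HOURS`, `MEDIUM_MAX_HOURS`) and the `shape_is_clear` flag are assumptions added for illustration; the system itself is a gut call, not a threshold lookup.

```python
from enum import Enum


class Bucket(Enum):
    SMALL = "today"
    MEDIUM = "this week"
    LARGE = "break it down first"


# Illustrative cutoffs only; tune them or ignore them and go by feel.
SMALL_MAX_HOURS = 4     # "a few minutes to a few hours"
MEDIUM_MAX_HOURS = 16   # "a day or two" of focused work


def size_task(gut_hours, shape_is_clear=True):
    """Map a rough gut-feel guess to one of the three buckets."""
    if not shape_is_clear:
        # A task you cannot picture finishing is Large regardless of the guess.
        return Bucket.LARGE
    if gut_hours <= SMALL_MAX_HOURS:
        return Bucket.SMALL
    if gut_hours <= MEDIUM_MAX_HOURS:
        return Bucket.MEDIUM
    return Bucket.LARGE


print(size_task(1).value)                         # wire up the button
print(size_task(10).value)                        # build the export endpoint
print(size_task(3, shape_is_clear=False).value)   # fuzzy, so split it first
```

Note that fuzziness overrides the number: an unclear task is Large even when the hour guess is small.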
If it's a Large, your real job is to split it
This is the most important rule, so it gets its own heading. When a task lands in the Large bucket, the correct response is not to write down a bigger number. It's to admit you don't understand the work well enough to start, and to break it down until you do.
A Large is a signal, not an estimate. It's telling you the task is really an epic in disguise — several deliverables wearing one title. Splitting it into subtasks does three things at once: it forces you to think through the actual steps (which is where the real estimation happens), it gives you shippable increments, and it surfaces the hidden complexity that would have blown your guess anyway.
Keep the splitting shallow, though. One level deep is almost always enough — a parent task with a flat list of children. I've made the case for subtasks one level deep elsewhere; nested trees of sub-sub-subtasks are just story points with extra steps. If a child is still a Large, that's your cue to promote it to a top-level task and split again, not to spawn a fourth tier of nesting.
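The promote-instead-of-nest rule can be sketched as a small data structure. The `Task` class, the bucket strings, and the function name `promote_large_children` are hypothetical, and the example board is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    title: str
    bucket: str  # "small", "medium", or "large"
    children: List["Task"] = field(default_factory=list)


def promote_large_children(board):
    """Return a new board where any Large child becomes a top-level task.

    Parents keep only their Small and Medium children, so nesting
    never goes deeper than one level.
    """
    result = []
    for task in board:
        keep = [c for c in task.children if c.bucket != "large"]
        promoted = [c for c in task.children if c.bucket == "large"]
        result.append(Task(task.title, task.bucket, keep))
        result.extend(promoted)  # promoted tasks still need splitting
    return result


board = [
    Task("Billing", "large", [
        Task("Stripe webhook", "medium"),
        Task("Invoices", "large"),
        Task("Receipt email copy", "small"),
    ]),
]

flat = promote_large_children(board)
for task in flat:
    print(task.title, [child.title for child in task.children])
```

The promoted task lands at the top level with no children of its own, which is the cue to split it on its next pass.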
Story points vs. bucket sizing, head to head
Story Points + Planning Poker
What works
- The discussion that surfaces hidden assumptions is genuinely valuable
- Works well for larger teams with statistical smoothing across many tasks
- Decouples estimates from hours, protecting devs from hour-as-promise pressure
What doesn't
- Requires a crowd to break anchoring, so it's useless for a team of one
- Small samples swing wildly, so velocity becomes noise
- Easy to game your own points, especially when you're the only auditor
- The ritual eats hours you could spend shipping
Three-Bucket Sizing (Small / Medium / Large)
What works
- Takes two seconds per task: plain language, no Fibonacci
- Honest about its own low resolution, so nobody over-debates
- A 'Large' triggers the right action: split, don't score
- Pairs naturally with WIP limits and continuous flow
What doesn't
- Won't produce a tidy velocity chart for stakeholders who want one
- Less precise for long-range capacity forecasting across big teams
- Requires discipline to actually break down the Larges
The comparison isn't really "which is better." It's "which fits your scale." Points are a heavyweight protocol for coordinating many estimators. Buckets are a lightweight heuristic for one or two people who already know roughly what they're getting into. Use the tool that matches the size of the problem.
Figure: How well each method fits a solo dev or team of 2–3.
How this connects to WIP limits and prioritization
Bucket sizing isn't a standalone trick — it's the front door to a leaner workflow.
Sizing tells you what to pull next
Once tasks are sorted into Smalls and Mediums, your work-in-progress limit becomes obvious. Pull one Medium, or batch two or three Smalls, into your active column — and nothing else until it's done. This is just Little's Law applied with your gut instead of a spreadsheet: the less you have in flight, the faster each thing finishes. You don't need points to cap WIP. You need an honest sense of "this is one chunk," which buckets give you for free. If you've been running fixed iterations, this is also the gentle on-ramp to the argument that you don't need sprints when you have continuous flow.
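As a sketch, the pull rule and the Little's Law intuition fit in a few lines. The function names and the default of three Smalls are assumptions taken from the "two or three Smalls" guideline above. Little's Law itself (average cycle time = average WIP / average throughput) only holds for long-run averages in a reasonably stable flow.

```python
def can_pull(active, candidate, max_smalls=3):
    """Decide whether a task may enter the active column.

    active: bucket names currently in progress, e.g. ["small", "small"]
    candidate: bucket name of the task you want to pull
    """
    if candidate == "large":
        return False  # a Large gets split, never pulled
    if not active:
        return True   # empty column: one Medium or the first Small
    if "medium" in active:
        return False  # a Medium occupies the whole column
    # Only Smalls are in flight: allow another Small up to the cap.
    return candidate == "small" and len(active) < max_smalls


def expected_cycle_time(avg_wip, throughput_per_day):
    """Little's Law: average days a task spends in progress."""
    return avg_wip / throughput_per_day


print(can_pull([], "medium"))
print(can_pull(["medium"], "small"))
print(can_pull(["small", "small"], "small"))
print(expected_cycle_time(avg_wip=2, throughput_per_day=1))
print(expected_cycle_time(avg_wip=6, throughput_per_day=1))
```

With throughput held constant, tripling the work in flight triples the average time each task waits to finish, which is the whole argument for a tight cap.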
Priority and size are different questions
A common mistake is letting size leak into priority. A Small bug that's breaking checkout outranks a Medium feature that's merely nice. Keep the two axes separate: priority answers should I do this, size answers how do I attack it. This is exactly why a clean P1–P4 priority system beats a single conflated number — and why I lean on plain priority levels instead of trying to bake urgency into an estimate. Estimate the effort, prioritize the value, and let the two decisions stay honest.
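Keeping the two axes separate is easy to express in code: order the backlog by priority alone and carry size along as information for how to attack each item. The backlog entries and the `next_up` helper below are hypothetical.

```python
# Hypothetical backlog: (title, priority 1-4 where 1 is most urgent, size)
backlog = [
    ("Calendar view", 3, "medium"),
    ("Tweak footer copy", 4, "small"),
    ("Checkout is failing", 1, "small"),
    ("Export endpoint", 2, "medium"),
]


def next_up(tasks):
    """Order by priority only; size never influences the ranking."""
    return sorted(tasks, key=lambda task: task[1])


ordered = next_up(backlog)
for title, priority, size in ordered:
    print(f"P{priority} [{size}] {title}")
```

The Small footer tweak stays at the bottom even though it would be quick, and the Small checkout bug goes first because of its priority, not its size.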
The board does the bookkeeping
In practice you don't even need a separate "estimate" field. A label or a quick tag on your kanban board is plenty. In GritShip the whole loop is a couple of keystrokes (tag a task Small or Medium, drag it into your active column, ship it, repeat), and because every interaction lands in under 200 ms, the estimating never becomes the bottleneck the way a planning-poker tool does. The tool should get out of the way so the work can happen.
What you actually lose (and what you don't)
Let's be fair about the tradeoff. If you drop story points, you lose the ability to hand a manager a velocity trend line. For a solo dev or a tiny team, that's a loss of theater, not capability — there's no manager asking, and the line was lying anyway.
What you keep is everything that mattered:
- You still estimate. Three buckets is an estimate. It's just calibrated to the precision you can actually achieve.
- You still plan. "Two Mediums and a handful of Smalls this week" is a plan. A better one, because it maps to real days.
- You still surface complexity. The Large-equals-split rule does the same job planning poker's disagreement did — it forces you to confront the work before you commit to it.
What you drop is the false precision, the ritual overhead, and the seductive vanity chart. That's a good trade. For more on building a whole system around this kind of restraint, the solo dev PM guide without bloat walks through the rest of the stack.
Frequently asked questions
- Are story points always a bad idea?
  No. For larger teams (say, eight or more engineers running many tasks per iteration), story points and planning poker work because individual estimation errors cancel out across volume, producing a stable velocity. The argument here is narrow: at solo or tiny-team scale, that statistical smoothing doesn't exist, so points become ritual without payoff.
- What's the difference between story points and the three-bucket system?
  Story points are a relative numeric scale (often Fibonacci) meant to be averaged across many tasks and estimators. The three-bucket system (Small means today, Medium means this week, Large means break it down) is plain-language T-shirt sizing tuned for one or two people. It's faster, harder to game, and triggers the right action when work is too big.
- How do I estimate a task that feels like a 'Large'?
  You don't estimate it; you split it. A Large is a signal that the task is really an epic in disguise. Break it into Smalls and Mediums, one level of subtasks deep, until each piece is shippable on its own. The act of splitting is where the real estimation happens, and it surfaces hidden complexity that would have wrecked any single guess.
- If I drop velocity, how do I know if I'm being productive?
  Track cycle time: the wall-clock time from when you start a task to when it ships. Unlike velocity, cycle time can't be gamed by inflating point values, and it directly reflects how fast ideas become shipped reality. A short, stable cycle time is the metric that actually correlates with momentum.
- Does dropping story points mean dropping agile?
  Not at all. Agile is about shipping small increments and adapting fast. Story points are one optional implementation detail, not the philosophy. Three buckets, tight work-in-progress limits, and continuous flow are arguably more agile than a velocity-driven sprint ritual, because they keep you focused on shipping rather than forecasting.
- How long should estimating a task take?
  About two seconds. If you're spending real time debating a bucket, that's a sign the task is poorly defined: fix the task, not the estimate. The whole point of lightweight sizing is that the overhead disappears and you spend your minutes building instead of scoring.
Estimation is a tool for thinking, not a performance for an audience. When you're small, the audience doesn't exist — it's just you, your board, and the work. Drop the cards, sort your tasks into three honest buckets, split anything too big to picture, and spend the time you save shipping. The product won't remember how many points it took. It'll only remember that you finished.