Methodology · 8 min read

Why Velocity Averaging Fails (And What to Use Instead)

Forecasting · Published April 2026

Velocity is the most popular forecasting metric in agile delivery.

It's also the least accurate one when you average it and treat the average as a forecast.

That's not a contradiction. It's a reflection of what velocity actually measures — and what averaging does to that measurement.

What velocity is (and isn't)

Velocity is a measure of throughput — how many story points (or items) a team completed in a sprint. As a measure, it's fine. It's a simple integer over a fixed time window.

Velocity is not a measure of:

  - Productivity or business value
  - Consistency (a single number says nothing about the spread around it)
  - The future (past throughput is a record, not a projection)

The problem starts when teams take a velocity number and project forward by averaging the last N sprints.

The math problem with averages

Suppose your team's last 8 sprints had these velocities (in story points):

Sprint    Velocity (points)
S-8       22
S-7       18
S-6       14
S-5       30
S-4       16
S-3       10
S-2       26
S-1       20

Average: 19.5 points/sprint.

Forecasting a 100-point release: 100 ÷ 19.5 = 5.13 sprints → call it 6 sprints.
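
If you want to check that arithmetic yourself, here it is as a few lines of Python (the velocity list is the table above; the 100-point backlog is the release from the example):

```python
import math

# Velocities from the table above, oldest (S-8) to newest (S-1).
velocities = [22, 18, 14, 30, 16, 10, 26, 20]

average = sum(velocities) / len(velocities)   # 19.5 points/sprint
forecast = 100 / average                      # 5.13 sprints
print(f"average: {average}, forecast: {forecast:.2f} -> {math.ceil(forecast)} sprints")
```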

Here's the question stakeholders should be asking: what's the probability this is right?

Answer: about 50%. Half the futures finish faster than 5.13 sprints. Half finish slower.

Strictly, that 50/50 point is the median rather than the mean, but with a spread as roughly symmetric as this one the two sit close together. Either way, an average-based forecast says "half the outcomes are worse than this." That's a coin flip, not a commitment.

Variance is the actual signal

Look again at the velocities. The min was 10 (S-3), the max was 30 (S-5). Sample standard deviation: about 6.5.

That spread tells you something the average can't: this team is inconsistent. Its 100-point release isn't going to take exactly 5.13 sprints. It might take 4 sprints (if the high sprints repeat). It might take 10 (if the team hits another low patch like S-3).
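
The spread is just as cheap to compute as the average; a sketch using Python's statistics module (stdev here is the sample standard deviation):

```python
import math
import statistics

velocities = [22, 18, 14, 30, 16, 10, 26, 20]

print(statistics.median(velocities))           # 19.0
print(round(statistics.stdev(velocities), 1))  # 6.5 -- sample standard deviation
print(math.ceil(100 / max(velocities)))        # 4 sprints if every sprint hits 30
print(math.ceil(100 / min(velocities)))        # 10 sprints if every sprint hits 10
```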

Most velocity-averaging conversations skip this. The team reports "19.5 points/sprint" and management writes that down. Three months later, when the release is two sprints late, everyone wonders what went wrong.

Nothing went wrong. The forecast was a 50% confidence number. It was wrong about half the time.

What to do instead

Two options, in increasing order of sophistication:

Option 1 — Report a range, not an average

Instead of "19.5 points/sprint," report:

"Last 8 sprints: 10 to 30 points/sprint, median 19. A 100-point release will likely complete in 4–10 sprints depending on which sprints we get. Best plan: commit to 7 sprints with a 1-sprint buffer."

This is more honest and more actionable. It also forces the conversation about why the variance exists — which usually surfaces real issues (interruption load, cross-team dependencies, scope churn).

Option 2 — Run a Monte Carlo simulation

Treat the historical velocity as a sample distribution. Run thousands of simulations: each trial randomly draws sprints from history (with replacement) until the 100-point release is done, and records how many sprints it took. The distribution of those sprint counts gives you calibrated confidence levels.
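
Here's a minimal sketch of that loop in plain Python. The 10,000-trial count and the percentile levels are choices on my part, not requirements, and this is not how any particular tool implements it:

```python
import random
import statistics

velocities = [22, 18, 14, 30, 16, 10, 26, 20]  # sprint history from above
BACKLOG = 100                                  # release size in points
TRIALS = 10_000

def sprints_to_finish(history, backlog):
    """Resample sprints (with replacement) until the backlog is burned down."""
    remaining, sprints = backlog, 0
    while remaining > 0:
        remaining -= random.choice(history)
        sprints += 1
    return sprints

outcomes = sorted(sprints_to_finish(velocities, BACKLOG) for _ in range(TRIALS))
percentiles = statistics.quantiles(outcomes, n=100)  # 99 cut points, P1..P99
for level in (50, 85, 95):
    print(f"P{level}: {percentiles[level - 1]:.0f} sprints")
```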

Output looks like:

Confidence           Sprints to deliver 100 pts
P50 (coin flip)      5 sprints
P85 (target)         7 sprints
P95 (conservative)   9 sprints

This is what we cover in detail in Monte Carlo Forecasting for Azure DevOps.

The "but my team is consistent" objection

Sometimes you hear: "Our team's velocity is steady. Last 5 sprints were all 20 ± 1 points. Average works fine."

If that's actually true, then yes — average is fine, because variance is low. But verify it. Pull the last 13 sprints (not 5) and check standard deviation. Most "steady" teams have a coefficient of variation of 25–35%, which is not steady.
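
The check is one ratio: sample standard deviation divided by the mean. One way to compute it, run against this article's 8 sprints:

```python
import statistics

def coefficient_of_variation(velocities):
    """Sample standard deviation as a fraction of the mean."""
    return statistics.stdev(velocities) / statistics.mean(velocities)

# The 8 sprints from this article land at ~33% -- inside the 25-35% band.
print(f"{coefficient_of_variation([22, 18, 14, 30, 16, 10, 26, 20]):.0%}")
```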

And even if velocity is steady, you still have:

  - Scope churn (the 100 points rarely stay 100 points)
  - Cross-team dependencies that stall otherwise-healthy sprints
  - Interruption load from support, incidents, and unplanned work

A range communicates these implicitly. An average suppresses them.

What about story points themselves?

This article focused on the math problem with averaging velocity. There's a separate (longer) debate about whether story points are useful at all. Short answer: story points are a relative-sizing tool that emerges from team conversation. They're useful for refinement and roughly sizing scope. They're not a precision measurement, and treating them as one (especially across teams) is its own anti-pattern.

For Monte Carlo specifically, both story points and item counts work. We see slightly better results with item counts in teams with stable story sizing, and slightly better results with story points in teams with high heterogeneity in work sizes.
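
Either way, the simulation itself doesn't change: swap the per-sprint point totals in the sketch above for per-sprint item counts, size the backlog in items, and the percentiles still come out in sprints.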

Replacing velocity averaging in your team

Three steps that take an hour:

  1. Pull the last 13 sprints of velocity from Azure DevOps Boards (Analytics view or via API).
  2. Compute min/median/max + standard deviation — this gives you the variance signal.
  3. Run a Monte Carlo simulation on the data. This is where tools like Nexus Hub Pro earn their keep — the simulation runs in seconds, the output is a calibrated forecast, and stakeholders see a range instead of an average.

Or stay with averages and explain to stakeholders, every quarter, why the forecast was off again.

Replace velocity averaging with calibrated forecasting

Nexus Hub Pro runs Monte Carlo forecasts on your real Azure DevOps throughput. 14-day free Pro trial — no credit card.

Install from Marketplace →