Every engineering manager has been in the room when a stakeholder asks, "When will it be done?" — and felt the gravity of giving an honest answer. Velocity-averaged estimates feel comfortable, but they're wrong often enough that "8 sprints" becomes a running joke.
Monte Carlo forecasting replaces gut feel with probability. It's been used in physics, finance, and project management for decades. Inside Azure DevOps, it gives you something simple but powerful: a probability distribution over delivery dates — not one number, but a range with calibrated confidence.
This guide explains what Monte Carlo forecasting is, how it works on Azure DevOps throughput data, and how to communicate the output to stakeholders who don't care about statistics.
What Monte Carlo forecasting actually does
The premise is straightforward. Take a team's actual throughput history (items closed per week for the last N weeks). Treat that history as a sample of the team's real behavior. Then simulate the future: thousands of times, draw a random sample from the historical distribution, and ask "if next week looks like one of those past weeks, when does the backlog finish?"
Run the simulation 10,000 times and you get a distribution of possible delivery dates — not a point estimate, but a full picture of what's likely.
"Eight sprints" is a guess. "There's an 85% chance we'll finish by July 10" is a forecast.
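The core loop is small enough to sketch in a few lines of Python (function and parameter names here are illustrative, not any particular tool's API; it assumes every historical week closed at least one item, so the loop terminates):

```python
import random

def simulate_delivery(throughput_history, backlog_size, runs=10_000, seed=1):
    """Resample weekly throughput from history until the backlog empties;
    return the number of weeks each simulated future took."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(runs):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            # "if next week looks like one of those past weeks..."
            remaining -= rng.choice(throughput_history)
            weeks += 1
        outcomes.append(weeks)
    return outcomes
```

The returned list *is* the distribution: sort it and read off any percentile you need.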
Worked example: a 50-item backlog
Suppose your team's throughput over the last 13 weeks looked like this:
| Week | Items closed |
|---|---|
| W-13 | 5 |
| W-12 | 6 |
| W-11 | 4 |
| W-10 | 7 |
| W-9 | 3 |
| W-8 | 8 |
| W-7 | 5 |
| W-6 | 6 |
| W-5 | 2 |
| W-4 | 9 |
| W-3 | 5 |
| W-2 | 7 |
| W-1 | 4 |
Average velocity: 5.46 items/week. Linear estimate for 50 items: 50 ÷ 5.46 ≈ 9.2 weeks. Round up to 10 weeks. Done.
Except that's wrong in two important ways.
First, it ignores variability. The team had a 2-item week and a 9-item week. The reality is messy. Linear estimates pretend it isn't.
Second, it's roughly a 50% confidence number: about half the possible futures come in slower than the average. Stakeholders who plan against the average will be disappointed about half the time.
Monte Carlo addresses both. Here's what 10,000 simulations against this same dataset produce:
| Confidence | Weeks to deliver 50 items | Reading |
|---|---|---|
| P50 (median) | 9 weeks | Optimistic — half of the simulations finished by then |
| P85 (target) | 11 weeks | Recommended commitment — 85% confidence |
| P95 (conservative) | 13 weeks | Buffer — 95% confidence, used for stakeholder commits |
The gap between P50 and P95 (9 vs 13 weeks) reflects the team's natural variability. A team with steady weekly throughput would have a tighter range. A team with explosive weeks and dry weeks would have a wider one.
That's the payoff of the math: instead of one number, you get a calibrated commitment.
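The percentile table can be roughly reproduced with a short script against the same 13-week history. Because the draws are random, exact week counts can shift by a week or so with a different seed, so don't expect a bit-for-bit match with the table:

```python
import random

history = [5, 6, 4, 7, 3, 8, 5, 6, 2, 9, 5, 7, 4]  # last 13 weeks
backlog = 50
rng = random.Random(0)

outcomes = []
for _ in range(10_000):
    remaining, weeks = backlog, 0
    while remaining > 0:
        remaining -= rng.choice(history)  # draw one past week at random
        weeks += 1
    outcomes.append(weeks)

outcomes.sort()
for label, q in (("P50", 0.50), ("P85", 0.85), ("P95", 0.95)):
    print(f"{label}: {outcomes[int(q * len(outcomes)) - 1]} weeks")
```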
Why this works for Azure DevOps specifically
Azure DevOps Boards expose the exact data Monte Carlo needs:
- Closed items per week — derived from `System.ChangedDate` and state transitions to `Closed` or `Resolved`
- Stack rank order — a real backlog priority, not a wish list
- Story Points or Effort field — for unit-of-measure consistency
- Area Path filtering — slice by team or squad
Tools that integrate directly with the Azure DevOps API (like our own Nexus Hub) can pull this data on demand and run the simulation against the team's real history without manual exports.
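However the closed items are pulled (REST, Analytics views, or an integrated tool), the aggregation step is the same: group them by ISO week to get the throughput series the simulation resamples. A minimal sketch, assuming you already have the closed-date strings in hand (the field named in the docstring, `Microsoft.VSTS.Common.ClosedDate`, is a standard Azure DevOps field; the function name is ours):

```python
from collections import Counter
from datetime import date

def weekly_throughput(closed_dates):
    """Bucket ISO date strings (e.g. values of the standard
    Microsoft.VSTS.Common.ClosedDate field) into items-closed-per-ISO-week,
    ordered oldest to newest. Weeks with zero closures are not back-filled."""
    weeks = Counter()
    for d in closed_dates:
        y, m, day = (int(p) for p in d[:10].split("-"))
        iso_year, iso_week, _ = date(y, m, day).isocalendar()
        weeks[(iso_year, iso_week)] += 1
    return [count for _, count in sorted(weeks.items())]
```

For example, `weekly_throughput(["2024-01-01", "2024-01-03", "2024-01-10"])` yields `[2, 1]`: two items in ISO week 1, one in week 2.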
The hard parts (and how to handle them)
Outliers in throughput
If your team had a 22-item week because of a focused bug-bash, that single week shifts the simulation upward — overpromising future delivery. Standard practice is IQR (interquartile range) detection: anything more than 1.5×IQR above the third quartile gets flagged as an outlier, and the user decides whether to keep or remove it.
See our deeper article on throughput outliers for the methodology.
Estimable risk in upcoming work
Monte Carlo assumes the future looks like the past — and sometimes it doesn't. Some upcoming items are riskier than anything the team has handled before: legacy refactors, cross-team integrations, spikes into unfamiliar code. AI semantic adjustment reads work item descriptions for risk markers (legacy, refactor, spike, migration) and inflates forecast variance accordingly.
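The article doesn't specify the adjustment model, so here is a deliberately crude illustration of the idea — not Nexus Hub's actual algorithm: scan descriptions for risk markers and return a multiplier you could apply to the forecast's variance. The marker weights are invented for illustration:

```python
# Hypothetical per-marker inflation factors (illustrative values only)
RISK_MARKERS = {"legacy": 1.15, "refactor": 1.10, "spike": 1.25, "migration": 1.20}

def risk_multiplier(description):
    """Crude keyword-based variance inflation: each risk marker found in
    the work item description widens the simulated throughput spread."""
    m = 1.0
    text = description.lower()
    for marker, factor in RISK_MARKERS.items():
        if marker in text:
            m *= factor
    return m
```

A production model would weigh context, not just keywords, but even this sketch shows the shape of the output: a number ≥ 1.0 that stretches the P50–P95 range for risky work.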
Mid-sprint scope changes
Monte Carlo is a forecasting tool, not a contract. If scope grows, the forecast needs to be re-run. Pinning a target date and tracking confidence drift week-over-week is how you spot scope creep early — when stakeholder confidence drops from P85 to P60, that's the data backing the conversation about scope.
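Tracking confidence drift reduces to one calculation after each re-run: the share of simulation runs that finish by the pinned target. A sketch (names illustrative):

```python
def confidence_at(target_weeks, simulated_weeks):
    """Probability of finishing by a pinned target date, expressed in weeks:
    the fraction of Monte Carlo runs that completed within target_weeks."""
    done = sum(1 for w in simulated_weeks if w <= target_weeks)
    return done / len(simulated_weeks)
```

Re-run the simulation weekly against the same pinned target; if `confidence_at` slides from 0.85 toward 0.60, scope has grown faster than throughput.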
How to talk about Monte Carlo with stakeholders
Most stakeholders haven't seen probabilistic forecasts before. Lead with calibration, not statistics.
What works
- "There's an 85% chance we'll finish by July 10." — Concrete date, calibrated confidence.
- "The buffer date is August 14 — that's the 95% confidence point." — Gives stakeholders the worst-case to plan against without forcing it as the default.
- "This is based on the last 13 weeks of throughput." — Anchors the forecast to observable reality.
What doesn't
- "Our P85 is 11 weeks." — Strangers don't know what P85 means. Translate to a date.
- "It depends on how variability behaves." — True but useless. Give them a range and explain what the range represents.
- "We ran 10,000 simulations." — Stakeholders care about outcomes, not methodology. Mention it only when asked.
Comparing methods
| Method | Output | Captures variability | Calibrated |
|---|---|---|---|
| Velocity averaging | Single date | No | No — lands near P50 |
| Reference class forecasting | Date with class adjustment | Sometimes | Better than averaging |
| Monte Carlo | Distribution (P50/P85/P95) | Yes | Calibrated to history |
For a deeper comparison, see Why Velocity Averaging Fails.
Getting started
You don't need a data team to run Monte Carlo forecasts on Azure DevOps. Tools exist that pull throughput from your tenant and run the simulation in seconds:
- Install Nexus Hub Pro from the Visual Studio Marketplace — 14-day free Pro trial, no credit card
- Open Boards → Nexus Hub → Predictive Analytics
- Select your team or area path
- Run a 10,000-iteration simulation against your last 6 months of data
- Pin a target date — confidence drift becomes a tracked metric
The Pro tier includes AI Semantic Adjustment, per-item delivery probability, IQR outlier detection, and execution history — everything covered in this guide.
Try Monte Carlo forecasting on your Azure DevOps backlog
Install Nexus Hub Pro from the Visual Studio Marketplace and run your first simulation in under 60 seconds. 14-day free Pro trial — no credit card.
Install from Marketplace →