
Monte Carlo Forecasting for Azure DevOps: A Practical Guide

Predictive Analytics · Published April 2026

Every engineering manager has been in the room when a stakeholder asks, "When will it be done?" — and felt the gravity of giving an honest answer. Velocity-averaged estimates feel comfortable, but they're wrong often enough that "8 sprints" becomes a running joke.

Monte Carlo forecasting replaces gut feel with probability. It's been used in physics, finance, and project management for decades. Inside Azure DevOps, it gives you something simple but powerful: a probability distribution over delivery dates — not one number, but a range with calibrated confidence.

This guide explains what Monte Carlo forecasting is, how it works on Azure DevOps throughput data, and how to communicate the output to stakeholders who don't care about statistics.

What Monte Carlo forecasting actually does

The premise is straightforward. Take a team's actual throughput history (items closed per week for the last N weeks). Treat that history as a sample of the team's real behavior. Then simulate the future: thousands of times, draw a random sample from the historical distribution, and ask "if next week looks like one of those past weeks, when does the backlog finish?"

Run the simulation 10,000 times and you get a distribution of possible delivery dates — not a point estimate, but a full picture of what's likely.
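
To make that concrete, here's a minimal Python sketch of the core loop (the function and parameter names are illustrative, not from any particular tool; it assumes at least one week in the history closed something):

```python
import random

def simulate_delivery_weeks(backlog_size, weekly_throughput, runs=10_000):
    """Monte Carlo forecast: distribution of weeks until the backlog is empty.

    weekly_throughput is the team's real history, e.g. items closed per week
    over the last 13 weeks. Each simulated week is one random draw from that
    history (sampling with replacement).
    """
    results = []
    for _ in range(runs):
        remaining, weeks = backlog_size, 0
        while remaining > 0:
            # "If next week looks like one of those past weeks..."
            remaining -= random.choice(weekly_throughput)
            weeks += 1
        results.append(weeks)
    return results
```

Sort the results and read off the 50th, 85th, and 95th percentiles, and you have the forecast.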

"Eight sprints" is a guess. "There's an 85% chance we'll finish by July 10" is a forecast.

Worked example: a 50-item backlog

Suppose your team's throughput over the last 13 weeks looked like this:

| Week | Items closed |
|------|--------------|
| W-13 | 5 |
| W-12 | 6 |
| W-11 | 4 |
| W-10 | 7 |
| W-9  | 3 |
| W-8  | 8 |
| W-7  | 5 |
| W-6  | 6 |
| W-5  | 2 |
| W-4  | 9 |
| W-3  | 5 |
| W-2  | 7 |
| W-1  | 4 |

Average velocity: 5.46 items/week. Linear estimate for 50 items: 50 ÷ 5.46 = 9.16 weeks. Round to 10 weeks. Done.

Except that's wrong in two important ways.

First, it ignores variability. The team had a 2-item week and a 9-item week. The reality is messy. Linear estimates pretend it isn't.

Second, it's a 50% confidence number. Half the futures are slower. Stakeholders who plan against the average will be disappointed half the time.

Monte Carlo addresses both. Here's what 10,000 simulations against this same dataset produce:

| Confidence | Weeks to deliver 50 items | Reading |
|------------|---------------------------|---------|
| P50 (optimistic) | 9 weeks | 50% of simulations finished by then |
| P85 (target) | 11 weeks | Recommended commitment, 85% confidence |
| P95 (conservative) | 13 weeks | Buffer, 95% confidence, used for stakeholder commits |

The gap between P50 and P95 (9 vs 13 weeks) reflects the team's natural variability. A team with steady weekly throughput would have a tighter range. A team with explosive weeks and dry weeks would have a wider one.

That's the payoff: instead of one number, you get a calibrated commitment.
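
If you want to reproduce the shape of that table, here's a sketch against the same 13-week history, vectorized with NumPy. The exact percentiles will wobble by roughly a week from run to run, because the draws are random:

```python
import numpy as np

history = np.array([5, 6, 4, 7, 3, 8, 5, 6, 2, 9, 5, 7, 4])  # items closed, W-13 ... W-1
backlog, runs, horizon = 50, 10_000, 52  # cap each run at a year of simulated weeks

rng = np.random.default_rng()
# For every run, draw a year's worth of simulated weeks from the history,
# then find the first week where cumulative throughput reaches the backlog.
draws = rng.choice(history, size=(runs, horizon))
weeks_to_done = (draws.cumsum(axis=1) < backlog).sum(axis=1) + 1

p50, p85, p95 = np.percentile(weeks_to_done, [50, 85, 95])
print(f"P50 ~ {p50:.0f} weeks, P85 ~ {p85:.0f} weeks, P95 ~ {p95:.0f} weeks")
```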

Why this works for Azure DevOps specifically

Azure DevOps Boards expose the exact data Monte Carlo needs: every work item carries a closed date, and that history can be scoped by team, area path, iteration, and work item type. Weekly throughput falls straight out of a query over recently closed items.

Tools that integrate directly with the Azure DevOps API (like our own Nexus Hub) can pull this data on demand and run the simulation against the team's real history without manual exports.
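
If you'd rather pull the raw history yourself, here's a rough sketch against the Azure DevOps REST API using a WIQL query. The organization, project, and PAT values are placeholders, the state name depends on your process template, and you'd still need a second call to fetch closed dates and bucket them into weeks:

```python
import requests

ORG, PROJECT, PAT = "your-org", "your-project", "your-personal-access-token"  # placeholders

# WIQL query for work items closed in the last 13 weeks (91 days).
wiql = {
    "query": (
        "SELECT [System.Id] FROM WorkItems "
        "WHERE [System.TeamProject] = @project "
        "AND [System.State] = 'Closed' "          # 'Done' in Scrum-process projects
        "AND [Microsoft.VSTS.Common.ClosedDate] >= @Today - 91"
    )
}

resp = requests.post(
    f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/wit/wiql?api-version=7.0",
    json=wiql,
    auth=("", PAT),  # basic auth with an empty username and a PAT
)
resp.raise_for_status()
ids = [item["id"] for item in resp.json()["workItems"]]
print(f"{len(ids)} items closed in the last 13 weeks")
# Next step (not shown): fetch Microsoft.VSTS.Common.ClosedDate for these IDs
# and group the counts by week to build the throughput history.
```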

The hard parts (and how to handle them)

Outliers in throughput

If your team had a 22-item week because of a focused bug-bash, that single week drags the simulation upward and overpromises future delivery. Standard practice is IQR (interquartile range) detection: anything more than 1.5×IQR above the third quartile gets flagged as a potential outlier, and the user decides whether to keep or remove it.
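
A minimal sketch of that rule (the function name is mine, not from any specific tool):

```python
import numpy as np

def flag_high_outliers(weekly_throughput):
    """Flag weeks more than 1.5 * IQR above the third quartile (Tukey's rule)."""
    q1, q3 = np.percentile(weekly_throughput, [25, 75])
    upper_fence = q3 + 1.5 * (q3 - q1)
    return [(week, count) for week, count in enumerate(weekly_throughput) if count > upper_fence]

history = [5, 6, 4, 7, 3, 8, 5, 6, 2, 9, 5, 7, 22]   # a bug-bash week snuck in
print(flag_high_outliers(history))   # the 22-item week is flagged; a human decides what to do with it
```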

See our deeper article on throughput outliers for the methodology.

Estimable risk in upcoming work

Monte Carlo assumes the future looks like the past, and sometimes it doesn't. Some upcoming items are riskier than anything in the team's history: legacy refactors, cross-team integrations, spikes into unfamiliar code. AI semantic adjustment reads work item descriptions for risk markers (legacy, refactor, spike, migration) and inflates forecast variance accordingly.
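
As a deliberately simplified illustration of the idea, not of how any product actually implements it, a keyword scan might look like this:

```python
# Scan a work item description for risk markers and widen the forecast
# variance for flagged items. A keyword sketch only; real semantic
# adjustment would weigh context, not just substring hits.
RISK_MARKERS = ("legacy", "refactor", "spike", "migration")

def risk_multiplier(description: str, per_marker_bump: float = 0.15) -> float:
    """Return a variance multiplier >= 1.0 based on risk markers found in the text."""
    text = description.lower()
    hits = sum(marker in text for marker in RISK_MARKERS)
    return 1.0 + per_marker_bump * hits

print(risk_multiplier("Spike: migration of legacy auth module"))  # 1.45 -> widen this item's forecast
```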

Mid-sprint scope changes

Monte Carlo is a forecasting tool, not a contract. If scope grows, the forecast needs to be re-run. Pinning a target date and tracking confidence drift week-over-week is how you spot scope creep early — when stakeholder confidence drops from P85 to P60, that's the data backing the conversation about scope.
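
Tracking that drift is just re-evaluating each week's fresh simulation against the same pinned date; a small sketch (the function name and the trend in the trailing comment are illustrative):

```python
import numpy as np

def confidence_at_pinned_week(simulated_finish_weeks, pinned_week):
    """Share of simulation runs that finish on or before the pinned target week."""
    runs = np.asarray(simulated_finish_weeks)
    return float((runs <= pinned_week).mean())

# Re-run the simulation each week as scope and throughput change, then watch the trend:
# week 1: 0.86, week 2: 0.84, week 3: 0.61  <- that drop is the scope-creep conversation
```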

How to talk about Monte Carlo with stakeholders

Most stakeholders haven't seen probabilistic forecasts before. Lead with calibration, not statistics.

What works

  - Lead with the P85 date and call it the commitment: "85% confidence we ship by July 10."
  - Keep P50 and P95 as internal context for best-case planning and worst-case buffering.
  - Re-run the forecast when scope changes and report the confidence drift, not just a new date.

What doesn't

  - Walking through the simulation mechanics; stakeholders care about the date and the confidence, not the method.
  - Presenting the P50 date as the plan, since half of the simulated futures finish later than it.
  - Treating the forecast as a fixed contract instead of a number that moves with scope and throughput.

Comparing methods

| Method | Output | Captures variability | Calibrated |
|--------|--------|----------------------|------------|
| Velocity averaging | Single date | No | ~50% accurate |
| Reference class forecasting | Date with class adjustment | Sometimes | Better than averaging |
| Monte Carlo | Distribution (P50/P85/P95) | Yes | Calibrated to history |

For a deeper comparison, see Why Velocity Averaging Fails.

Getting started

You don't need a data team to run Monte Carlo forecasts on Azure DevOps. Tools exist that pull throughput from your tenant and run the simulation in seconds:

  1. Install Nexus Hub Pro from the Visual Studio Marketplace — 14-day free Pro trial, no credit card
  2. Open Boards → Nexus Hub → Predictive Analytics
  3. Select your team or area path
  4. Run a 10,000-iteration simulation against your last 6 months of data
  5. Pin a target date — confidence drift becomes a tracked metric

The Pro tier includes AI Semantic Adjustment, per-item delivery probability, IQR outlier detection, and execution history — everything covered in this guide.

Try Monte Carlo forecasting on your Azure DevOps backlog

Install Nexus Hub Pro from the Visual Studio Marketplace and run your first simulation in under 60 seconds. 14-day free Pro trial — no credit card.

Install from Marketplace →