The output of a Monte Carlo forecast looks like this:
| Confidence | Date |
|---|---|
| P50 | May 14 |
| P85 | July 10 |
| P95 | August 14 |
Three numbers. Three commitments. The stakeholder asks, "When will it be done?" — what do you give them?
This article walks through how to read each percentile and which one to commit to depending on context.
## The intuition
"P85 = July 10" means: 85% of the simulated futures finished by July 10. Or equivalently: there's a 15% chance the team is still working on it past that date.
Each percentile is a probability statement, not a date. The dates are derived from the probability — given how the team has performed historically, this is the date by which X% of simulated scenarios completed.
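If it helps to see the arithmetic, here is a minimal sketch in Python (the simulated outcomes are invented for illustration): each P-number is just a cut point in the sorted list of outcomes.

```python
import numpy as np

# Invented for illustration: each entry is the number of days one
# simulated future needed to finish the remaining scope.
simulated_days = np.array([38, 41, 45, 47, 52, 55, 58, 63, 70, 88])

# "P85" is simply the duration that 85% of simulated futures finished
# within; add it to today's date and you get the P85 calendar date.
for p in (50, 85, 95):
    print(f"P{p}: done within {np.percentile(simulated_days, p):.0f} days")
```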
## P50 — Optimistic / Median
**Reading:** "Half the simulations finished by this date. Half didn't."
P50 is the median. It's the date that has 50/50 odds.
**When to use:**
- Internal planning ("if we finish by P50 we have buffer for the next thing")
- Optimistic scenarios in stakeholder conversations ("optimistically, May 14")
- Stretch goals and OKRs, where 50/50 odds are the point
**When NOT to use:**
- External commitments. P50 is a coin flip — half the time you'll miss.
- Anything tied to revenue, contracts, or downstream dependencies.
**Translation:** "Optimistically, we're done by May 14, but we recommend planning around the more likely range."
## P85 — Target / Recommended commit
**Reading:** "85% of simulations finished by this date. 15% slipped."
P85 is the standard commit point for mature agile teams. It's conservative enough that you'll hit it most of the time, but not so cautious that every plan carries idle buffer.
**When to use:**
- Roadmap commitments to product / executive stakeholders
- Public release date communications
- Cross-team dependency planning
**Why 85% specifically?**
It's the sweet spot between confidence and ambition. P50 is too aggressive (a 50% chance of slipping). P95 is too conservative (it covers all but the worst 5% of outcomes, so most plans sit on dead schedule time). P85 forces honest planning while keeping accountability sharp.
**Translation:** "We're committing to July 10. There's an 85% chance we'll deliver by that date based on the team's recent throughput."
## P95 — Conservative / Buffer
**Reading:** "95% of simulations finished by this date. 5% slipped."
P95 is the buffer. It's the date you give to stakeholders who absolutely cannot tolerate a missed commitment — typically external customers under contract, or downstream teams whose work depends on completion.
**When to use:**
- Customer-facing SLA commitments
- Contract delivery dates
- Compliance deadlines (where missing has legal/financial consequences)
- Buffer planning for downstream releases
**When NOT to use:**
- Default planning. If P95 is the only date stakeholders see, your team will pad estimates and feedback loops will degrade.
- Internal sprint commits. Use P85.
**Translation:** "If the schedule absolutely cannot move, we plan around August 14 — that's the 95% confidence date."
## The three-tier conversation
Use all three percentiles in conversation, not just one:
"Best case, May 14. Recommended commit, July 10 — that's our 85% confidence target. Buffer for the contractual deadline, August 14 — 95% confidence. We'd suggest committing to July 10 internally and flagging August 14 to the customer success team for any external SLAs."
This frames the conversation around calibrated confidence, not arbitrary dates. The team commits to P85, the customer-facing comms reference P95, and P50 stays internal as a stretch goal.
## Common stakeholder objections
### "Why isn't this just one date?"
Because the team doesn't deliver work at a single fixed rate. It delivers work at a rate that varies based on scope, team size, and risk. A range reflects reality. A single date is a fiction.
"P85 sounds like an excuse to slip"
The opposite. P85 is harder to hit than the average — most teams' "estimates" are P50 or worse. Committing to P85 means the team is being more conservative, not less.
"We need a date, not a range"
Then commit to P85 and call it the date. Just understand: that date is a probability statement. If conditions change (scope grows, team shrinks, dependencies surface), the probability associated with that date changes too.
"Where do these numbers come from?"
From the team's actual throughput history. The simulation samples real past performance to project forward. It's not a guess — it's an inference. See Monte Carlo Forecasting for Azure DevOps for the methodology.
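The reference above covers the methodology in depth, but the core loop is compact enough to sketch here. This is the generic technique, not Nexus Hub Pro's implementation, and the throughput history is invented:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Invented history: items closed per week, as pulled from a real board.
weekly_throughput_history = [4, 7, 3, 6, 5, 0, 8, 5]
remaining_items = 60
n_simulations = 10_000

weeks_to_finish = []
for _ in range(n_simulations):
    done, weeks = 0, 0
    while done < remaining_items:
        # The Monte Carlo step: replay a randomly sampled past week.
        done += rng.choice(weekly_throughput_history)
        weeks += 1
    weeks_to_finish.append(weeks)

p50, p85, p95 = np.percentile(weeks_to_finish, [50, 85, 95])
print(f"P50: {p50:.0f} weeks, P85: {p85:.0f} weeks, P95: {p95:.0f} weeks")
```

Because the simulation replays weeks the team actually had, slow weeks and zero-throughput weeks show up in the output at the frequency they actually occurred.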
## Tracking confidence drift
The most useful operational practice: pin a target date and track how confidence drifts week over week.
Suppose you committed to "85% confidence by July 10" in early April. Two weeks later, you re-run the simulation. The output shows that July 10 is now only 62% confident — confidence dropped 23 points.
That's a leading indicator. Something changed: scope grew, the team had a slow week, dependencies surfaced. The drop is data — investigate, talk to the team, decide whether to re-baseline or accept the risk.
Without confidence tracking, you'd discover the slip in mid-July. With it, you spot the trend in mid-April and have time to act.
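Mechanically, the drift check is the forecast run in reverse: instead of asking which date hits 85%, you fix the date and ask what fraction of simulated futures land on or before it. A sketch under stated assumptions, with invented distributions tuned to roughly reproduce the 85%-to-62% example above:

```python
import numpy as np
from datetime import date

def confidence_at(target: date, as_of: date, simulated_weeks: np.ndarray) -> float:
    """Fraction of simulated futures finishing on or before the target date."""
    weeks_available = (target - as_of).days / 7
    return float((simulated_weeks <= weeks_available).mean())

# Invented stand-ins for two weekly simulation runs (year assumed for the
# example). Real runs would resample fresh throughput and backlog data.
rng = np.random.default_rng(1)
run_early_april = rng.normal(loc=11.4, scale=2.0, size=10_000)  # weeks to finish
run_mid_april = rng.normal(loc=10.8, scale=2.0, size=10_000)    # two weeks later: barely moved

pinned = date(2025, 7, 10)
print(f"early April: {confidence_at(pinned, date(2025, 4, 7), run_early_april):.0%}")
print(f"mid April:   {confidence_at(pinned, date(2025, 4, 21), run_mid_april):.0%}")
```

Note what the second run shows: two calendar weeks elapsed, but the simulated weeks-to-finish barely dropped. That gap is exactly the drift the practice is designed to surface.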
## Default thresholds for different contexts
| Context | Recommended commit percentile |
|---|---|
| Internal sprint planning | P85 |
| Quarterly roadmap to PM | P85 |
| Customer-facing release date | P95 |
| Compliance / regulatory deadline | P95 + 10% additional buffer |
| Cross-team dependency | P85 (with P95 published as worst-case) |
| Stretch goal / OKR | P50 |
Adjust based on your organization's tolerance for missed dates. Teams that default to P50 over-commit and generate constant stakeholder conflict; teams that default to P95 under-commit and erode trust with padded dates. Both are signs of broken calibration — P85 is the rebalance point.
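If you want these defaults enforced rather than remembered, the table is trivial to encode as a planning convention. A hypothetical sketch; the context names and forecast shape below are invented, not part of any tool:

```python
# Hypothetical encoding of the thresholds table as a team default.
COMMIT_PERCENTILE = {
    "internal_sprint": 85,
    "quarterly_roadmap": 85,
    "customer_release": 95,
    "compliance_deadline": 95,    # plus extra schedule buffer on top
    "cross_team_dependency": 85,  # publish P95 alongside as the worst case
    "stretch_goal": 50,
}

def commit_date(context: str, forecast: dict[int, str]) -> str:
    """Pick the commit date from a forecast like {50: "May 14", 85: "July 10"}."""
    return forecast[COMMIT_PERCENTILE[context]]

print(commit_date("customer_release",
                  {50: "May 14", 85: "July 10", 95: "August 14"}))  # August 14
```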
## Get P50/P85/P95 forecasts on your real Azure DevOps data
Nexus Hub Pro runs Monte Carlo simulations and produces all three percentiles in seconds. Pin a target date, track confidence drift week over week.
Install from Marketplace →