How I Stopped Hating Commitment Dates

The slow walk from PERT to flow metrics, told without the marketing

Jun 10, 2026

Why the planning fallacy makes single-date estimates almost guaranteed to fail
Parametric, top-down, and three-point PERT estimates carry biases that no review process removes
Classical PMBOK Monte Carlo runs the right maths against the wrong baseline
Flow metrics and continuous Monte Carlo simulation give producers a forecast they can actually defend

The single most stressful thing about being a game producer is committing to a delivery date. I have done it for thirty years and it never got easier until I changed the method.

You sit in a room. Someone senior asks when it will ship. You give a number. You then spend the next eighteen months defending that number to people who weren’t in the room. The number is almost always wrong. Everyone knows it’s almost always wrong. The pretence that it isn’t is the part that breaks you.

Why the estimate is broken before you start

In 1979 two psychologists, Daniel Kahneman and Amos Tversky, published a paper that quietly described the producer’s entire problem. They called it the planning fallacy. People underestimate the time, cost, and risk of future tasks, and they do so even when they have a long history of being wrong about the same kind of task.

Kahneman later argued that the cause is something he called the inside view. When we plan a project we tell ourselves a coherent story about how it will go. The story is plausible. It does not include the things we don’t yet know. The outside view, by contrast, asks what happened the last time we did something like this. The outside view is duller and almost always more accurate. Almost nobody uses it.

The Decision Lab summarises decades of follow-up research in plain terms: the planning fallacy describes our tendency to underestimate the amount of time it will take to complete a task, as well as the costs and risks associated with that task, even if it contradicts our experiences. The word that matters is “even.” We know we’re wrong. We do it anyway.

For a producer the bias is doubled. You are estimating other people’s work, not your own. You are estimating it under pressure. You are estimating it while someone in finance is asking whether you can pull a quarter forward.

The toolkit I used to reach for

Every estimation method I learned in the first half of my career was a different way of structuring the same wishful thinking.

Parametric estimation was the first. Take the size of the thing, multiply by a coefficient drawn from history, and produce a number. It works for buildings. It works less well for software, and it works worst of all for games, where the scope is unstable until late in production.

Top-down came next. Senior people decide what the thing should cost, and the team is asked to make it fit. Bottom-up was the inverse. Add up every task and produce a total. Both produce numbers. Neither produces forecasts you can trust, because the inputs at every level are guesses dressed up as data.

Three-point PERT was the one I held onto longest. You ask each lead for an optimistic, pessimistic, and most-likely estimate, then weight them. The weighting felt like rigour. It wasn’t. It was three guesses instead of one. The biases I was trying to remove sat inside each guess.

Averaging biased numbers produces a biased average.
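The weighting most PERT guides use is the classic beta approximation, (O + 4M + P) / 6. Here is a minimal Python sketch with hypothetical numbers. It shows why averaging cannot remove bias. The formula is linear, so if every guess is low by the same factor, the weighted result is low by exactly that factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """One lead's three-point estimate, in weeks (hypothetical example units)."""
    optimistic: float
    most_likely: float
    pessimistic: float


def pert_mean(est: ThreePoint) -> float:
    """Classic PERT weighted mean: (O + 4M + P) / 6."""
    return (est.optimistic + 4 * est.most_likely + est.pessimistic) / 6


def scale(est: ThreePoint, factor: float) -> ThreePoint:
    """Multiply all three points by the same factor."""
    return ThreePoint(est.optimistic * factor,
                      est.most_likely * factor,
                      est.pessimistic * factor)


# Hypothetical estimate from a lead.
guessed = ThreePoint(optimistic=8, most_likely=10, pessimistic=18)
# Assumption for illustration: every point should have been 25% higher.
realistic = scale(guessed, 1.25)

print(pert_mean(guessed))                         # 11.0
print(pert_mean(realistic))                       # 13.75
print(pert_mean(realistic) / pert_mean(guessed))  # 1.25: the bias passes straight through
```

Changing the weights changes the number. The bias inside each guess survives either way.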

These techniques have their place. They will give you a directionally useful figure for a pitch deck. What they will not give you is a forecast you can defend at month nine of an eighteen-month project, when reality has diverged and someone in the C-suite wants to know why.

The first probabilistic approach I trusted, and what was wrong with it

When I first read the PMBOK section on Monte Carlo simulation I thought it was the answer.

The idea is simple. You take your estimates, attach a probability distribution to each one, then run the model ten thousand times. The output is a distribution of possible completion dates. You can read off a confidence level: 50%, 80%, 95%. You can size your reserves against the tail. The maths is sound. The software is cheap. I used it for years to calculate slack and contingency.
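For readers who have not seen one, here is a minimal sketch of that classical approach in Python. It assumes a triangular distribution per task, which is a common pairing with three-point estimates but not the only option. For simplicity, it also assumes the tasks run strictly one after another. The task figures are hypothetical.

```python
import math
import random


def simulate_project(tasks, iterations=10_000, seed=None):
    """Classical Monte Carlo over three-point task estimates.

    `tasks` is a list of (optimistic, most_likely, pessimistic) tuples.
    Each iteration samples every task from a triangular distribution and
    sums them, i.e. it assumes the tasks run strictly in sequence.
    Returns the simulated project durations, sorted ascending.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.triangular(o, p, m) for o, m, p in tasks)  # triangular(low, high, mode)
        for _ in range(iterations)
    ]
    totals.sort()
    return totals


def percentile(sorted_values, p):
    """Nearest-rank percentile (p between 0 and 100) of an ascending list."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


# Hypothetical estimates in weeks: (optimistic, most likely, pessimistic).
tasks = [(8, 10, 18), (4, 6, 12), (10, 14, 25)]
totals = simulate_project(tasks, seed=42)
for level in (50, 80, 95):
    print(f"P{level}: {percentile(totals, level):.1f} weeks")
```

Nothing in the code checks whether those tuples are realistic. It faithfully propagates whatever optimism they contain.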

The problem was upstream. Every input to the simulation was a biased estimate. I was running the simulation on numbers I already knew were wrong. The probability curve was elegant. The baseline it was built on was not.

This is the failure mode that does not get discussed enough. A probabilistic technique applied to a biased baseline gives you a probabilistic estimate of a biased baseline. The maths is still right. The conclusion is still wrong.

You have dressed up the planning fallacy in a normal distribution and a confidence interval.

The other half of the problem nobody talks about

There is a second pressure that runs underneath all of this, and it changes the maths in a way most estimation books do not address.

Producers are often not asked when something will ship. They are told. The ship date is set by a marketing window, a fiscal calendar, a co-development agreement, or a publisher’s slate plan. The budget is set the same way. You are then handed a piece of work and asked to produce a plan that arrives on the date and lands within the budget.

In that situation the estimate becomes a reverse-engineering exercise. You start with the answer and work backwards. The pretence of an objective forecast is dropped. What you are doing is constructing a plausible-looking justification for a commitment that has already been made.

This matters for Monte Carlo because the upstream methods I described above all feed a process that begins with the answer. PERT does not save you if the answer is fixed. Parametric does not save you. PMBOK Monte Carlo does not save you, because the simulation runs on baselines you reverse-engineered to fit.

The shift that actually helped

The thing that changed for me wasn’t a better estimation technique. It was a different input.

I had been reading Eli Goldratt on the Theory of Constraints and trying to understand how work actually moves through a development system. From there I came to David Anderson’s Kanban method, and then to Don Reinertsen’s The Principles of Product Development Flow. Reinertsen opens by arguing that the conventional approach to product development is fundamentally broken, then spends the rest of the book explaining why. His central claim is that invisible queues are the underlying cause of slow product development, and that batch size, work in progress, and variability matter more than the schedule on the wall.

What this body of work pointed me towards was a different kind of data. Instead of estimating how long a piece of work would take, I could measure how quickly the team was actually finishing work. Cycle time. Throughput. Work in progress. The metrics already existed in the ticketing system. I had simply not been treating them as the basis for a forecast.
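To make those three metrics concrete, here is a minimal Python sketch that derives them from started and finished dates. The `Ticket` shape and the ticket keys are hypothetical stand-ins for whatever your ticketing system exports. The sketch also assumes one convention choice: cycle time is counted inclusively, as finished minus started plus one day.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Ticket:
    """Hypothetical minimal export of one work item."""
    key: str
    started: Optional[date]
    finished: Optional[date]


def cycle_time_days(t: Ticket) -> int:
    """Elapsed days, counted inclusively: same-day work takes 1 day."""
    return (t.finished - t.started).days + 1


def daily_throughput(tickets, first: date, last: date) -> List[int]:
    """Items finished on each calendar day from first to last, zero days included."""
    finished_per_day = Counter(t.finished for t in tickets if t.finished is not None)
    span = (last - first).days + 1
    return [finished_per_day[first + timedelta(days=i)] for i in range(span)]


def wip_on(tickets, day: date) -> int:
    """Items started on or before `day` and not finished before it."""
    return sum(
        1 for t in tickets
        if t.started is not None and t.started <= day
        and (t.finished is None or t.finished >= day)
    )


tickets = [
    Ticket("GAME-1", date(2026, 5, 4), date(2026, 5, 6)),
    Ticket("GAME-2", date(2026, 5, 4), date(2026, 5, 4)),
    Ticket("GAME-3", date(2026, 5, 5), date(2026, 5, 8)),
    Ticket("GAME-4", date(2026, 5, 6), None),  # still in progress
]
done = [t for t in tickets if t.finished is not None]
print([cycle_time_days(t) for t in done])                              # [3, 1, 4]
print(daily_throughput(tickets, date(2026, 5, 4), date(2026, 5, 8)))  # [1, 0, 1, 0, 1]
print(wip_on(tickets, date(2026, 5, 6)))                               # 3
```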

Daniel Vacanti’s two books, Actionable Agile Metrics for Predictability and When Will It Be Done?, made the connection explicit. You can run a Monte Carlo simulation against historical throughput data instead of against estimates. The simulation samples the team’s past performance, randomising the order, and produces a forecast distribution for the work remaining. Most implementations run ten thousand iterations and take a few seconds to complete.
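The core loop is short. Here is a minimal Python sketch of the idea. It is not Vacanti’s exact procedure or any particular tool’s implementation. It assumes a list of items completed per day, with zero days included. Each run draws days from that history at random, with replacement, and counts how many days it needs to finish the remaining items. The forecast therefore comes back in the same kind of days as the history. The history below is hypothetical.

```python
import math
import random


def forecast_days(daily_throughput, remaining_items, iterations=10_000, seed=None):
    """Monte Carlo 'when will it be done?' driven by historical daily throughput.

    Returns the simulated number of days to finish `remaining_items`,
    one value per iteration, sorted ascending.
    """
    if not any(daily_throughput):
        raise ValueError("history has no completed items; nothing to sample")
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        done = days = 0
        while done < remaining_items:
            done += rng.choice(daily_throughput)  # one sampled day of past performance
            days += 1
        results.append(days)
    results.sort()
    return results


def days_at_confidence(sorted_days, confidence):
    """Nearest-rank percentile: days within which `confidence`% of runs finished."""
    rank = math.ceil(confidence * len(sorted_days) / 100)
    return sorted_days[max(rank, 1) - 1]


# Hypothetical: items finished per day over the last twenty recorded days.
history = [0, 1, 2, 0, 1, 3, 1, 0, 2, 1, 0, 1, 2, 1, 0, 2, 1, 1, 0, 3]
runs = forecast_days(history, remaining_items=30, seed=7)
print("50%:", days_at_confidence(runs, 50), "days")
print("85%:", days_at_confidence(runs, 85), "days")
```

The only input is what the team has already finished. No one is asked to guess how long anything will take.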

The baseline is no longer a guess. It is what the team has actually done, last month, last quarter. It updates every week you run it. The longer the project runs, the more reliable the forecast becomes, because the data informing it is the data the team is generating.

Vacanti has a line on this. “No WIP limit = no flow, which means no predictability.” It captures the discipline this method requires, and it is the part that gets cut from most adoption stories.

What changes in the C-suite

The thing producers do not always get to talk about is the meeting. The MBA in the corner office wants a date. They have been asked the same thing by their board. The conversation is short, the room is hostile, and you have one chance to defend your position before next quarter’s planning cycle re-opens it.

With PERT and a Gantt chart you have an opinion. With flow-based Monte Carlo you have a probability curve generated from the team’s measured performance. You can say: at the team’s current throughput, simulated across ten thousand iterations, there is an 85% probability the work completes by date X, a 50% probability by date Y. If you want to land closer to Y, here are the constraints I need lifted. Here is what changes about scope, headcount, or work in progress.
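Turning a simulation’s output into that sentence is a percentile lookup. Here is a minimal Python sketch. It assumes the simulated durations are in calendar days, so adding them to a start date needs no working-day calendar. The durations and start date are hypothetical placeholders.

```python
import math
from datetime import date, timedelta


def date_at_confidence(simulated_days, start, confidence):
    """Date by which `confidence` percent of simulated runs had finished (nearest rank)."""
    ordered = sorted(simulated_days)
    rank = math.ceil(confidence * len(ordered) / 100)
    return start + timedelta(days=ordered[max(rank, 1) - 1])


# Placeholder output from a throughput simulation: 100 runs finishing in 1..100 days.
simulated = list(range(1, 101))
start = date(2026, 6, 15)
for level in (50, 85):
    print(f"{level}% likely by {date_at_confidence(simulated, start, level)}")
# 50% likely by 2026-08-04
# 85% likely by 2026-09-08
```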

This is a different conversation, with a different artefact on the table. The MBA can challenge the inputs, the model, and the assumptions. What they cannot easily wave away is a curve built from their own team’s last six months of work.

You will still lose some of those arguments. The commitment will get fixed for reasons that have nothing to do with your forecast. The difference is that you have shown your working. When the project goes sideways at month nine, you can pull out the forecast from month one and the forecast from month nine, and the trajectory tells the story. The conversation moves from blame to what changed.

What this asks of you

I want to be honest about what flow-based forecasting requires, because the adoption is harder than the maths.

It needs WIP limits that are actually enforced. Unchecked work in progress is the enemy of flow. If throughput holds steady, every extra item in progress lengthens the average cycle time. The moment you allow unlimited parallel work, your cycle time data stops describing anything stable enough to forecast from. Reinertsen spends a chapter on this. Vacanti opens with it.
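The arithmetic behind that claim is Little’s Law, a standard queueing result. Over a stable period, average cycle time equals average WIP divided by average throughput. Here is a minimal Python sketch with hypothetical figures. It shows what doubling WIP does when throughput stays flat. Little’s Law describes long-run averages of a stable system. It is not a forecast in itself.

```python
def average_cycle_time(avg_wip: float, avg_throughput: float) -> float:
    """Little's Law rearranged: cycle time = WIP / throughput.

    Units follow the inputs: items / (items per week) gives weeks.
    Only meaningful for long-run averages of a reasonably stable system.
    """
    if avg_throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical team finishing 5 items a week.
print(average_cycle_time(avg_wip=20, avg_throughput=5))  # 4.0 weeks
print(average_cycle_time(avg_wip=40, avg_throughput=5))  # 8.0 weeks: same output, twice the wait
```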

It needs a stable process. If the team is reorganising every six weeks, the throughput data from the last quarter does not apply to the next quarter. The forecast assumes the system you measured is the system you are forecasting.

It needs the data to be collected honestly. Started dates and finished dates on tickets, with no fudging when the sprint review approaches. If your team’s Jira hygiene is poor, your Monte Carlo is poor. The software cannot rescue the data.
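No script can detect deliberate fudging. A few mechanical checks do catch the sloppiness that quietly corrupts a forecast. Here is a minimal Python sketch. The `Ticket` shape is a hypothetical stand-in for your tracker’s export. The three rules are examples, not a complete audit.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Ticket:
    """Hypothetical minimal export of one work item."""
    key: str
    started: Optional[date]
    finished: Optional[date]


def hygiene_problems(tickets, today: date) -> List[Tuple[str, str]]:
    """Return (key, problem) pairs for tickets whose dates can't be trusted."""
    problems = []
    for t in tickets:
        if t.finished is not None and t.started is None:
            problems.append((t.key, "finished but never marked started"))
        elif t.started is not None and t.finished is not None and t.finished < t.started:
            problems.append((t.key, "finished before it started"))
        elif t.started is not None and t.started > today:
            problems.append((t.key, "start date in the future"))
    return problems


tickets = [
    Ticket("GAME-7", date(2026, 5, 4), date(2026, 5, 6)),
    Ticket("GAME-8", None, date(2026, 5, 6)),
    Ticket("GAME-9", date(2026, 5, 7), date(2026, 5, 5)),
    Ticket("GAME-10", date(2026, 7, 1), None),
]
for key, problem in hygiene_problems(tickets, today=date(2026, 6, 10)):
    print(key, "-", problem)
```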

It needs the producer to have read enough of Vacanti, Reinertsen, and Anderson to defend the model when challenged. The MBA in the C-suite has probably been trained in quantitative methods. They will ask questions. You need to be able to answer them without flinching.

None of this is exotic. Around twenty to thirty completed cycle times in the historical sample is the rough threshold before the forecast stabilises. The tools are mature. The books are short. The technique is available to anyone running a board in Jira, Linear, or Azure DevOps.

What changed for me

I stopped defending dates I did not believe in.

That is the smallest possible version of what changed, and it is also the whole thing. When the conversation moved to a probability curve I could rebuild every fortnight, the arguments got shorter. The forecasts got better. The relationship with the C-suite changed from quarterly confrontation to ongoing recalibration. Dates still slipped. The slippage was visible earlier, and the conversation about what to do about it was no longer a fight about whose gut was more credible.
