Tuesday 21 May 2013

Reducing Optimism Bias in Three Point Estimates

Promax uses the PERT method to help deal with uncertainty in scoring. It uses a simple formula to approximate the uncertainty from three estimates: the best, most likely, and worst scores.


The formula is based upon three estimates. The first is the highest score expected, “H”. The second (middle) estimate is the most likely score, “M”, and the third is the lowest score expected, “L”.

The likely score will be: S = (H + 4M + L) / 6

For example, let’s assume you’re in a workshop with a number of experts discussing the score for a particular option. The general consensus is that it’s worth a 7, though it could possibly be an 8. Some, however, think it’s not as good as the rest believe and would score it a 4. So the three points of the estimate are:

M = 7
H = 8
L = 4


Then the score to be used will be (8 + 4 × 7 + 4) / 6 = 6.7

The standard deviation calculation is: SD = (H – L) / 6

So for our example, SD = (8-4) / 6 = 0.7

What this means is that there is a 68% chance that the score will lie between 6.0 and 7.4.
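As a minimal sketch, this calculation can be expressed in a few lines of Python (the function names here are illustrative, not part of Promax):

def pert_mean(h, m, l):
    # Expected score from a three point estimate: (H + 4M + L) / 6
    return (h + 4 * m + l) / 6.0

def pert_sd(h, l):
    # Approximate standard deviation: (H - L) / 6
    return (h - l) / 6.0

m, h, l = 7, 8, 4
print("score = %.1f, sd = %.1f" % (pert_mean(h, m, l), pert_sd(h, l)))
# prints: score = 6.7, sd = 0.7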

In Promax, the PERT value and standard deviation of each criterion (using the mapped and weighted values) are calculated.



Criteria   Most Likely   High   Low   PERT   SD
#1         50            70     40    51.7   5.0
#2         80            90     75    80.8   2.5
#3         90            100    80    90.0   3.3
#4         20            50     10    23.3   6.7
#5         70            80     50    68.3   5.0

The PERT score is (314.1 / 5) = 62.8 and the SD is (22.5 / 5) = 4.5

Therefore, there is a 68% chance that the “score” will lie between 58.3 and 67.3, and a 95% chance it will lie between 53.8 and 71.8.
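For completeness, here is a sketch of that aggregation in Python. It assumes a simple unweighted average of the five criteria, mirroring the arithmetic shown above (in Promax itself the values would also be mapped and weighted):

criteria = {  # name: (most likely, high, low)
    "#1": (50, 70, 40),
    "#2": (80, 90, 75),
    "#3": (90, 100, 80),
    "#4": (20, 50, 10),
    "#5": (70, 80, 50),
}

perts = [(h + 4 * m + l) / 6.0 for (m, h, l) in criteria.values()]
sds = [(h - l) / 6.0 for (m, h, l) in criteria.values()]
mean = sum(perts) / len(perts)  # 62.8
sd = sum(sds) / len(sds)        # 4.5 (the SDs are averaged, as in the text)

print("68%% range: %.1f to %.1f" % (mean - sd, mean + sd))          # 58.3 to 67.3
print("95%% range: %.1f to %.1f" % (mean - 2 * sd, mean + 2 * sd))  # 53.8 to 71.8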

The boxes on the Promax graphs reflect this 95% confidence range. Therefore, rather than choosing a single point estimate to assess an option, use the range instead. If there is a major overlap, you cannot be certain that the best option from a single point perspective is in fact the best one once uncertainty is taken into account.

If there is major overlap there are two generic strategies. The first is to try to reduce the uncertainty range by collecting better, more accurate data. The second is to choose the option with the smallest range among the overlapping ones (the risk-averse strategy), since there is more chance of achieving the overall score.
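As an illustrative sketch (the figures for the second option are invented), the overlap test and the risk-averse choice might look like this:

def range_95(mean, sd):
    return (mean - 2 * sd, mean + 2 * sd)

def overlaps(a, b):
    # True if two (low, high) intervals overlap
    return a[0] <= b[1] and b[0] <= a[1]

option_a = range_95(62.8, 4.5)  # the worked example above
option_b = range_95(66.0, 8.0)  # a hypothetical rival option

if overlaps(option_a, option_b):
    # risk-averse strategy: prefer the narrower range
    narrower = "A" if (option_a[1] - option_a[0]) < (option_b[1] - option_b[0]) else "B"
    print("Ranges overlap; the risk-averse choice is option", narrower)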

There are some things worth considering in the PERT approach.

1. 68.27% of all measurements fall within one standard deviation of the mean, 95.45% fall within two standard deviations, and 99.73% fall within three (see the simulation sketch after this list).
2. Strictly, the probabilities above refer to repeating the scoring enough times to collect many values. Had we repeated the scoring 100 times, we would get 100 values. In this example we are saying that on 68% of the occasions we carry out the scoring, the result will lie between 58.3 and 67.3, and outside this range on the other 32% of occasions. In effect, when someone estimates a score they are doing so on the basis of experience, mentally simulating what would be likely to happen if they repeated it many times.
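A quick simulation illustrates these coverage figures, assuming the score behaves like a normally distributed quantity:

import random

random.seed(1)
mean, sd = 62.8, 4.5
samples = [random.gauss(mean, sd) for _ in range(100000)]

for k in (1, 2, 3):
    within = sum(1 for x in samples if abs(x - mean) <= k * sd)
    print("within %d SD: %.1f%%" % (k, 100.0 * within / len(samples)))
# expect roughly 68.3%, 95.4% and 99.7%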

Background to PERT

PERT (Program Evaluation and Review Technique) was developed in the late 1950s by the US Navy, working with Booz Allen Hamilton, for the Polaris programme. As well as developing network / precedence diagramming to aid project planning, they also developed methods to estimate project duration. The (H + 4M + L) / 6 formula approximates the mean of a beta distribution, whose tail can be longer on the pessimistic side. Over fifty years later, this distribution is still very much applicable.
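For the curious, the beta distribution behind the formula can be sampled directly. The shape parameters below follow the standard “PERT beta” parameterisation, under which the mean works out to exactly (L + 4M + H) / 6; this is background only, not something Promax exposes:

import random

def pert_sample(l, m, h):
    # Beta distribution scaled to [L, H] with mean (L + 4M + H) / 6
    alpha = 1 + 4.0 * (m - l) / (h - l)
    beta = 1 + 4.0 * (h - m) / (h - l)
    return l + (h - l) * random.betavariate(alpha, beta)

random.seed(1)
samples = [pert_sample(4, 7, 8) for _ in range(100000)]
print("sample mean: %.2f" % (sum(samples) / len(samples)))  # close to 6.67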


Techniques for Eliciting Scores

The scoring of options is a fascinating subject in its own right. It should be remembered that scores reflect the performance of an option on a particular criterion. In Promax, scores are then mapped to a value scale (0-100) so that different criteria can be added together (essentially putting them in the same units). Depending on how the criteria have been selected and how each criterion is to be measured, the scores can be either real-world units or based on judgement. Real-world units are to be preferred in all cases, but this may not be feasible, particularly at the early stages of an evaluation.
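As a hedged illustration of that mapping stage (the breakpoints here are invented; Promax defines its own mappings), a real-world score such as journey time can be converted to the 0-100 value scale by piecewise-linear interpolation:

def map_to_value(score, breakpoints):
    # breakpoints: list of (score, value) pairs defining the mapping
    pts = sorted(breakpoints)
    if score <= pts[0][0]:
        return pts[0][1]
    if score >= pts[-1][0]:
        return pts[-1][1]
    for (s0, v0), (s1, v1) in zip(pts, pts[1:]):
        if s0 <= score <= s1:
            return v0 + (v1 - v0) * (score - s0) / float(s1 - s0)

journey_scale = [(10, 100), (30, 60), (60, 0)]  # minutes -> value; shorter is better
print(map_to_value(25, journey_scale))  # 70.0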

Where expert judgement is to be used, the best method is to have each expert score the option on the criteria in which they have expertise. This is often termed “expert elicitation” (EE). It provides an evidence base that justifies the score and can be used subsequently for challenge and review.

The least successful method is to use a “decision conference”, where a group of experts come together and collectively score by consensus. The problem with this technique is that the rationale for the scores is generally weak and cannot be adequately underpinned. The consequence is that in any future review or challenge the scores are extremely difficult to recreate with a different group of people.

Whenever expert judgement is used, eliciting good three point estimates is difficult. A number of biases can be introduced, including:


  • Availability – judgements are based on the information most easily remembered.
  • Representativeness – judgements are based on similar yet limited data and experience; the expert does not fully consider other relevant, accessible and/or newer evidence.
  • Anchoring – fixing on a particular value in the range and making insufficient adjustments away from it when constructing an uncertainty estimate.
  • Overconfidence (optimism bias) – a strong tendency to be more certain about one’s judgements and conclusions than one has reason to be.
  • Control – the expert believes they can control, or had control over, outcomes related to the issue at hand; the tendency to act as if they can influence a situation over which they actually have no control.

The following is a good approach to reduce these biases without being overly complex or time-consuming.


Step 1 - Framing


A - Criteria Definition
An extremely important step is to define the criteria as well as possible. In Promax, there are several ways to achieve this. First, there is a title and a description. Second, there is the mapping stage, where a scale for scoring is devised. Whilst it may be easy to stick with a 1-10 scale, it is much better to use a textual scale, since this avoids the natural tendency to do arithmetic on the scores (a sketch of such a scale follows).
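A minimal sketch of what a textual scale might look like in practice (labels and values are invented for illustration; assessors choose a phrase, and conversion to a value happens afterwards):

TEXTUAL_SCALE = [
    ("Fails to meet the need", 0),
    ("Meets the need in part", 25),
    ("Meets the need", 50),
    ("Exceeds the need", 75),
    ("Fully meets the need with margin to spare", 100),
]

def value_for(label):
    for text, value in TEXTUAL_SCALE:
        if text == label:
            return value
    raise ValueError("unknown scale point: %r" % label)

print(value_for("Meets the need"))  # 50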

B - Resume
Understand the expert’s contribution in terms of their previous experience in assessing the performance of options on the specific criteria being assessed.

C – Assumptions List
List up to 5 assumptions to be used in assessing the performance of each option on the criteria.

Step 2 – Scoring (1st Iteration)

What’s the Most Likely value, M?
What’s the Lowest value, L?
What’s the Highest value, H?

This 1st iteration tends to result in anchoring bias on M, over-confidence on L and H, and poor rationale.

Step 3 – Situational Awareness


A – Mini Risk Review
What factors contribute to the uncertainty in the scores?
What impact do these factors have on the score?
What’s the likelihood that these risks will actually happen?

List all the associated risks and estimate their impact on the score:

  • Risk Factor
  • Description
  • Likelihood of Risk Occurring
  • Impact on Score

What probability would the expert assign to “Very Unlikely”? (e.g. 10%)
What probability would the expert assign to “Extremely Unlikely”? (e.g. 1%)

Likelihood             Description                                             Probability
Absolutely Impossible  No chance                                               0%
Extremely Unlikely     Nearly impossible to occur; very rare                   1%
Very Unlikely          Highly unlikely to occur; not common                    10%
Indifference 1         Indifferent between “Very Unlikely” and “Even Chance”   30%
Even Chance            50/50 chance of being higher or lower                   50%
Indifference 2         Indifferent between “Even Chance” and “Very Likely”     70%
Very Likely            Highly likely to occur; common occurrence               90%
Extremely Likely       Nearly certain to occur; near 100% confidence           99%
Absolutely Certain     100% likelihood                                         100%

The “Very Likely” and “Extremely Likely” probabilities are the complements of those given by the expert for “Very Unlikely” and “Extremely Unlikely” (e.g. if “Very Unlikely” is 10%, then “Very Likely” is 90%).

For the impact use the following table:

Impact on Score   Scale
Negligible        0
Minor             2
Notable           4
Substantial       8
Catastrophic      10

List the top 5 risks. The risk score is likelihood × impact, as in the sketch below.
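A sketch of the mini risk review arithmetic, using the two tables above (the risks themselves are invented for illustration):

LIKELIHOOD = {"Extremely Unlikely": 0.01, "Very Unlikely": 0.10,
              "Indifference 1": 0.30, "Even Chance": 0.50,
              "Indifference 2": 0.70, "Very Likely": 0.90,
              "Extremely Likely": 0.99}
IMPACT = {"Negligible": 0, "Minor": 2, "Notable": 4,
          "Substantial": 8, "Catastrophic": 10}

risks = [
    ("Supplier slips delivery", "Very Likely", "Minor"),
    ("Key assumption proves false", "Even Chance", "Substantial"),
    ("Regulation changes", "Very Unlikely", "Catastrophic"),
]

by_score = sorted(risks, key=lambda r: LIKELIHOOD[r[1]] * IMPACT[r[2]], reverse=True)
for name, lik, imp in by_score[:5]:  # the top 5 risks
    print("%.2f  %s" % (LIKELIHOOD[lik] * IMPACT[imp], name))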


B – Scenario Development
Using the information in the Assumptions List and the mini risk register, write down a scenario (chain of events) that would represent a “reasonable worst case”.

Repeat for a “reasonable best case”.

Finally, repeat for a “reasonable most likely case”.


Step 4 – Scoring (2nd Iteration)

Based on the three scenarios developed, how would you now score the option performance?

M =
H =
L =
