Sunday, 28 October 2012

In Support of Objectivity

Do you come across organisations running so-called "decision conferences"? These are, in the main, two-day workshops involving a number of people who are there to evaluate a range of options. The participants are "experts" and they score each option against a range of criteria before the option with the best score wins through.

This approach leads to a number of issues that develop later on - particularly for major decisions in large organisations.

The 2-day workshop seems ubiquitous. But why two days? Partly because some people are travelling, of course, but surely this isn't the best reason. The workshop needs to be as long as it needs to be, and this depends on the objectives and hence the agenda of the workshop. It's also worth bearing in mind how much time the workshop takes compared to the duration of the project itself. In many cases the actual decision is made based on spending just 0.1% to 0.3% of the available time on it, and sometimes even less for mega-projects. Is that sufficient for major decisions? Analysing strengths and weaknesses, and how robust the decision is to changes in data and stakeholder preferences, is well worth the investment of more time.
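To put that percentage in context, here is a rough calculation. The project lengths and the figure of 250 working days per year are illustrative assumptions, not data from any particular project.

```python
# Rough share of a project's working time taken up by a decision workshop.
# WORKING_DAYS_PER_YEAR and the project lengths are illustrative assumptions.
WORKING_DAYS_PER_YEAR = 250


def workshop_share(workshop_days: float, project_years: float) -> float:
    """Return the workshop length as a percentage of the project's working days."""
    project_days = project_years * WORKING_DAYS_PER_YEAR
    return 100 * workshop_days / project_days


for years in (3, 5, 8):
    print(f"{years}-year project: {workshop_share(2, years):.2f}% of the time")
```

Under these assumptions a two-day workshop sits between roughly 0.1% and 0.3% of the project's working time.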

Experts. Just how expert are they? And what are they expert about? We've all come across the loud, opinionated participant - the self-confessed all-knowledgeable person. But whether these are really experts is something else. An expert should be someone who has spent many years on a particular topic and could stand their ground with other experts across the world. Just because they know a little more than the others in the room on a particular subject (maybe they're representing a department or organisation) does not make them an expert. Mark Twain defined an expert as "an ordinary fellow from another town".

Experts, by definition, know a lot about very little. They have an incredibly deep understanding of a topic that they've studied for many years. Interestingly, in a multi-criteria workshop this may mean they know a lot about how an option performs for one or two of the criteria but clearly not for them all. So costs, for instance, may not be their strong point, and asking for their opinion on them adds little.

The expert offers an opinion. And opinion is subjective. So, the experts provide a subjective score and the best option wins. Unfortunately, what this means in practice is that you have no evidence to back up that opinion, so when it is reviewed or audited it fails in dramatic style. You cannot possibly recreate the decision result unless you have the same experts in the room. Clearly, for decisions that involve multiple stakeholders and several hurdles of reviews, this approach won't get you very far - no matter how "expert" the experts. At some point the scores will be reviewed, different ones used and (mostly) a different option will be favoured.

A far better approach is to collect evidence beforehand on how each option performs against each criterion. For many criteria you will be able to obtain decent quantitative measures, and it is well worth putting in the effort as real-world numbers are easy to substantiate.

For others you may need to use a rating scale. These use words or numbers: a 0-10 scale or a High, Medium, Low scale. These sorts of scales are not designed to capture opinions but to estimate the magnitude of differences. They are still quantitative, even if you use words to describe them - they have equal intervals between each point and represent an order from "bad" to "good" or "less" to "more". It is vital that each point is described in as much detail as possible so that everyone knows what a "High" looks like. Part of the process should also be to write down why each option is scored that way for each criterion. This means a paragraph or two written by an "expert", ideally with references and sources to additional information to support the score. This is similar to how a "safety case" or "safety basis" may be written but obviously much shorter in length!
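As a minimal sketch of what "described points plus a written rationale" could look like in practice, the record below refuses a score that has no rationale. The criterion, the point descriptions and the source named in the example are all invented for illustration.

```python
from dataclasses import dataclass, field

# A hypothetical three-point scale for a criterion called "Ease of maintenance".
# The point descriptions are invented; equal intervals (0, 5, 10) are assumed.
SCALE = {
    "Low": (0, "Needs specialist contractors and a shutdown of more than a week."),
    "Medium": (5, "In-house staff can do it, with a shutdown of one to two days."),
    "High": (10, "In-house staff can do it during normal operation."),
}


@dataclass
class ScoreRecord:
    option: str
    criterion: str
    rating: str       # must be one of the keys of SCALE
    rationale: str    # why this rating was given
    sources: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.rating not in SCALE:
            raise ValueError(f"Unknown rating: {self.rating!r}")
        if not self.rationale.strip():
            raise ValueError("A score needs a written rationale")

    @property
    def value(self) -> int:
        """Numeric value of the rating on the equal-interval scale."""
        return SCALE[self.rating][0]


record = ScoreRecord(
    option="Option A",
    criterion="Ease of maintenance",
    rating="Medium",
    rationale="The (hypothetical) maintenance plan lists a two-day outage.",
    sources=["Hypothetical maintenance plan, section 4"],
)
print(record.rating, record.value)
```

Keeping the rationale and sources next to the score is what lets a reviewer check the evidence later without needing the original expert in the room.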


A very good detailed explanation on these terms can be found here.

The point of collecting the evidence is that the decision is then based on something that can be reviewed and audited by others who would themselves come up with the same decision given this information.

If someone questions the decision, new evidence needs to be provided. This significantly reduces the effort required when someone with a new opinion arrives (maybe a new Head or other change in staff, or perhaps a stakeholder who wasn't intimately involved). Also, if new evidence emerges during the decision process, the original decision can be assessed against this - under change control. So, you don't flip from one choice to another seemingly at random. If a choice needs to change there's a good reason for it, backed up with evidence.

Conclusion

Resist the temptation to have a scoring workshop where experts turn up and provide the performance evaluation for each option. It is far better to collect evidence beforehand and let the experts discuss the validity of the evidence and analyse the results. This will ensure that the decision will be evidence-based and can be realistically audited/reviewed by others. It is a much more robust process to use.

Spend a little more time analysing the decision. Do not fall into holding a 2-day workshop just because that's what is always done. Spend the right amount of time with the right number of people for the decision at hand.

Sunday, 21 October 2012

Strictly Scoring System

It's that time of the year when a new season of Strictly Come Dancing has started. In the US it's called "Dancing with the Stars". It has a rather unusual method of evaluating the couples which is what this blog is about.

The format is that each couple (a professional dancer and a celebrity) performs a dance to impress a panel of experts. There are four judges and each of them scores each couple up to a maximum of 10. So the minimum a couple could get is zero and the maximum is 40. For interest, the lowest score so far has been 8 (Quentin Willson & Hazel Newberry) and lots have received 40. The Wikipedia page has all the details. This then gives the Leaderboard, which ranks the couples by score.

So for the first show the Leaderboard looked like this:





The second element is the Viewers' Vote. For this part, viewers phone in for their favourite couple to keep in the show. In this case the number of votes cast is not published, which is difficult if you want to analyse the results. Apparently a BBC spokesman said: "We never reveal exact figures from our shows as we have a relationship of trust with our contestants and it would be unfair to disclose the exact nature of difference in their popularity." This thread has a good discussion on why the votes aren't published (celebrity ego, influencing voting, programme ratings and others). Many of these are applicable in real-world decision making of course!

Since they don't publish the actual numbers, the viewers' votes are put into rank order. That is, the couple with the highest number of votes is at the top, then the second highest is next and so on until the least popular couple is at the bottom. So, if there are 14 couples, then the couple with the highest number of votes is given a score of 14. The couple with the next highest number of votes is given 13 and so on right down to a score of 1.
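The conversion from vote counts to rank points can be sketched in a few lines. The vote counts here are invented (the real ones are not published), and the handling of ties is an assumption, since the show's tie rule isn't covered in this post.

```python
# Hypothetical vote counts; the real figures are not published.
votes = {
    "Couple A": 52_000,
    "Couple B": 310_000,
    "Couple C": 48_500,
    "Couple D": 120_000,
}


def rank_points(totals: dict[str, float]) -> dict[str, int]:
    """Give the highest total N points, the next N-1, and so on down to 1.

    Ties are broken by sort order, which is an assumption made for this sketch.
    """
    ordered = sorted(totals, key=totals.get, reverse=True)
    n = len(ordered)
    return {name: n - position for position, name in enumerate(ordered)}


viewer_points = rank_points(votes)
print(viewer_points)
```

Notice that Couple B's huge lead over Couple D is worth exactly one point, the same as Couple A's narrow lead over Couple C.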

In order to combine the judges' score with the viewers' score, the judges' score is also converted into a rank order.



The two couples with the lowest combined score go into a "dance off". The judges' verdict on this final dance decides the winner, and the loser goes home.
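Putting the two rankings together, the selection of the dance-off couples might be sketched like this. Every figure is invented, and ties are simply broken by sort order, which may not match what the programme actually does.

```python
# All figures below are invented for illustration.
judges_totals = {"A": 32, "B": 21, "C": 27, "D": 18, "E": 30}   # out of 40
viewer_votes = {"A": 40_000, "B": 95_000, "C": 30_000, "D": 25_000, "E": 60_000}


def to_rank_points(totals):
    """Lowest total gets 1 point, highest gets N (ties broken by sort order)."""
    ordered = sorted(totals, key=totals.get)   # lowest first
    return {name: points for points, name in enumerate(ordered, start=1)}


judge_points = to_rank_points(judges_totals)
viewer_points = to_rank_points(viewer_votes)
combined = {name: judge_points[name] + viewer_points[name] for name in judges_totals}

# The two lowest combined scores face the dance-off.
dance_off = sorted(combined, key=combined.get)[:2]
print(combined)
print("Dance-off:", dance_off)
```

In this made-up week couple B had the second-lowest judges' total but is kept out of the dance-off by topping the viewers' vote.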

So what issues arise with this methodology that might be applicable in business decisions?

1) It is quite clear that the judges' scores are variable. Although they are all experts, they clearly look for different criteria when scoring each couple and consensus is rare. The scores also change over time, so while they do improve as the couples improve, there is still little to compare a score of 6 in the early stages with a score of 6 in the latter stages. This is very common when using numerical scales as it is not easy to get everyone to define a score of 6 in the same way.
2) The judges use what is called an "Interval Scale". This scale tells us the order and the size of the intervals between the scores.
3) The viewers' scores, however, use an "Ordinal Scale". This means that you know the order but cannot tell the magnitude of the difference between the real scores.
4) When the judges' scores are changed from an interval scale to an ordinal scale, the benefit of knowing the magnitude of the difference is lost. This means that getting a fantastic score may be no better than getting a single point more than the next best couple.
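A small illustration of that last point, using invented judges' totals for two very different weeks:

```python
# Invented judges' totals (out of 40) for two different weeks.
close_week = {"A": 31, "B": 30, "C": 29}
runaway_week = {"A": 40, "B": 22, "C": 21}


def ranks(totals):
    """Convert totals to rank points: N for the best, down to 1 for the worst."""
    ordered = sorted(totals, key=totals.get)
    return {name: points for points, name in enumerate(ordered, start=1)}


print(ranks(close_week))
print(ranks(runaway_week))
```

Both weeks produce identical rank points, so couple A's 18-point lead in the second week counts for no more than the 1-point lead in the first.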

It would be great if the BBC were to use the actual viewer votes. But this isn't likely to happen.


Conclusion

Obviously Strictly is only a television show but some of the techniques they use have issues that could easily make it into more important decisions in the workplace.

Wherever possible use interval scales rather than just putting things into a rank order, as you can then see whether any options are much better than others. Secondly, if you have to use a numerical scale for scoring then try to calibrate the scale and explain what is meant by a 5, for instance. Make sure this recognises that a 10 is twice as good as a 5. Don't assume experts already know this!


Wednesday, 17 October 2012

How Many Criteria?

I'm often asked "What's the best number of criteria to use?"

As a matter of simple arithmetic, the more you have, the less impact each one will have on the decision. So if you have 4 criteria and each one is weighted equally, then each will contribute 25% to the decision. If you go up to 10 criteria then the contribution of each drops to 10% and so on. When you have over 10, even doubling the importance of a single criterion will make little difference to the final result.
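The arithmetic is easy to check. This sketch assumes every other criterion keeps a weight of 1 while the criterion of interest is multiplied up; the function name and the counts are just for illustration.

```python
def contribution(n_criteria: int, multiplier: float = 1.0) -> float:
    """Percentage share of one criterion when the other n-1 each have weight 1.

    `multiplier` is the weight given to the criterion of interest.
    """
    total = (n_criteria - 1) + multiplier
    return 100 * multiplier / total


for n in (4, 10, 20):
    print(f"{n} criteria: equal share {contribution(n):.1f}%, "
          f"doubled share {contribution(n, 2):.1f}%")
```

With 20 criteria, doubling the weight of one of them only lifts its share from 5% to about 9.5%.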

However, a better way of asking that question is to turn it around: ask not "how many criteria should I have?" but "what criteria should I use?" The number that results is the number you need.

In determining what the criteria should be, it is never a good idea to simply brainstorm within your team to come up with ones you think are differentiators. A far better approach is to use a hierarchy of objectives. Here you consider the overall objective or mission you're trying to accomplish. Then ask what lower level objectives support the main mission. Finally, ask what sub-objectives or measures support those. It's these lower level objectives - ones that are measurable - that should be your criteria.


For most evaluations you will easily end up with more than 4 or 5! However, that isn't a problem as long as you weight them at the right level. In the diagram above, there are 4 main objectives contributing to the Mission so we weight at that level instead.

In Promax this is called "Top Down" weighting. Go to the weights tab and click the Top Down button. An arrow will appear next to the relevant weight set. This means the calculations will be based on the weights of the topics or nodes (the red boxes) rather than criteria (grey boxes).

By default, in top down weighting, the criteria below each topic are equally weighted. So if you have more criteria in one topic than in another this may lead to unforeseen results. You can easily alter them to reflect a more correct weighting but remember that it's the weights at the topic level that have the final say.
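The general idea of top down weighting can be sketched as follows. This is not Promax's internal calculation, just the usual arithmetic for a weighted hierarchy: a criterion's overall weight is its topic's weight multiplied by its share within the topic. The topics, criteria and numbers are invented.

```python
# Hypothetical value tree: topic weights sum to 1. Names and numbers are invented.
topic_weights = {"Safety": 0.4, "Cost": 0.3, "Schedule": 0.2, "Environment": 0.1}
criteria_by_topic = {
    "Safety": ["Worker dose", "Accident risk"],
    "Cost": ["Capital cost", "Operating cost", "Decommissioning cost"],
    "Schedule": ["Time to complete"],
    "Environment": ["Waste volume", "Emissions"],
}


def global_weights(topics, tree, local=None):
    """Overall weight of a criterion = topic weight x its share within the topic.

    With no `local` shares supplied, criteria split their topic equally.
    """
    result = {}
    for topic, criteria in tree.items():
        for name in criteria:
            share = local[name] if local else 1 / len(criteria)
            result[name] = topics[topic] * share
    return result


weights = global_weights(topic_weights, criteria_by_topic)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Here "Time to complete" ends up with an overall weight of 0.2 because it is alone in its topic, while each of the three cost criteria gets only 0.1. That is the kind of unforeseen result that unequal numbers of criteria per topic can produce.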

Conclusion

Try to develop criteria based on objectives rather than randomly brainstorming them. That way you can demonstrate how each option contributes to the overall mission. You can much more easily show in what areas it excels and where it is weak.

Sunday, 14 October 2012

Criteria and Attributes

The terms criteria and attributes are often used interchangeably. Often people say they do multi-criteria decision analysis (MCDA) or maybe multi-attribute decision analysis (MADA). They mean the same technique but it is worth knowing that the two terms are, in fact, different.

Criteria are the factors which options are to be measured against. Criteria are plural and criterion is singular. Attributes are the properties of the criterion.

So for example, in looking at choosing a car, one of the factors that may be important is luggage space. 

The grey boxes are criteria and the red ones are topic areas.

If you think about it, however, there are many different ways to measure luggage space; overall volume is an obvious one. However, this may not be the best choice if you want to transport specific items. You might be a golf fanatic and so it’s important that you can fit a set of golf clubs. These are typically fairly long and require quite a lot of width. The required height is small so a wide boot (trunk) is more important than a tall narrow one. 

Similarly you might want a boot (trunk) that takes rectangular items like suitcases in which case a flat load space that fits two side by side and tall enough for another layer is what’s required. The overall volume is less important.

Knowing this, it is important that when defining each criterion the question is asked: "how would you measure success?" You may be surprised at the different responses you get for what appears to be exactly the same criterion! This is especially true when you have different stakeholders involved.
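One way to picture the difference is to hold the criterion fixed and vary the attribute. The sketch below uses the luggage space example; the class and field names are my own illustration and are not taken from any particular software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """How a criterion is measured: the measure, its unit and its precision."""
    measure: str
    unit: str
    decimal_places: int = 0


# One criterion ("Luggage space"), three possible attributes. Which one is
# right depends on what the stakeholder actually wants to carry.
luggage_space_attributes = {
    "general use": Attribute("Boot volume", "litres"),
    "golfer": Attribute("Boot width at the widest point", "cm"),
    "suitcases": Attribute("Suitcases that fit, two side by side", "count"),
}

for stakeholder, attribute in luggage_space_attributes.items():
    print(f"{stakeholder}: {attribute.measure} ({attribute.unit})")
```

Three stakeholders can all agree that "luggage space" matters and still be measuring three different things.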

Promax software uses the criteria properties dialogue box to set attributes. Right click on a criterion and the box pops up.


You can add a description to explain more about the criterion which is useful (essential) for explaining to others, then decide whether it's a cost, benefit or a harm (to be covered in a later blog but choose Benefit most of the time) and finally click "Edit Measurement Details". 

This comes up with the attribute details as below.


Choose your units and, if necessary, the number of decimal places to be used and that's it! You now have a properly defined criterion that can be used for evaluating options.





Saturday, 6 October 2012

Promax Launching This Week

This week we will be launching version 3 of Promax – our multi-criteria software. It has lots of new features compared with version 2 that we hope our customers will find useful and exciting.

Multi-criteria software is used for aiding decisions where you have more than one factor (criterion) to consider. It is well recognised that most people struggle to assess multiple criteria fairly and often make decisions based on their overall gut-feel; a High, Medium, Low if you like. Whilst this may be adequate for decisions that are relatively small and only involve a few people it is not a good technique for larger decisions or where there are many people involved. These sorts of decisions need better transparency and visibility of how the different factors contribute to the decision.

Although people use spreadsheets to carry out their calculations, it should be remembered that many (most) have had no training or experience with decision science. This can and does lead to errors in calculations and incorrect methods for valuing and weighting criteria. Additionally, it is difficult to carry out adequate sensitivity analysis and view the result in anything other than a simplistic manner.

Purpose-designed multi-criteria software removes these drawbacks so you can be sure the result is, in fact, valid – which for important decisions is surely extremely important. The initial cost of the software will quickly be repaid in the speed of model build and subsequent analysis. It is certainly far cheaper than using spreadsheets. Simply calculate the hours spent every time someone builds the spreadsheet, someone checks the calculations and the graphs are developed. Multiply these hours by the person's rate and you'll probably be surprised at the eventual cost. A few decisions will be enough to fund the software purchase and that doesn’t even include the opportunity cost of the spreadsheet person (i.e. they could/should be doing something else more valuable to the business).
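The calculation is simple enough to do on the back of an envelope, but here it is as a sketch. All the hours and the hourly rate are invented placeholders, so substitute your own.

```python
# All hours and rates are invented placeholders; substitute your own figures.
def spreadsheet_cost(build_hours, check_hours, chart_hours, hourly_rate):
    """Labour cost of one spreadsheet-based decision model."""
    return (build_hours + check_hours + chart_hours) * hourly_rate


cost_per_decision = spreadsheet_cost(
    build_hours=16, check_hours=8, chart_hours=6, hourly_rate=75
)
print(cost_per_decision)       # 30 hours at 75 per hour
print(cost_per_decision * 4)   # the same effort repeated for four decisions
```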

Promax is unique in that it has embedded well-recognised multi-attribute utility calculations into a tool that is of practical use to people who aren’t decision-science experts. It is designed around a structured decision-making methodology, taking the user through defining the problem, developing criteria, weighting the criteria, identifying options and then scoring the options. The results are presented in a myriad of ways, which is important for assessing the robustness of the decision – the biggest overall score may not necessarily be the best option.
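For readers who haven't met these calculations before, the simplest form is a weighted sum of scores. The sketch below shows that additive form only; it is a generic illustration and makes no claim about exactly how Promax computes its results. The weights and scores are invented.

```python
# Invented weights (summing to 1) and 0-100 value scores for three options.
weights = {"Cost": 0.5, "Safety": 0.3, "Schedule": 0.2}
scores = {
    "Option A": {"Cost": 80, "Safety": 40, "Schedule": 60},
    "Option B": {"Cost": 50, "Safety": 90, "Schedule": 70},
    "Option C": {"Cost": 60, "Safety": 60, "Schedule": 60},
}


def overall_value(option_scores, weights):
    """Weighted sum of an option's scores across all criteria."""
    return sum(weights[c] * option_scores[c] for c in weights)


totals = {name: overall_value(s, weights) for name, s in scores.items()}
for name, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {total:.1f}")
```

Option B wins here by only two points over Option A, which is exactly the situation where you'd want to look at strengths, weaknesses and sensitivity before trusting the headline number.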

Some of the great features are:
  • Problem Definition – state where you are now, where you want to be and what the barriers are using a prompt-based approach.
  • Creating a value tree using drag and drop – just create criteria using a mind mapping approach and drag criteria around to fit under different topics.
  • Multiple ways of weighting – direct weighting, pairwise comparison and swing weighting gives plenty of choices to ensure you can weight criteria accurately as to their importance.
  • Options with notes – a drag and drop approach to creating options gives great flexibility. You can add notes, images, attachments and even rate the option right on the display.
  • Creativity & TRIZ tools / database – this functionality for coming up with ideas is much more effective than using traditional brainstorming. These exciting tools can generate many more options than you thought possible.
  • Scoring with sliders and add notes – a cool new slider for scoring plus the ability to add notes to explain the rationale behind the score.
  • Results – there are many ways of viewing. From a bar chart and table through to looking at strengths and weaknesses of particular options. Also, looking at value for money is a fantastic display.
Additionally there are lots of behind-the-scenes enhancements such as the ability to change colours, undo, import/export to/from Excel and address uncertainty with three-point estimates.
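Of the weighting methods in the feature list above, swing weighting is perhaps the least familiar. As a rough, generic sketch (not a description of the Promax implementation): you judge how much it matters to move each criterion from its worst to its best level, give the most important swing 100, rate the others relative to it and then normalise. The criteria and ratings below are invented.

```python
# Each number is a judgement of how much it matters to move a criterion from
# its worst to its best level, with the most important swing set to 100.
swing_ratings = {"Cost": 100, "Safety": 80, "Schedule": 20}


def normalise(ratings):
    """Scale the ratings so that the resulting weights sum to 1."""
    total = sum(ratings.values())
    return {name: rating / total for name, rating in ratings.items()}


weights = normalise(swing_ratings)
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
```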

The professional version adds resource allocation to the software. This allows you to have combinations of options and pick the best combination for a given budget (or other resource). This is used for value engineering and for prioritisation within funding constraints.
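To give a flavour of the problem being solved, here is a small sketch that tries every combination of options and keeps the best one that fits the budget. It is an exhaustive search written for illustration, not the method used in the software, and the costs and values are invented.

```python
from itertools import combinations

# Invented options: name -> (cost, value). Units are arbitrary.
options = {"A": (40, 55), "B": (30, 40), "C": (25, 38), "D": (10, 12)}


def best_portfolio(options, budget):
    """Return the combination with the highest total value within the budget.

    Exhaustive search, so only suitable for a small number of options.
    """
    best_names, best_value = (), 0
    names = list(options)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            cost = sum(options[n][0] for n in combo)
            value = sum(options[n][1] for n in combo)
            if cost <= budget and value > best_value:
                best_names, best_value = combo, value
    return best_names, best_value


print(best_portfolio(options, budget=70))
print(best_portfolio(options, budget=65))
```

Dropping the budget from 70 to 65 swaps option B for option C in the chosen combination, which shows why the best combination can't be found by simply ranking the options one at a time.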

For more information see the Promax web page.