This is a bit of a technical post, so drop out now if you are worried about head-spins.

When we do any sort of statistical study (e.g. how many cars does it take to clog up the freeway on average?) the results (if they are done properly) include a margin for error. For example: it takes an average of 784 cars to clog the freeway, with an error margin of 41 cars either way.

The 'error' can be interpreted in various ways. One way is that the number of cars required to block the freeway can actually vary from day to day, depending on weather, size of cars, skill of drivers etc, so the error reflects the range of values you are likely to come across (strictly this error is different to the error on the average, but you get the idea).

Another way of interpreting the 'error' is to say that there really is an 'actual' average (the one we would get if we watched the freeway for an infinite number of days) and the average we measure is only an approximation to this actual average. That is, the 'true' average is probably somewhere between 743 cars (784-41) and 825 cars (784+41).

So, when you have a limited set of observations of a particular thing, the average of those observations is only ever an approximation to the 'true' average. That is, even if we accept that batting and bowling averages are the best way of measuring skill, these averages are only ever approximations of the 'actual' skill of the player.

The key question is: how big is the 'error' on a batting or bowling average? With a few simple assumptions I think we can get a pretty good idea.

Let's assume that a bowler has an 'actual' average of 30 runs per wicket. Let's say that over his career he bowls 12000 balls and concedes one run every two balls, so he would concede 6000 runs in his career. If his bowling truly reflected his average then he would take 200 wickets.

An average of 30 means that on any ball he has a probability of 200/12000 = 1/60 of taking a wicket.

Taking wickets this way is a binomial process (each ball is an independent trial with a fixed chance of success), so the spread in the number of wickets can be represented by the standard deviation, which is given by:

s.d. = square root of p*(1-p)*n

where p = probability of a wicket (1/60) and n = number of balls bowled (12000).

Therefore: s.d. = sqrt of 1/60 * 59/60 * 12000 = 14 (approximately)
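For anyone who wants to check the arithmetic, here is a quick Python sketch using the assumed figures above (12000 balls, one run conceded every two balls, a 'true' average of 30):

```python
import math

balls = 12000              # assumed career deliveries
runs = balls // 2          # one run conceded every two balls -> 6000
average = 30               # assumed 'true' bowling average
wickets = runs // average  # 200 wickets if he bowls exactly to his average

p = wickets / balls        # chance of a wicket on any ball = 1/60
sd = math.sqrt(p * (1 - p) * balls)
print(round(sd))           # -> 14
```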

Now, in 95% of cases we expect the actual result to fall within two standard deviations of the average result. So, the number of wickets this bowler will take in his career, assuming that his 'true' bowling average is 30, will be somewhere between 172 (200-28) and 228 (200+28).

Now, this means that the bowler's career average (6000 runs divided by his wicket tally) will be somewhere between 26.3 (6000/228) and 34.9 (6000/172). This is quite a wide range.
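Turning the wicket range back into averages is just a division; a minimal sketch with the same assumed numbers:

```python
runs = 6000                 # assumed career runs conceded
low_w, high_w = 172, 228    # 200 wickets plus or minus two standard deviations

print(round(runs / high_w, 1))  # best case  -> 26.3
print(round(runs / low_w, 1))   # worst case -> 34.9
```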

It is worth repeating in a different way:

Two bowlers, equally skilled, could each take around 200 wickets and yet end up with very different averages!
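You can see this directly by simulation. The sketch below is my own illustration, not anything rigorous: it gives two imaginary bowlers the identical 'true' skill of 1/60 per ball and lets chance decide their careers.

```python
import random

def career_average(p=1/60, balls=12000, runs=6000):
    # each delivery is an independent chance of a wicket with probability p
    wickets = sum(1 for _ in range(balls) if random.random() < p)
    return runs / wickets

random.seed(42)  # arbitrary seed, just so the run is repeatable
bowler_a = career_average()
bowler_b = career_average()
print(round(bowler_a, 1), round(bowler_b, 1))  # identical skill, different averages
```

Run it a few times with different seeds and the gap between the two 'careers' routinely spans several runs per wicket.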

Or to make an even stronger point: there is virtually no justification for rating one bowler as better than another on the basis of a few points' difference in average: the 'error bands' on bowling averages are too wide for such comparisons to be of any use.

That is, raw bowling averages provide almost no justification for separating, say, Lillee (23.92), Holding (23.68), Miller (22.97), Lindwall (23.03) and Hadlee (22.33).

I would go further and say that once you also take into account opposition, pitch standard, laws and other such stuff, bowling averages are pretty much useless as a comparison tool, except perhaps when comparing, say, a bowler averaging 20 with one averaging 26 or more.

If you have got this far then I am interested in responses!

John Clark