## error bands on averages (or, why averages are pretty useless)

### error bands on averages (or, why averages are pretty useless)

This is a bit of a technical post, so drop out now if you are worried.

When we do any sort of statistical study (e.g. how many cars does it
take to clog up the freeway on average?) the results (if they are done
properly) include a margin for error.  For example, it takes an
average of 784 cars to clog the freeway, with an error margin of 41
cars either way.

The 'error' can be interpreted in various ways.  One way is that the
number of cars required to block the freeway can actually vary from
day to day, depending on weather, size of cars, skill of drivers etc,
so the error reflects the range of values you are likely to come
across (strictly this error is different to the error on the average
but you get the idea).

Another way of interpreting the 'error' is to say that there really is
an 'actual' average (if we watched the freeway for an infinite number
of days) and the average we measure is only an approximation to this
actual average.  That is, the 'true' average is probably somewhere
between 743 cars (784-41) and 825 cars (784+41).
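
To make the two readings of 'error' concrete, here is a small Python sketch. The daily counts are made up (drawn from a random number generator, not real traffic data), and the names `daily_counts`, `spread` and `error_on_average` are only illustrative. It shows that the day-to-day spread and the error on the average are different numbers, the second being the first divided by the square root of the number of days observed.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the made-up data is reproducible

# Hypothetical data: cars needed to clog the freeway on each of 30 days.
daily_counts = [random.gauss(784, 200) for _ in range(30)]

mean = statistics.mean(daily_counts)
# Interpretation 1: how much a single day varies (sample standard deviation).
spread = statistics.stdev(daily_counts)
# Interpretation 2: how uncertain the measured average is (standard error).
error_on_average = spread / math.sqrt(len(daily_counts))

print(f"average = {mean:.0f} cars")
print(f"day-to-day spread = {spread:.0f} cars")
print(f"error on the average = {error_on_average:.0f} cars")
```

Watching the freeway for more days leaves the spread roughly unchanged but shrinks the error on the average.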

So, when you have a limited set of observations of a particular thing
the average of the observations is only ever an approximation to the
'true' average.  That is, even if we accept that batting and bowling
averages are the best way of measuring skill, these averages are only
ever approximations of the 'actual' skill of the player.

The key question is:  how big is the 'error' on a batting or bowling
average?  With a few simple assumptions I think we can get a pretty
good idea.

Let's assume that a bowler has an 'actual' average of 30 runs per
wicket.  Let's say that over his career he bowls 12000 balls and
concedes one run every two balls.  He would concede 6000 runs in his
career.  If his bowling truly reflected his average then he would take
200 wickets.

An average of 30 means that on any ball he has a probability of
200/12000 = 1/60 of taking a wicket.

The error for the result from a simple probability distribution can be
represented by the standard deviation, which is given by:

s.d. = square root of p*(1-p)*n

where p = probability of a wicket (1/60) and n = number of
observations (12000).

Therefore:  s.d. = sqrt of 1/60*59/60*12000 = 14

Now, we expect that in 95% of all cases the actual result will fall
within two standard deviations of the average result.

So, the number of wickets this bowler will take in his career,
assuming that his 'true' bowling average is 30, will be somewhere
between 172 (200-28) and 228 (200+28).

Now, this means that the bowler's career average will be somewhere
between 26.3 and 34.9.  This is quite a wide range.
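
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. It assumes exactly what the post assumes (every ball is an independent trial with a 1/60 chance of a wicket, and runs conceded are fixed at 6000); the variable names are mine.

```python
import math

n = 12000          # balls bowled in the career
p = 1 / 60         # assumed chance of a wicket on any one ball
runs = 6000        # runs conceded (one run every two balls)

expected_wickets = n * p
sd = math.sqrt(n * p * (1 - p))

low_wickets = expected_wickets - 2 * sd
high_wickets = expected_wickets + 2 * sd

# More wickets means a lower (better) average, so the bounds swap over.
best_average = runs / high_wickets
worst_average = runs / low_wickets

print(f"s.d. = {sd:.1f} wickets")                            # 14.0
print(f"wickets: {low_wickets:.0f} to {high_wickets:.0f}")   # 172 to 228
print(f"average: {best_average:.1f} to {worst_average:.1f}") # 26.3 to 34.9
```

Note that the band on the average is not symmetric about 30, because the average is runs divided by wickets rather than a linear function of wickets.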

It is worth repeating in a different way:

Two bowlers, both equally skilled, could both take around 200
wickets and have very different averages!

Or to make an even stronger point:  there is virtually no
justification to rate one bowler as better than another on the basis
of a few points difference in average:  the 'error bands' on bowling
averages are too high for them to be of any use.

That is, raw bowling averages provide almost no justification for
separating, say, Lillee (23.92), Holding (23.68), Miller (22.97),
Lindwall (23.03) and Hadlee (22.33).

I would go further and say that once you also take into account
opposition, pitch standard, laws and other such stuff, bowling
averages are pretty much useless as a comparison tool except if
comparing, say, a 20 average with a 26+ average bowler.

If you have got this far then I am interested in responses!

John Clark

### error bands on averages (or, why averages are pretty useless)

Quote:

> This is a bit of a technical post, so drop out now if you are worried

> When we do any sort of statistical study (e.g. how many cars does it
> take to clog up the freeway on average?) the results (if they are done
> properly) include a margin for error.  For example, it takes an
> average of 784 cars to clog the freeway, with an error margin of 41
> cars either way.

> The 'error' can be interpreted in various ways.  One way is that the
> number of cars required to block the freeway can actually vary from
> day to day, depending on weather, size of cars, skill of drivers etc,
> so the error reflects the range of values you are likely to come
> across (strictly this error is different to the error on the average
> but you get the idea).

> Another way of interpreting the 'error' is to say that there really is
> an 'actual' average (if we watched the freeway for an infinite number
> of days) and the average we measure is only an approximation to this
> actual average.  That is, the 'true' average is probably somewhere
> between 743 cars (784-41) and 825 cars (784+41).

> So, when you have a limited set of observations of a particular thing
> the average of the observations is only ever an approximation to the
> 'true' average.  That is, even if we accept that batting and bowling
> averages are the best way of measuring skill, these averages are only
> ever approximations of the 'actual' skill of the player.

> The key question is:  how big is the 'error' on a batting or bowling
> average?  With a few simple assumptions I think we can get a pretty
> good idea.

> Let's assume that a bowler has an 'actual' average of 30 runs per
> wicket.  Let's say that over his career he bowls 12000 balls and
> concedes one run every two balls.  He would concede 6000 runs in his
> career.  If his bowling truly reflected his average then he would take
> 200 wickets.

> An average of 30 means that on any ball he has a probability of
> 200/12000 = 1/60 of taking a wicket.

> The error for the result from a simple probability distribution can be
> represented by the standard deviation, which is given by:

> s.d. = square root of p*(1-p)*n

> where p = probability of a wicket (1/60) and n = number of
> observations (12000).

> Therefore:  s.d. = sqrt of 1/60*59/60*12000 = 14

Am with you so far.

Quote:
> Now, we expect that 95% of all cases the actual result will fall
> within two standard deviations of the average result.

Why 95% and why two? I'd expect that in > 95% of all cases, the actual
result will fall within < 2 s.d.'s of the average.

Quote:
> So, the number of wickets this bowler will take in his career,
> assuming that his 'true' bowling average is 30, will be somewhere
> between 172 (200-28) and 228 (200+28).

> Now, this means that the bowler's career average will be somewhere
> between 26.3 and 34.9.  This is quite a wide range.

I think if you change two s.d.'s to one, then you'll get a range from
28.x to 32.x, which is probably more like it. I mean, I think it's
possible that two bowlers of equal skill can have 28.x and 31.x
averages, but unlikely that they can have 27.x and 33.x averages. In
the latter case, I'd suggest the bowlers weren't of equal skill. Hence,
in retrospect, I am saying that it should be one s.d. and not two, for
bowling avgs.
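
As a rough check on the "28.x to 32.x" figure, here is a short Python sketch that works out the band on the average for one and for two standard deviations. It keeps the original post's assumptions (12000 balls, 1/60 chance of a wicket per ball, 6000 runs); `average_band` is just an illustrative helper name.

```python
import math

n, p, runs = 12000, 1 / 60, 6000
sd = math.sqrt(n * p * (1 - p))

def average_band(k):
    """Range of career averages when wickets land within k s.d. of the mean."""
    mean = n * p
    return runs / (mean + k * sd), runs / (mean - k * sd)

for k in (1, 2):
    low, high = average_band(k)
    print(f"{k} s.d.: {low:.1f} to {high:.1f}")
# 1 s.d.: 28.0 to 32.3
# 2 s.d.: 26.3 to 34.9
```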

Quote:
> It is worth repeating in a different way:

> Two bowlers, both equally as skilled, could both take around 200
> wickets and have very different averages!

> Or to make an even stronger point:  there is virtually no
> justification to rate one bowler as better than another on the basis
> of a few points difference in average:  the 'error bands' on bowling
> averages are too high for them to be of any use.

> That is, raw bowling averages provide almost no justification for
> separating, say, Lillee (23.92), Holding (23.68), Miller (22.97),
> Lindwall (23.03) and Hadlee (22.33).

True. Bowling average is useless in separating bowlers with 22.x and
23.x averages.

Quote:
> I would go further and say that once you also take into account
> opposition, pitch standard, laws and other such stuff, bowling
> averages are pretty much useless as a comparison tool except if
> comparing, say, a 20 average with a 26+ average bowler.

I think average can be useful when comparing a 22.x and a 25.x average
bowler also. All of the bowlers you've mentioned above appear to rate
above (say) Willis and Roberts in most people's estimation. It's not
just 20.x and 26.x bowlers that averages can successfully separate.

Quote:
> If you have got this far then I am interested in responses!

Your point re: margin of error is taken. I just don't think the margin
is as wide as you make it out to be.

-Samarth.

### error bands on averages (or, why averages are pretty useless)

Quote:

> Now, we expect that 95% of all cases the actual result will fall
> within two standard deviations of the average result.

> So, the number of wickets this bowler will take in his career,
> assuming that his 'true' bowling average is 30, will be somewhere
> between 172 (200-28) and 228 (200+28).

Circular argument, isn't it? You are assuming what you are trying to
demonstrate.

### error bands on averages (or, why averages are pretty useless)

Quote:
> This is a bit of a technical post, so drop out now if you are worried

> When we do any sort of statistical study (e.g. how many cars does it
> take to clog up the freeway on average?) the results (if they are done
> properly) include a margin for error.  For example, it takes an
> average of 784 cars to clog the freeway, with an error margin of 41
> cars either way.

Presumably this is 784 within a certain time, or a certain distance?

--snip--

Quote:
> Let's assume that a bowler has an 'actual' average of 30 runs per
> wicket.  Let's say that over his career he bowls 12000 balls and
> concedes one run every two balls.  He would concede 6000 runs in his
> career.  If his bowling truly reflected his average then he would take
> 200 wickets.

> An average of 30 means that on any ball he has a probability of
> 200/12000 = 1/60 of taking a wicket.

> The error for the result from a simple probability distribution can be
> represented by the standard deviation, which is given by:

> s.d. = square root of p*(1-p)*n

> where p = probability of a wicket (1/60) and n = number of
> observations (12000).

> Therefore:  s.d. = sqrt of 1/60*59/60*12000 = 14

> Now, we expect that 95% of all cases the actual result will fall
> within two standard deviations of the average result.

> So, the number of wickets this bowler will take in his career,
> assuming that his 'true' bowling average is 30, will be somewhere
> between 172 (200-28) and 228 (200+28).

> Now, this means that the bowler's career average will be somewhere
> between 26.3 and 34.9.

You've gone from a 95% probability to "will be".

Your figures may be correct, but the probability of his average being at the extremes of this range
is much smaller than that of it being nearer 30.

If the probability of one bowler's average falling outside the range is 5% (2.5% at each end of the
range), then the probability of two bowlers of the same ability falling outside the range at
opposite ends is 0.025 squared = 0.0625%.

To get a range where this probability is 1%, you only need a probability of 1-(2*sqrt 0.01) = 80% of
one bowler being within it. What would this range be?
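
One possible answer to that last question, as a hedged Python sketch: it takes the arithmetic above as written (10% in each tail, 80% in the middle) and uses a normal approximation to the wicket count from the original post. If either bowler may be the lucky one, the joint figure would be twice the squared tail probability, so treat the 1% as approximate.

```python
import math
from statistics import NormalDist

n, p, runs = 12000, 1 / 60, 6000
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# Tail probability per bowler such that tail * tail = 1%.
tail = math.sqrt(0.01)            # 0.1 in each tail
coverage = 1 - 2 * tail           # 0.8 inside the range

# Normal approximation: z-value that leaves `tail` in the upper tail.
z = NormalDist().inv_cdf(1 - tail)

low_average = runs / (mean + z * sd)
high_average = runs / (mean - z * sd)
print(f"{coverage:.0%} range: {low_average:.1f} to {high_average:.1f}")
```

Under these assumptions the 80% range comes out at roughly 27.5 to 33.0.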
--
David North
Email to this address will be deleted as spam
Use usenetATlaneHYPHENfarm.fsnet.co.uk

### error bands on averages (or, why averages are pretty useless)

snip

Quote:

> If you have got this far then I am interested in responses!

Yessss. Lies, damned lies and cricket statistics.
You'd have to be in urgent need of psychiatric treatment to regard
stats V Zimb C as the equal of stats V Australia.

### error bands on averages (or, why averages are pretty useless)

Quote:

> > An average of 30 means that on any ball he has a probability of
> > 200/12000 = 1/60 of taking a wicket.

> > The error for the result from a simple probability distribution can be
> > represented by the standard deviation, which is given by:

> > s.d. = square root of p*(1-p)*n

> > where p = probability of a wicket (1/60) and n = number of
> > observations (12000).

> > Therefore:  s.d. = sqrt of 1/60*59/60*12000 = 14

Now without actually checking, i think this will be a Poisson
distribution or some derivative thereof?  Which means the distribution
will be slightly skewed.  How much that matters for these kinds of
numbers, i do not know.
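
One way to see how much the skew matters is to compute the exact tail probabilities rather than rely on the bell curve. The Python sketch below assumes the per-ball model from the original post, which is binomial (and, for a probability as small as 1/60, close to Poisson). It works in logarithms via `math.lgamma` because the binomial coefficients are too large for ordinary floats; `binom_pmf` is my own helper name.

```python
import math

n, p = 12000, 1 / 60

def binom_pmf(k):
    """Exact binomial probability of k wickets, computed via logs."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

sd = math.sqrt(n * p * (1 - p))
skewness = (1 - 2 * p) / sd   # standard skewness formula for a binomial

inside = sum(binom_pmf(k) for k in range(172, 229))   # 172..228 inclusive
below = sum(binom_pmf(k) for k in range(0, 172))
above = 1 - inside - below

print(f"skewness = {skewness:.3f}")
print(f"P(172 <= wickets <= 228) = {inside:.3f}")
print(f"P(fewer than 172) = {below:.3f}, P(more than 228) = {above:.3f}")
```

The skewness is small for these numbers, so the two tails differ only slightly and the two-s.d. band is still close to a 95% band.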

Quote:
> > Now, we expect that 95% of all cases the actual result will fall
> > within two standard deviations of the average result.

> Why 95% and why two? I'd expect that in > 95% of all cases, the actual
> result will fall within < 2 s.d.'s of the average.

With a gaussian distribution, 95% of cases will fall within two
standard deviations.  The number is not exactly 95% of course, but
close enough for all practical purposes, so that when scientists quote
95% confidence intervals and 2 sigma deviations, they mean the same
thing.  You can work it out easily enough.  68% of cases fall within 1
standard deviation of the average.
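
For those who would rather check than take it on trust, the coverage of a gaussian can be computed from the error function in Python's standard library. A minimal sketch (the function name `coverage` is mine):

```python
import math

def coverage(k):
    """Fraction of a normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2):
    print(f"within {k} s.d.: {coverage(k):.2%}")
# within 1 s.d.: 68.27%
# within 1.96 s.d.: 95.00%
# within 2 s.d.: 95.45%
```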

Quote:
> > So, the number of wickets this bowler will take in his career,
> > assuming that his 'true' bowling average is 30, will be somewhere
> > between 172 (200-28) and 228 (200+28).

> > Now, this means that the bowler's career average will be somewhere
> > between 26.3 and 34.9.  This is quite a wide range.

> I think if you change two s.d.'s to one, then you'll get a range from
> 28.x to 32.x, which is probably more like it.

Actually, 1sigma (68% confidence) is not a very meaningful number.
Any statistician will tell you that a fluctuation of less than 2sigma
is usually not considered statistically significant.

Cheers, jonivar

--
|    jonivar skullerud    |    http://www.jonivar.skullerud.name/    |
----------------------------------------------------------------------
Where wages are high ... we shall always find the workmen more active,
diligent, and expeditious than where they are low ...  In reality high
profits tend much more to raise the price of work than high wages.
Adam Smith, Wealth of Nations

### error bands on averages (or, why averages are pretty useless)

Quote:

>> Now, we expect that 95% of all cases the actual result will fall
>> within two standard deviations of the average result.

>Why 95% and why two? I'd expect that in > 95% of all cases, the actual
>result will fall within < 2 s.d.'s of the average.

This comes from statistical theory. In something like this, the results
would be expected to conform to the so-called "Normal" (aka Gaussian)
distribution, which has this property. To be totally precise, 95% will
fall within plus or minus 1.96 s.d.s. Setting the criterion at 95% is
common in statistics, though rather arbitrary.
--
John Hall

"The beatings will continue until morale improves."
Attributed to the Commander of Japan's Submarine Forces in WW2

### error bands on averages (or, why averages are pretty useless)

Quote:

>--snip--

>> Let's assume that a bowler has an 'actual' average of 30 runs per
>> wicket.  Let's say that over his career he bowls 12000 balls and
>> concedes one run every two balls.  He would concede 6000 runs in his
>> career.  If his bowling truly reflected his average then he would take
>> 200 wickets.

>> An average of 30 means that on any ball he has a probability of
>> 200/12000 = 1/60 of taking a wicket.

>> The error for the result from a simple probability distribution can be
>> represented by the standard deviation, which is given by:

>> s.d. = square root of p*(1-p)*n

>> where p = probability of a wicket (1/60) and n = number of
>> observations (12000).

>> Therefore:  s.d. = sqrt of 1/60*59/60*12000 = 14

>> Now, we expect that 95% of all cases the actual result will fall
>> within two standard deviations of the average result.

>> So, the number of wickets this bowler will take in his career,
>> assuming that his 'true' bowling average is 30, will be somewhere
>> between 172 (200-28) and 228 (200+28).

>> Now, this means that the bowler's career average will be somewhere
>> between 26.3 and 34.9.

>You've gone from a 95% probability to "will be".

>Your figures may be correct, but the probability of his average being
>at the extremes of this range
>is much smaller than that of it being nearer 30.

>If the probability of one bowler's average falling outside the range is
>5% (2.5% at each end of the
>range), then the probability of two bowlers of the same ability falling
>outside the range at
>opposite ends is 0.025 squared = 0.0625%.

>To get a range where this probability is 1%, you only need a
>probability of 1-(2*sqrt 0.01) = 80% of
>one bowler being within it. What would this range be?

I think that if we wanted to compare two bowlers, we should take as our
"null hypothesis" that they are equally as good as one another, i.e.
that the true probability of taking a wicket with each ball is the same
for both of them. Then we would apply either the "T Test" or the "Chi
Squared Test" (I did stats too long ago to recall which test is the more
appropriate) to see if their actual figures confirmed or refuted that
hypothesis at whatever "confidence level" we had decided to use
(probably 95%). We could then say that there is a 95% or greater
probability that the apparent difference between the bowlers is no more
than chance, or a 95% or greater probability that it is real, depending
on the result of the test.
--
John Hall

### error bands on averages (or, why averages are pretty useless)

Quote:
>I think that if we wanted to compare two bowlers, we should take as our
>"null hypothesis" that they are equally as good as one another, i.e.
>that the true probability of taking a wicket with each ball is the same
>for both of them. Then we would apply either the "T Test" or the "Chi
>Squared Test" (I did stats too long ago to recall which test is the more
>appropriate) to see if their actual figures confirmed or refuted that
>hypothesis at whatever "confidence level" we had decided to use
>(probably 95%). We could then say that there is a 95% or greater
>probability that the apparent difference between the bowlers is no more
>than chance, or a 95% or greater probability that it is real, depending
>on the result of the test.

Sorry. The first half of that last sentence is wrong. Here's a revised
version: "Depending on the result of the Test, we could then say whether
or not the difference between the bowlers' figures is 'significant' at
the 95% confidence level. If it is 'significant', that means that there
is a 95% or greater probability that the difference is genuine." Note
that, except in the most extreme of cases, there is no such thing as
total certainty in statistics.
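
A rough sketch of such a test in Python, for anyone who wants to try it. The choice of a two-proportion z-test is mine, not something settled above; for two bowlers it gives the same answer as a chi-squared test on the 2x2 table of wicket balls and non-wicket balls (without continuity correction). The figures fed in are hypothetical, and the test compares wickets per ball, not runs per wicket.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(wickets_a, balls_a, wickets_b, balls_b):
    """Two-sided z-test of 'same wicket probability per ball'."""
    rate_a = wickets_a / balls_a
    rate_b = wickets_b / balls_b
    # Under the null hypothesis both bowlers share one pooled rate.
    pooled = (wickets_a + wickets_b) / (balls_a + balls_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / balls_a + 1 / balls_b))
    z = (rate_a - rate_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical figures: 200 and 172 wickets, each from 12000 balls.
z, p_value = two_proportion_z_test(200, 12000, 172, 12000)
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
```

The p-value here is the chance of seeing a gap at least this large if the two bowlers really were equally good; with these made-up figures it is well above 0.05, so the gap would not count as significant.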
--
John Hall

### error bands on averages (or, why averages are pretty useless)

hmmm... interesting... I wonder how this theory relates to a bowler who
is also a chucker?

### error bands on averages (or, why averages are pretty useless)

Quote:

> Let's assume that a bowler has an 'actual' average of 30 runs per
> wicket.  Let's say that over his career he bowls 12000 balls and
> concedes one run every two balls.  He would concede 6000 runs in his
> career.  If his bowling truly reflected his average then he would take
> 200 wickets.

Ermmm... if he bowled 2000 overs for 6000 runs and had an average of
30 then how could it come out any different? Average is 30, runs
conceded is 6000, so he MUST have taken 200 wickets.

### error bands on averages (or, why averages are pretty useless)

Quote:

> I would go further and say that once you also take into account
> opposition, pitch standard, laws and other such stuff, bowling
> averages are pretty much useless as a comparison tool except if
> comparing, say, a 20 average with a 26+ average bowler.

Exactly right!  Comparing averages between different times is dangerous to
say the least!  Same for batting averages.... How can we compare players from
the 20's, 30's & 40's where they may have played on damp pitches against
test batters of today?   The mathematics you have displayed have absolutely
no meaning at all (apart from seeming impressive!).... What I would suggest
is comparing apples with apples instead of crapping on about standard
deviations and the like!  Don Bradman averages 99, compared to the next best
in his era (around 45-50 give or take!) so therefore he was twice as good as
any other batter going around.... so today, we have the best in the world
batting at around 55 to 60...which would lead us to conclude that The Don
would be at least twice as good if he was playing now and averaging 110-120
per innings.

### error bands on averages (or, why averages are pretty useless)

<snip>

Quote:
> > Now, this means that the bowler's career average will be somewhere
> > between 26.3 and 34.9.

<snip>

Quote:
> Your figures may be correct, but the probability of his average being
> at the extremes of this range is much smaller than that of it being
> nearer 30.

> If the probability of one bowler's average falling outside the range
> is 5% (2.5% at each end of the range), then the probability of two
> bowlers of the same ability falling outside the range at opposite ends
> is 0.025 squared = 0.0625%.

> To get a range where this probability is 1%, you only need a
> probability of 1-(2*sqrt 0.01) = 80% of one bowler being within it.
> What would this range be?

I don't know, but I think this is the overall point I was trying to
make in my previous post too. The probability that two bowlers of equal
skill end up with 27.x and 33.x averages is very small. Given two
bowlers of average 27.x and 33.x, I'd assume that with very large
probability they were of different skill levels.

Essentially what you're saying is that with 99% probability bowlers of
equal skill will actually average between (say) 29 and 32. With
99.9375% probability, bowlers of equal skill will actually average
between 26.3 and 34.9.

In other words, while John Clark is arguing that it is possible that
two equally good bowlers average 27 and 33 respectively, you're arguing
that the probability of this being the case is 1% or less, since with
99% probability bowlers of equal skill will average between 29 and 32.
(If this is indeed the range.)
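
A quick way to put a number on the 27-and-33 case, sketched in Python. It assumes the original post's model (12000 balls, 1/60 chance of a wicket per ball, 6000 runs conceded), treats the two bowlers as independent, and uses a normal approximation to the wicket count, so the result is a ballpark figure only.

```python
import math
from statistics import NormalDist

n, p, runs = 12000, 1 / 60, 6000
wickets = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# An average of 33 or worse means at most 6000/33 wickets;
# an average of 27 or better means at least 6000/27 wickets.
p_33_or_worse = wickets.cdf(runs / 33)
p_27_or_better = 1 - wickets.cdf(runs / 27)

# Either bowler can be the lucky one, hence the factor of two.
p_opposite = 2 * p_33_or_worse * p_27_or_better
print(f"P(one at 27 or better, the other at 33 or worse) = {p_opposite:.2%}")
```

Under these assumptions the answer comes out at about 1%.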

Am I right?

-Samarth.

### error bands on averages (or, why averages are pretty useless)

Quote:

> >I think that if we wanted to compare two bowlers, we should take as our
> >"null hypothesis" that they are equally as good as one another, i.e.
> >that the true probability of taking a wicket with each ball is the same
> >for both of them. Then we would apply either the "T Test" or the "Chi
> >Squared Test" (I did stats too long ago to recall which test is the more
> >appropriate) to see if their actual figures confirmed or refuted that
> >hypothesis at whatever "confidence level" we had decided to use
> >(probably 95%). We could then say that there is a 95% or greater
> >probability that the apparent difference between the bowlers is no more
> >than chance, or a 95% or greater probability that it is real, depending
> >on the result of the test.

> Sorry. The first half of that last sentence is wrong. Here's a revised
> version: "Depending on the result of the Test, we could then say whether
> or not the difference between the bowlers' figures is 'significant' at
> the 95% confidence level. If it is 'significant', that means that there
> is a 95% or greater probability that the difference is genuine." Note
> that, except in the most extreme of cases, there is no such thing as
> total certainty in statistics.

Yes, I agree entirely - some more 'formal' testing would be
appropriate.  My post was simply to raise the idea and throw around
some ballpark figures for the size of the 'errors' involved, mostly as
a corrective for those who seem to think that small differences in
career averages mean something.

The case which irritates me most is that Lillee's 'high' average of
23.92 seems to count against him when discussions of greatest bowlers
come up and I think that it is completely irrelevant, when comparing
to other bowlers in the 20-26 range (ie almost all other 'great'
bowlers).

Someone emailed me directly and gave a nice alternative analysis
which saw it in terms of bowling spells, rather than individual balls
bowled.  He got error bands which were a bit smaller.

I think one could take the analysis much further, from a statistical
theory point of view, but there probably would not be much point in
having an analysis which was perfect from a probability theory
perspective but did not take into account all of the other bits like
opposition, pitches, laws etc.

John Clark

### error bands on averages (or, why averages are pretty useless)

Quote:

> > Let's assume that a bowler has an 'actual' average of 30 runs per
> > wicket.  Let's say that over his career he bowls 12000 balls and
> > concedes one run every two balls.  He would concede 6000 runs in his
> > career.  If his bowling truly reflected his average then he would take
> > 200 wickets.

> Ermmm... if he bowled 2000 overs for 6000 runs and had an average of
> 30 then how it could come out any different? Average is 30, runs
> conceded is 6000 he MUST have taken 200 wickets.

"'Actual' average" probably wasn't a good choice of phrase. What John means is that, based on the
bowler's ability, his average will tend towards 30, as the number of sixes thrown per six throws of
a die should tend towards one.
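
The die analogy is easy to try out. Here is a small Python simulation (a sketch only; the seed and the checkpoints are arbitrary choices of mine) that prints the number of sixes per six throws as the number of throws grows.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

sixes = 0
for throws in range(1, 60001):
    if random.randint(1, 6) == 6:
        sixes += 1
    if throws in (6, 60, 600, 6000, 60000):
        print(f"{throws:>6} throws: {sixes / throws * 6:.2f} sixes per six throws")
```

Early on the figure can be well away from one; it only settles down after many throws, which is the same reason a career average only approximates the bowler's underlying ability.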
--
David North