Earlier today I was enjoying Andrew Charlton’s Probabilistic Thinking in SEO post when I came across something that puzzled me.
Andrew is describing how to combine a probability range (e.g. “we think this will be somewhere between 5% and 60%”) into a single probability. He uses the geometric mean for this:
You can convert your bounds into a point estimate. Taking ‘the average’ will likely be well off, but you can use something called the geometric mean instead. The geometric mean is better suited to data with outliers or extreme variance.
I couldn’t figure out why it was best to use the geometric mean rather than the arithmetic (normal) mean. So I sent him a direct message to ask.
Andrew kindly explained to me that the geometric mean is useful when you are more interested in ending up in the middle of something that spans several orders of magnitude. For example, the arithmetic mean of 1 and 1 million is roughly 500,000, which is only a factor of two below 1 million but more than five orders of magnitude above 1. So on a log scale this average is much “closer” to 1 million than it is to 1.
The geometric mean of these two numbers is sqrt(1 * 1000000) = 1000, which is in the middle in terms of orders of magnitude. And since Andrew is using these numbers to make a Fermi estimate, where he aims to be within an order of magnitude of the true value, this approach makes perfect sense.
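As a quick sanity check, here is the same comparison in Python (a sketch of my own, not something from Andrew’s post):

```python
import math

low, high = 1, 1_000_000

arithmetic_mean = (low + high) / 2       # 500000.5
geometric_mean = math.sqrt(low * high)   # 1000.0

# On a log10 scale the geometric mean sits exactly halfway between the inputs,
# while the arithmetic mean is almost on top of the larger one.
print(math.log10(arithmetic_mean))  # roughly 5.7 (log10(1000000) is 6)
print(math.log10(geometric_mean))   # 3.0, midway between 0 and 6
```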
This tweet is where it finally clicked for me:
I see the order of magnitude estimate as "really" an estimate of the log, so GM of estimates makes sense as it corresponds to AM of logs. Still trying to justify the AGM though :-)
— Robert Low (@RobJLow) November 13, 2017
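A tiny illustration of Robert’s point, using the same two numbers: taking the arithmetic mean of the logs and exponentiating gives back the geometric mean.

```python
import math

estimates = [1, 1_000_000]

# Arithmetic mean of the logs, mapped back with exp, equals the geometric mean.
mean_of_logs = sum(math.log(x) for x in estimates) / len(estimates)
print(math.exp(mean_of_logs))  # approximately 1000, the same as sqrt(1 * 1000000)
```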
But this got me thinking about probabilities and how we average them. Orders of magnitude are a bit different for probabilities because every probability must lie between zero and one. The low end behaves as you would expect: 0.1, 0.01, 0.001 and 0.0001 are each a different order of magnitude. But something similar happens at the high end too, when you look at 0.99, 0.999 and 0.9999.
Consider a service with 99% uptime: it can be down for nearly four days per year. With 99.9% there must be less than 9 hours of downtime per year. 99.99% allows less than an hour, and a 99.999% SLA allows only a little more than 5 minutes of downtime per year.
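These downtime figures are just the length of a year multiplied by the failure probability; a quick sketch of my own, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600

for uptime in (0.99, 0.999, 0.9999, 0.99999):
    downtime_minutes = MINUTES_PER_YEAR * (1 - uptime)
    print(f"{uptime} uptime allows {downtime_minutes / 60:.1f} hours "
          f"({downtime_minutes:.1f} minutes) of downtime per year")
```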
Normally, when working with probabilities, this kind of problem is handled with a logit transform. We can use it to calculate a “logistic mean” in the same way that a log transform gives us the geometric mean.
The calculation looks like this: transform each probability with logit(p) = ln(p / (1 - p)), take the ordinary arithmetic mean of the transformed values, and then map the result back through the logistic function 1 / (1 + e^-x).
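A minimal Python sketch of that calculation (the function names are my own):

```python
import math

def logit(p):
    """Map a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def logistic(x):
    """Inverse of logit: map a real number back to a probability."""
    return 1 / (1 + math.exp(-x))

def logistic_mean(probabilities):
    """Average the probabilities on the logit scale, then map back."""
    mean_logit = sum(logit(p) for p in probabilities) / len(probabilities)
    return logistic(mean_logit)

print(logistic_mean([0.9, 0.999]))  # roughly 0.9896
```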
Here is the “logistic mean” of 0.9 and 0.999 calculated on Wolfram Alpha. The answer is 0.9896, which is almost exactly 0.99: the middle “order of magnitude” between the inputs.
I haven’t seen this written about anywhere else, probably because I’m not putting the right words for it into Google. But just in case it is something new, I thought I’d document it here.
[Andrew has a cool-looking SEO Forecasting course that you should have a look at]