Bernard: It’s all waffle! Nobody is prepared to admit that wine doesn’t have a taste.
Manny: But you can’t taste anything. You smoke eighty bajillion cigarettes a day. What’s that you’re eating?
Bernard: It’s some sort of delicious biscuit.
Manny: It’s a coaster!
– “Black Books” TV Series
Not the smoothest of segues, I admit, but I felt an urge to write about Monte Carlo methods. This got me thinking about biscuits, which reminded me of that scene from the classic UK comedy series, Black Books, and, well, here we are…
Monte Carlo methods are a very useful tool in the field of statistics. The Monte Carlo method isn’t actually one particular method. Instead, the term describes any method that repeatedly applies sets of random numbers, within a set of pre-defined constraints, to produce an estimate that could not easily be derived analytically. Monte Carlo methods have become especially easy to apply as computers have become faster and more powerful.
As an example, imagine you have three datasets bound by the following constraints:
Dataset 1: n=7; min=2; median=12; max=18
Dataset 2: n=11; min=3; median=6; max=22
Dataset 3: n=9; min=1; median=8; max=16
obs    | dataset 1 | dataset 2 | dataset 3
obs 01 | 2         | 3         | 1
obs 02 | a         | e         | m
obs 03 | b         | f         | n
obs 04 | 12        | g         | o
obs 05 | c         | h         | 8
obs 06 | d         | 6         | p
obs 07 | 18        | i         | q
obs 08 |           | j         | r
obs 09 |           | k         | 16
obs 10 |           | l         |
obs 11 |           | 22        |
Knowing nothing else about these data, could you make a guess at the mean and standard deviation of the three sets combined? Yes, you could, because you’ve been reading my blog and you know that this is a situation where Monte Carlo methods can be employed. It would involve using a computer to replace the letters a to r with randomly generated figures that satisfy the constraints. Because each dataset is listed in ascending order, a and b must lie between the minimum of 2 and the median of 12, while c and d must lie between the median of 12 and the maximum of 18. Similarly, p, q and r must lie between 8 and 16. And so on.
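A minimal Python sketch of this filling step follows. It assumes integer values and an odd n, so that the median is the middle observation of each ascending column. The names `DATASETS`, `placeholder_bounds` and `fill_once` are hypothetical, chosen for illustration; Python’s `random.randint` is inclusive at both ends, which mirrors a spreadsheet’s RANDBETWEEN.

```python
import random

# Summary constraints for the three datasets (from the list above).
DATASETS = {
    "dataset 1": {"n": 7, "min": 2, "median": 12, "max": 18},
    "dataset 2": {"n": 11, "min": 3, "median": 6, "max": 22},
    "dataset 3": {"n": 9, "min": 1, "median": 8, "max": 16},
}


def placeholder_bounds(n, lo, med, hi):
    """Return (low, high) bounds for each unknown, in ascending table order.

    Assumes n is odd, so the median is the middle observation: every unknown
    below it lies in [lo, med] and every unknown above it lies in [med, hi].
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch assumes an odd n of at least 3")
    per_half = (n - 1) // 2 - 1  # unknowns between min and median (same count above)
    return [(lo, med)] * per_half + [(med, hi)] * per_half


def fill_once(n, lo, med, hi, rng=random):
    """Replace every unknown with a random integer inside its bounds.

    randint is inclusive at both ends, like RANDBETWEEN. Each half is
    sorted so the filled column stays in ascending order.
    """
    draws = [rng.randint(a, b) for a, b in placeholder_bounds(n, lo, med, hi)]
    half = len(draws) // 2
    return [lo] + sorted(draws[:half]) + [med] + sorted(draws[half:]) + [hi]


rng = random.Random(1)  # fixed seed so the run is repeatable
for name, c in DATASETS.items():
    print(name, fill_once(c["n"], c["min"], c["median"], c["max"], rng))
```

Sorting within each half is purely cosmetic: it keeps the filled column ordered like the table, but has no effect on the mean or standard deviation.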
One possible randomly generated dataset could look like the table below. In this example I simply used the randbetween function in a spreadsheet to replace a to r with (pseudo) random numbers.
obs    | dataset 1 | dataset 2 | dataset 3
obs 01 | 2         | 3         | 1
obs 02 | 6         | 3         | 4
obs 03 | 11        | 3         | 5
obs 04 | 12        | 4         | 6
obs 05 | 15        | 5         | 8
obs 06 | 17        | 6         | 9
obs 07 | 18        | 12        | 11
obs 08 |           | 16        | 15
obs 09 |           | 17        | 16
obs 10 |           | 19        |
obs 11 |           | 22        |
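The combined statistics for the filled-in table above can be checked with Python’s standard `statistics` module. This sketch simply reads the three columns as plain lists; it uses `statistics.stdev`, the sample standard deviation (dividing by n − 1), which is the same convention as a spreadsheet’s STDEV.

```python
import statistics

# The simulated table above, one list per dataset (values read down each column).
dataset_1 = [2, 6, 11, 12, 15, 17, 18]
dataset_2 = [3, 3, 3, 4, 5, 6, 12, 16, 17, 19, 22]
dataset_3 = [1, 4, 5, 6, 8, 9, 11, 15, 16]

combined = dataset_1 + dataset_2 + dataset_3  # 27 observations in total

combined_mean = statistics.mean(combined)
combined_sd = statistics.stdev(combined)  # sample SD, divides by n - 1

print(f"mean = {combined_mean:.1f}, sd = {combined_sd:.1f}")  # mean = 9.9, sd = 6.2
```

Swapping in `statistics.pstdev` (the population standard deviation, dividing by n) gives about 6.0 instead, so a combined standard deviation of 6.2 for this table corresponds to the sample version.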
This particular iteration has a mean of 9.9 and a standard deviation of 6.2 for the three sets combined. Replacing a to r with another set of randomly generated figures might result in a different combined mean and standard deviation. Out of interest, I ran just three iterations, which gave simulated means of 9.9, 10.2 and 10.5 and simulated standard deviations of 6.2, 6.3 and 6.1. The idea behind the Monte Carlo method is that you run the simulation over and over again, perhaps for thousands or even millions of iterations, aggregating the results until they converge toward a common outcome. In the simple example described above, it would probably be enough to look at the mode of several hundred simulated means and standard deviations.
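To make the “over and over again” step concrete, here is a minimal Python sketch of the full loop. It assumes integer values drawn uniformly within each bound (mirroring RANDBETWEEN), hard-codes the bounds for a to r from the constraints, and summarises several hundred iterations by the most common value after rounding to one decimal place, in the spirit of looking at the mode. The function names, the 500 iterations and the seed are illustrative choices, not the spreadsheet setup used above.

```python
import random
import statistics
from collections import Counter

# Known values plus (low, high) bounds for the unknowns, per dataset.
# Unknowns below a median lie in [min, median]; unknowns above it in [median, max].
DATASETS = [
    ([2, 12, 18], [(2, 12)] * 2 + [(12, 18)] * 2),  # dataset 1: a to d
    ([3, 6, 22], [(3, 6)] * 4 + [(6, 22)] * 4),     # dataset 2: e to l
    ([1, 8, 16], [(1, 8)] * 3 + [(8, 16)] * 3),     # dataset 3: m to r
]


def one_iteration(rng):
    """Fill a to r with random integers (inclusive bounds, like RANDBETWEEN)
    and return the mean and sample standard deviation of all 27 values."""
    combined = []
    for known, bounds in DATASETS:
        combined.extend(known)
        combined.extend(rng.randint(lo, hi) for lo, hi in bounds)
    return statistics.mean(combined), statistics.stdev(combined)


def modal_value(values, ndigits=1):
    """Most common value after rounding (ties go to the first one seen)."""
    return Counter(round(v, ndigits) for v in values).most_common(1)[0][0]


def monte_carlo(iterations=500, seed=None):
    """Run the simulation and return the list of means and the list of SDs."""
    rng = random.Random(seed)
    results = [one_iteration(rng) for _ in range(iterations)]
    means = [m for m, _ in results]
    sds = [s for _, s in results]
    return means, sds


means, sds = monte_carlo(iterations=500, seed=42)
print(f"modal mean = {modal_value(means)}, modal sd = {modal_value(sds)}")
print(f"average mean = {statistics.mean(means):.2f}, average sd = {statistics.mean(sds):.2f}")
```

Rounding before taking the mode matters here: the simulated means and standard deviations are essentially continuous, so without rounding almost every value would be unique and the mode would tell you nothing. Printing the plain averages alongside gives a quick check that the two summaries agree.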
So that’s Monte Carlo methods in a nutshell: a very general concept, but simple and powerful in its application.
OK, now I’m hungry. Anyone have a good biscuit recipe?
Is Monte Carlo a biscuit?