The randomness of iTunes

In 1998, a rather awkward 25-year-old male walked into a CD store (this was back in the day when music was sold on CDs, in stores, to 25-year-olds) and purchased Whitey Ford Sings the Blues by Everlast.  Here’s what the indubitable Wikipedia has to say about said album and artist:

Whitey Ford Sings the Blues was both a commercial and critical success (selling more than 3 million copies).  It was hailed for its blend of rap with acoustic and electric guitars, developed by Everlast together with producers Dante Ross and John Gamble (aka SD50).  The album’s genre-crossing lead single “What It’s Like” proved to be his most popular and successful song, although the follow up single, “Ends”, also reached the rock top 10.

Several years later Apple launched iTunes, which also proved to be a commercial and critical success, and the awkward male promptly loaded Whitey Ford Sings the Blues into the song library.  iTunes seemed to take a particular shine to this album, apparently playing it far more often, when set to “shuffle”, than any of the other 100 or more albums in the collection.  At least that’s how it appeared to the awkward male, who seemed to notice it come up much more often than expected.

In a strange twist of fate I also just happen to have Whitey Ford Sings the Blues in my iTunes collection.  In another strange coincidence, just like that awkward male from a decade ago, I’ve noticed that iTunes tends to favour it over other albums in the song list when iTunes is set to shuffle.

Life is certainly full of strange coincidences, but does iTunes really favour certain songs, artists or albums over others?  Let’s test it scientifically…

I set iTunes to shuffle and counted the number of tracks I had to skip before I hit Whitey Ford Sings the Blues.  The results are below:

32, 65, 181, 67, 77, 152, 50, 46, 230, 64

In other words, Whitey Ford Sings the Blues played randomly 10 times in 964 attempts (i.e. 1.037% of the sample).  I have 119 albums in iTunes, so theoretically I should be hearing it 1/119=0.840% of the time.  So the sample is a little bit higher than expected, but is it statistically significantly higher?
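The arithmetic above can be checked with a few lines of Python.  This is only a quick sketch; the variable names are purely illustrative.

```python
# Tracks skipped before each appearance of the album (figures from the post)
skips = [32, 65, 181, 67, 77, 152, 50, 46, 230, 64]

attempts = sum(skips)      # total number of shuffled tracks counted
hits = len(skips)          # number of times the album came up
albums = 119               # albums in the library

observed = hits / attempts  # observed share of plays
expected = 1 / albums       # share expected if every album is equally likely

print(f"attempts = {attempts}")
print(f"observed = {observed:.3%}, expected = {expected:.3%}")
```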

This question can be answered using the Binomial Distribution.  The probability of exactly 10 “successes” out of 964 “attempts”, given that the probability of a success is 1/119, comes from the probability mass function.  Using the very fine SpeedCrunch calculator:

binompmf(10; 964; 1/119) = 0.102 (i.e. 10.2%)

Strictly speaking, a significance test should use the probability of 10 or more successes rather than exactly 10, and that upper-tail probability is larger still (roughly 0.3).  Either way, the result is well above the standard p=0.05 (5%) significance level.  I have to conclude that there is no evidence that Whitey Ford Sings the Blues plays any more frequently than any other album in my iTunes collection when the playlist is set to shuffle.
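As a cross-check on the calculation above, here is a minimal Python sketch using only the standard library.  The helper names binom_pmf and binom_upper_tail are my own inventions for illustration, not SpeedCrunch functions.  Besides the probability of exactly 10 plays, it also works out the probability of 10 or more plays, which is the figure a one-sided significance test would normally compare against 0.05.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_upper_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

n, p = 964, 1 / 119   # attempts and per-play probability from the post

pmf_10 = binom_pmf(10, n, p)          # exactly 10 plays
tail_10 = binom_upper_tail(10, n, p)  # 10 or more plays

print(f"P(X = 10)  = {pmf_10:.3f}")
print(f"P(X >= 10) = {tail_10:.3f}")
```

Both figures sit comfortably above 0.05, so the conclusion is the same whichever one you use.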

Humans are very bad at gauging randomness.  Or rather, probably like most predators, we’re very good at detecting patterns, and tend to see patterns when they’re not really there.  Luckily we have statistics to sort it all out for us.

And Whitey Ford Sings the Blues is still an awesome album.


Monte Carlo: The method, not the delicious biscuit

Bernard: It’s all waffle!  Nobody is prepared to admit that wine doesn’t have a taste.
Manny: But you can’t taste anything.  You smoke eighty bajillion cigarettes a day.  What’s that you’re eating?
Bernard: It’s some sort of delicious biscuit.
Manny: It’s a coaster!

– “Black Books” TV Series

Not the smoothest of segues, I admit, but I felt an urge to write about Monte Carlo methods.  This got me thinking about biscuits, which reminded me of that scene from the classic UK comedy series, Black Books, and, well, here we are…

Monte Carlo methods are a very useful tool in the field of statistics.  The Monte Carlo method isn’t actually one particular method.  Instead, it describes any method that repeatedly applies sets of random numbers, within a set of pre-defined constraints, to produce an estimate which could not easily be derived analytically.  Monte Carlo methods have become especially easy to apply with the evolution of faster and more powerful computers.

As an example, imagine you have three datasets bound by the following constraints:

Dataset 1: n=7; min=2; median=12; max=18

Dataset 2: n=11; min=3; median=6; max=22

Dataset 3: n=9; min=1; median=8; max=16

        dataset 1   dataset 2   dataset 3
obs 01  2           3           1
obs 02  a           e           m
obs 03  b           f           n
obs 04  12          g           o
obs 05  c           h           8
obs 06  d           6           p
obs 07  18          i           q
obs 08  -           j           r
obs 09  -           k           16
obs 10  -           l           -
obs 11  -           22          -

Knowing nothing else about these data, could you make a guess at the mean and standard deviation of the three sets combined?  Yes, you could, because you’ve been reading my blog and you know that this is a situation where Monte Carlo methods could be employed.  It would involve using a computer to replace letters a to r with randomly generated figures that fall within the required constraints.  For example, a and b must lie between the minimum of 2 and the median of 12.  Similarly, p, q and r must lie between 8 and 16.  And so on.

One possible randomly generated dataset could look like the table below.  In this example I simply used the randbetween function in a spreadsheet to replace a to r with (pseudo) random numbers.

        dataset 1   dataset 2   dataset 3
obs 01  2           3           1
obs 02  6           3           4
obs 03  11          3           5
obs 04  12          4           6
obs 05  15          5           8
obs 06  17          6           9
obs 07  18          12          11
obs 08  -           16          15
obs 09  -           17          16
obs 10  -           19          -
obs 11  -           22          -

This particular iteration has a mean of 9.9 and a standard deviation of 6.2 for the three sets combined.  Replacing a to r again with another set of randomly generated figures might result in a different combined mean and standard deviation.  Out of interest I ran just three iterations, resulting in simulated means of 9.9, 10.2 and 10.5 as well as simulated standard deviations of 6.2, 6.3 and 6.1.  The idea behind the Monte Carlo method is that you would run it over and over again, perhaps thousands or even millions of iterations, aggregating the results until you felt that you were converging toward a common outcome.  In the simple example described above, it would probably be enough to look at the mode of several hundred simulated means and standard deviations.
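For anyone who would rather not do this in a spreadsheet, the procedure can be sketched in Python along the following lines.  This is a rough illustration under a few assumptions of my own: the unknown values are whole numbers drawn uniformly (much like randbetween), the standard deviation is the sample standard deviation, and the results are summarised by simply averaging the simulated means and standard deviations rather than taking their mode.  The function names, the seed and the iteration count are all just for illustration.

```python
import random
import statistics

# (n, minimum, median, maximum) for each dataset, taken from the text
CONSTRAINTS = [(7, 2, 12, 18), (11, 3, 6, 22), (9, 1, 8, 16)]

def simulate_dataset(n, lo, med, hi, rng):
    """Build one dataset of odd size n with the given min, median and max."""
    k = (n - 3) // 2  # number of unknown values on each side of the median
    below = [rng.randint(lo, med) for _ in range(k)]  # between min and median
    above = [rng.randint(med, hi) for _ in range(k)]  # between median and max
    return [lo] + below + [med] + above + [hi]

def run(iterations=1000, seed=1):
    """Average the combined mean and SD over many random iterations."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    means, sds = [], []
    for _ in range(iterations):
        combined = []
        for n, lo, med, hi in CONSTRAINTS:
            combined.extend(simulate_dataset(n, lo, med, hi, rng))
        means.append(statistics.mean(combined))
        sds.append(statistics.stdev(combined))  # sample standard deviation
    return statistics.mean(means), statistics.mean(sds)

if __name__ == "__main__":
    mean_estimate, sd_estimate = run()
    print(f"estimated mean = {mean_estimate:.1f}, estimated SD = {sd_estimate:.1f}")
```

Changing the seed or the number of iterations in run() is an easy way to see how quickly the estimates settle down.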

So that’s Monte Carlo methods in a nutshell.  A very general concept, but simple and powerful in its application.

OK, now I’m hungry.  Anyone have a good biscuit recipe?