<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>The Bernoulli Trial</title>
	<atom:link href="http://thebernoullitrial.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://thebernoullitrial.wordpress.com</link>
	<description>an experiment with only two possible outcomes... success or failure</description>
	<lastBuildDate>Thu, 17 Dec 2009 09:41:42 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='thebernoullitrial.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/e72fc6b2eda48fb89ca6a8512f893df3?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>The Bernoulli Trial</title>
		<link>http://thebernoullitrial.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://thebernoullitrial.wordpress.com/osd.xml" title="The Bernoulli Trial" />
		<item>
		<title>Some statistics in the global warming debate</title>
		<link>http://thebernoullitrial.wordpress.com/2009/12/13/some-statistics-in-the-global-warming-debate/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/12/13/some-statistics-in-the-global-warming-debate/#comments</comments>
		<pubDate>Sun, 13 Dec 2009 05:42:50 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[statistical concepts]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[climate change]]></category>
		<category><![CDATA[global warming]]></category>
		<category><![CDATA[monte carlo]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1252</guid>
		<description><![CDATA[UPDATED: 16 DEC 2009
we can&#8217;t account for the lack of warming at the moment and it is a travesty that we can&#8217;t
- US climate change scientist Kevin Trenberth, whose private emails are included in thousands of documents stolen by hackers and posted online
If you&#8217;re interested in statistics then I highly recommend that you add the [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>UPDATED: 16 DEC 2009</p>
<blockquote><p>we can&#8217;t account for the lack of warming at the moment and it is a travesty that we can&#8217;t</p></blockquote>
<p style="padding-left:60px;">- <em>US climate change scientist Kevin Trenberth, whose private emails are included in thousands of documents stolen by hackers and posted online</em></p>
<p>If you&#8217;re interested in statistics then I highly recommend that you add the <a href="http://statsinthewild.wordpress.com/" target="_blank">Stats in the Wild</a> blog to your daily must-read list, and if you&#8217;re interested in climate change science then I highly recommend you check out SitW&#8217;s <a href="http://statsinthewild.wordpress.com/2009/11/24/short-term-global-warming-in-the-wild/" target="_blank">recent post</a> on the topic.  In fact, go and read it now and I&#8217;ll wait here until you get back&#8230;</p>
<p>Really good stuff, isn&#8217;t it?</p>
<p>Now I&#8217;m in broad agreement with Stats in the Wild on this one.  While it appears there may be a few rogue elements amongst the climatologist community (which obviously I don&#8217;t condone, but every family has its dodgy uncle), overall the science is still sound.  I&#8217;ve nailed my colours firmly to the mast of Anthropogenic (i.e. human-caused) Global Warming (AGW).  Personally I think that to deny AGW is to deny the power of academic peer review, a system that has so successfully underpinned the scientific method for many centuries.  Yes, Earth has gone through significant periods of warming and cooling in the past, but the on-average increasing temperatures observed over the <em>last</em> 150 years or so, and particularly the <em>rate</em> of change, are <em>almost certainly</em> a result of increased greenhouse gases (carbon dioxide and methane, particularly) pouring into the atmosphere in ever greater quantities as human populations, industry and agriculture expand.</p>
<p>On the issue of recent <em>lack</em> of warming that has caused Kevin Trenberth such disquiet, Stats in the Wild points out the simple statistical explanation:</p>
<blockquote><p>You can have a system that is, on the average increasing over the long term, while still observing very flat or even declining trends when we know the overall system is increasing. That doesn’t mean that the system isn’t increasing, it just means we’ve seen one realization of the random system that hasn’t increased entirely by chance.</p></blockquote>
<p>Stats in the Wild illustrates the point with repeated computer simulations which statisticians sometimes refer to as a <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method" target="_blank">Monte Carlo method</a>. If you don&#8217;t have access to the &#8220;R&#8221; statistical analysis package you can produce a simplified version yourself using any spreadsheet.  The parameters in the example below are made up to illustrate a point.</p>
<p>Let&#8217;s imagine that the average temperature of the Earth has increased from 15C to 17C by a precise, constant amount every year over the last 100 years.  Scientists&#8217; ability to measure the average global temperature at any point in time is limited by sampling errors, instrument accuracy, and other sources of variability.  Despite these shortcomings, assume that the scientists&#8217; estimate is pretty good &#8211; consistently within +/-3% of the true value.  So the sources of measurement variability result in estimates that bounce around randomly, but always within a 3% margin of error band around the true value.  Graphically, the situation would look a bit like this:</p>
<p style="text-align:center;"><a href="http://thebernoullitrial.files.wordpress.com/2009/12/global-warming-1.png"><img class="size-medium wp-image-1280 aligncenter" title="global-warming-1" src="http://thebernoullitrial.files.wordpress.com/2009/12/global-warming-1.png?w=300&#038;h=195" alt="" width="300" height="195" /></a></p>
<p>The grey line in the graph above is the scientists&#8217; observations/estimates randomly distributed around the true value (yellow line) but between the margins of error (red lines).  Now if we emphasise the estimated values you see that, to the scientists, who can&#8217;t know the <em>true</em> value, the situation looks like this:</p>
<p><a href="http://thebernoullitrial.files.wordpress.com/2009/12/global-warming-2.png"><img class="aligncenter size-medium wp-image-1281" title="global-warming-2" src="http://thebernoullitrial.files.wordpress.com/2009/12/global-warming-2.png?w=300&#038;h=195" alt="" width="300" height="195" /></a></p>
<p>From the graph above you can see there are periods within the time series (orange lines) where random variation makes the trend <em>appear</em> to level off or even decrease.  As Stats in the Wild concludes, you <em>can</em> have a system that is, on the average increasing over the long term, while still observing very flat or even declining trends when we know the overall system is increasing.  It&#8217;s to be expected.</p>
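<p>If you would rather script it than use a spreadsheet, the made-up example above can be sketched in a few lines of Python.  This is only a rough sketch under my own assumptions: the measurement error is drawn uniformly within the +/-3% band, and a &#8220;flat or declining&#8221; period is taken to be any 10-year window whose least-squares slope is zero or negative.  The function names, the window length and the seed are all hypothetical choices for illustration.</p>

```python
import random

def simulate_series(years=100, start=15.0, end=17.0, error=0.03, seed=None):
    """Return (true_values, estimates) for a steadily warming system.

    The true value rises by a constant amount each year; each estimate is
    the true value perturbed by a uniform error within +/- `error`
    (an assumption: the post only says estimates stay inside the band).
    """
    rng = random.Random(seed)
    step = (end - start) / (years - 1)
    true_values = [start + step * i for i in range(years)]
    estimates = [t * (1 + rng.uniform(-error, error)) for t in true_values]
    return true_values, estimates

def slope(ys):
    """Ordinary least-squares slope of ys against 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def count_non_warming_windows(estimates, window=10):
    """Count the windows whose fitted trend is flat or declining."""
    last_start = len(estimates) - window
    return sum(
        1
        for i in range(last_start + 1)
        if slope(estimates[i:i + window]) <= 0
    )

true_values, estimates = simulate_series(seed=2009)
print("Long-term slope of the estimates:", slope(estimates))
print("10-year windows with no warming:", count_non_warming_windows(estimates))
```

<p>Re-running with different seeds gives a different realisation each time, which is the whole point: the long-term slope stays positive while individual short windows come and go.</p>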
<p>Selected further reading:</p>
<p><a href="http://bravenewclimate.com/2009/04/23/ian-plimer-heaven-and-earth/" target="_blank">Brave New Climate</a></p>
<p><a href="http://www.nature.com/nature/journal/v462/n7273/full/462545a.html" target="_blank">Climatologists under pressure</a></p>
<p><a href="http://www.economist.com/blogs/democracyinamerica/2009/12/trust_scientists" target="_blank">Scepticism&#8217;s limits</a></p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/12/13/some-statistics-in-the-global-warming-debate/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/12/global-warming-1.png?w=300" medium="image">
			<media:title type="html">global-warming-1</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/12/global-warming-2.png?w=300" medium="image">
			<media:title type="html">global-warming-2</media:title>
		</media:content>
	</item>
		<item>
		<title>Sex, Lies and Polygraph Tests</title>
		<link>http://thebernoullitrial.wordpress.com/2009/11/24/sex-lies-and-polygraph-tests/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/11/24/sex-lies-and-polygraph-tests/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 11:04:56 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[probability]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[lie detector]]></category>
		<category><![CDATA[polygraph]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1238</guid>
		<description><![CDATA[The Premier of South Australia, Mike Rann, is embroiled in something of a sex scandal at the moment.  I won&#8217;t go into all the sordid details, but, briefly, a woman by the name of Michelle Chantelois is claiming that Rann had sex with her several years ago (he was single but she was married at [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>The Premier of South Australia, Mike Rann, is embroiled in something of a sex scandal at the moment.  I won&#8217;t go into all the sordid details, but, briefly, a woman by the name of Michelle Chantelois is claiming that Rann had sex with her several years ago (he was single but she was married at the time), a claim he is emphatically denying.  The political vultures are circling because, given the denial, if a &#8220;smoking gun&#8221; (or perhaps more aptly a &#8220;blue dress&#8221;) <em>is </em>produced then Rann is cactus.</p>
<p>Anyway, none of this is really interesting to me.  Actually, it&#8217;s hardly <em>anybody&#8217;s </em>business.  The shenanigans these two people did, or did not, get up to in their private lives years ago are entirely between them and their families as far as I&#8217;m concerned.  But today Chantelois has come out and volunteered to take a lie detector test to determine who is telling the truth.  So I wonder, statistically, what&#8217;s the probability that she will pass the test?</p>
<p>To answer this question we&#8217;ll use <a href="http://en.wikipedia.org/wiki/Baye%27s_Theorem" target="_blank">Bayes&#8217; Theorem</a> and some rather dodgy data points I picked up around the internet.  It&#8217;s on the internet, you see, so it must be true.</p>
<p>The first bit of information we need is the probability that Chantelois is actually telling the truth.  According to a recent, and utterly meaningless, straw poll conducted by the <a href="http://www.news.com.au/adelaidenow/story/0,22606,26382226-2682,00.html?from=public_rss" target="_blank">Adelaide Advertiser</a>, this probability is precisely 53%.</p>
<blockquote><p>Shortly before 3pm, 53 per cent of respondents to an AdelaideNow poll believed former Parliament House waitress Ms Chantelois was telling the truth about claims of a sexual relationship with Mr Rann, while 47 per cent believed the Premier&#8217;s rejection of the allegations.</p></blockquote>
<p>Therefore P(Chantelois is telling the truth) = P(T) = 0.53;</p>
<p>and P(Chantelois is not telling the truth) = P(N) = 1-P(T) = 0.47.</p>
<p>The next bit of information we need concerns the reliability of polygraph tests themselves.  Personally I&#8217;ve always been more than a little sceptical of the infernal things.  Polygraphs smell like voodoo science to me, and according to <a href="http://en.wikipedia.org/wiki/Polygraph#Validity" target="_blank">Wikipedia</a>,</p>
<blockquote><p>Polygraph testing has little credibility among scientists.<sup> </sup>Despite claims of 90-95% validity by polygraph advocates, critics maintain that rather than a &#8220;test&#8221;, the method amounts to an inherently unstandardizable interrogation technique whose accuracy cannot be established.  A 1997 survey of 421 psychologists estimated the test&#8217;s average accuracy at about 61%, a little better than chance.</p></blockquote>
<p>Therefore P(polygraph says you&#8217;re telling the truth, <em>given </em>that you&#8217;re telling the truth) = P(+|T) = 0.61; and</p>
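<p>P(polygraph says you&#8217;re telling the truth, <em>given </em>that you&#8217;re lying) = P(+|N) = 1-P(-|N) = 0.39, assuming the polygraph is just as accurate (61%) at catching a lie as it is at confirming the truth.</p>
<p>Now using Bayes&#8217; Theorem, we can calculate the chance that Chantelois evades the lie detector, that is, the probability that she is lying even though the polygraph says she is telling the truth.</p>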
<p>P(Chantelois is <em>not</em> telling the truth, <em>given that the polygraph says she is</em>)</p>
<p>= P(N|+)</p>
<p>= P(+|N) x P(N) / P(+)</p>
<p>= [ P(+|N) x P(N) ] / [ P(+|T)xP(T) + P(+|N)xP(N) ]</p>
<p>= [ 0.39 x 0.47 ] / [ 0.61 x 0.53 + 0.39 x 0.47 ]</p>
<p>= 0.362 (i.e. 36.2%)</p>
<p>Too high to put any kind of faith in the results of the test.</p>
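<p>For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same Bayes&#8217; Theorem calculation.  The function and variable names are my own invention for illustration, and it carries over the assumption that the polygraph is 61% accurate whether the subject is lying or telling the truth.</p>

```python
def posterior_lying_given_pass(p_truth, accuracy):
    """P(N|+): probability of lying, given the polygraph says 'truthful'.

    p_truth  -- prior probability the subject is telling the truth, P(T)
    accuracy -- probability the polygraph classifies correctly; assumed
                to be the same for truthful and lying subjects
    """
    p_lie = 1 - p_truth                # P(N)
    p_pass_given_truth = accuracy      # P(+|T)
    p_pass_given_lie = 1 - accuracy    # P(+|N)
    # Total probability of a "truthful" reading, P(+)
    p_pass = p_pass_given_truth * p_truth + p_pass_given_lie * p_lie
    return p_pass_given_lie * p_lie / p_pass

print(round(posterior_lying_given_pass(0.53, 0.61), 3))  # 0.362
```

<p>Plugging in the 90% accuracy claimed by polygraph advocates instead of 61% shows how strongly the answer depends on that one dodgy number.</p>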
<p>The calculations above were all done with tongue planted firmly in cheek and are not to be taken seriously.  Whether it&#8217;s Rann or Chantelois really telling the truth I don&#8217;t know or care.  What <em>is </em>important is that Bayes&#8217; Theorem shows us that, even with accurate tests, there is a good chance of a misclassification.  A single test is usually not enough.</p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/11/24/sex-lies-and-polygraph-tests/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>
	</item>
		<item>
		<title>A thank you to my readers&#8230;</title>
		<link>http://thebernoullitrial.wordpress.com/2009/11/22/a-thank-you-to-my-readers/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/11/22/a-thank-you-to-my-readers/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 05:32:36 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[uncategorized]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1232</guid>
		<description><![CDATA[I was looking through my blog&#8217;s referrer stats the other day, and noticed an incoming link from the Open Laboratory 2009.  It seems someone has voted my post on How to Talk Back to a Statistic as being some of &#8220;the best writing on science blogs&#8221; during 2009; and it looks like the post will [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I was looking through my blog&#8217;s referrer stats the other day, and noticed an incoming link from the <a href="http://scienceblogs.com/clock/2009/11/the_open_laboratory_2009_-_one.php" target="_blank">Open Laboratory 2009</a>.  It seems someone has voted my post on <a href="http://thebernoullitrial.wordpress.com/2009/03/08/how-to-talk-back-to-a-statistic/" target="_blank">How to Talk Back to a Statistic</a> as being some of &#8220;the best writing on science blogs&#8221; during 2009; and it looks like the post will be immortalised in print.</p>
<p>I&#8217;m very flattered.  So whoever it was, and to everyone else who enjoys this blog, thank you!</p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/11/22/a-thank-you-to-my-readers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>
	</item>
		<item>
		<title>The randomness of iTunes</title>
		<link>http://thebernoullitrial.wordpress.com/2009/11/07/the-randomness-of-itunes/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/11/07/the-randomness-of-itunes/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 23:12:10 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[probability]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[statistical concepts]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Everlast]]></category>
		<category><![CDATA[iTunes]]></category>
		<category><![CDATA[randomness]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1210</guid>
		<description><![CDATA[In 1998, a rather awkward 25-year-old male walked into a CD store (this was in the day when music was sold on CDs, in stores, to 25 year-olds) and purchased Whitey Ford Sings the Blues by Everlast.  Here&#8217;s what the indubitable Wikipedia has to say about said album and artist&#8230;
Whitey Ford Sings the Blues was [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>In 1998, a rather awkward 25-year-old male walked into a CD store (this was in the day when music was sold on CDs, in stores, to 25 year-olds) and purchased <em>Whitey Ford Sings the Blues</em> by <em>Everlast</em>.  Here&#8217;s what the indubitable Wikipedia has to say about <a href="http://en.wikipedia.org/wiki/Whitey_ford_sings_the_blues" target="_blank">said album and artist</a>&#8230;</p>
<blockquote><p>Whitey Ford Sings the Blues was both a commercial and critical success (selling more than 3 million copies).  It was hailed for its blend of rap with acoustic and electric guitars, developed by Everlast together with producers Dante Ross and John Gamble (aka SD50).  The album&#8217;s genre-crossing lead single &#8220;What It&#8217;s Like&#8221; proved to be his most popular and successful song, although the follow up single, &#8220;Ends&#8221;, also reached the rock top 10.</p></blockquote>
<p>Several years later Apple launched iTunes, which also proved to be a commercial and critical success, and the awkward male promptly loaded Whitey Ford Sings the Blues into the song library.  iTunes seemed to take a particular shine to this album, apparently favouring it with many more frequent plays, when iTunes was set to &#8220;shuffle&#8221;, than any of the other 100 or more albums in the collection.  At least that&#8217;s how it appeared to the awkward male, who seemed to notice it come up much more often than expected.</p>
<p>In a strange twist of fate I also just happen to have Whitey Ford Sings the Blues in <em>my </em>iTunes collection.  In another strange coincidence, just like that awkward male from a decade ago, I&#8217;ve noticed that iTunes tends to favour it over other albums in the song list when iTunes is set to shuffle.</p>
<p>Life is certainly full of strange coincidences, but does iTunes really favour certain songs/ artists/ albums over others?  Let&#8217;s test it scientifically&#8230;</p>
<p>I set iTunes to shuffle and counted the number of tracks I had to skip before I hit Whitey Ford Sings the Blues.  The results are below:</p>
<p>32, 65, 181, 67, 77, 152, 50, 46, 230, 64</p>
<p>In other words, Whitey Ford Sings the Blues played randomly 10 times in 964 attempts (i.e. 1.037% of the sample).  I have 119 albums in iTunes, so theoretically I should be hearing it 1/119=0.840% of the time.  So the sample is a little bit higher than expected, but statistically significantly higher?</p>
<p>This question can be answered using the probability mass function of the <a href="http://en.wikipedia.org/wiki/Binomial_distribution" target="_blank">Binomial Distribution</a>.  Using the very fine <a href="http://speedcrunch.org/" target="_blank">SpeedCrunch</a> calculator, the probability of exactly 10 &#8220;successes&#8221; out of 964 &#8220;attempts&#8221;, given that the probability of a success is 1/119, is:</p>
<p><span style="color:#0000ff;">binompmf(10; 964; 1/119) = 0.102 (i.e. 10.2%)</span></p>
<p>This is well above the standard <em>p</em>=0.05 (5%) significance level.  Strictly speaking, a significance test should use the probability of 10 <em>or more</em> plays rather than exactly 10, but that tail probability can only be larger still, so the conclusion is the same: I have no evidence that Whitey Ford Sings the Blues plays any more frequently than any other album in my iTunes collection when the playlist is set to shuffle.</p>
<p>Humans are very bad at gauging randomness.  Or rather, probably like most predators, we&#8217;re very good at detecting patterns, and tend to see patterns when they&#8217;re not really there.  Luckily we have statistics to sort it all out for us.</p>
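<p>The binomial calculation can also be reproduced without SpeedCrunch.  The Python sketch below is my own stand-in for <span style="color:#0000ff;">binompmf</span> (the function names are hypothetical), and it additionally computes the probability of 10 <em>or more</em> plays, which is what a one-sided significance test would normally use.</p>

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_upper_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return 1 - sum(binom_pmf(i, n, p) for i in range(k))

# Tracks skipped before each appearance of the album (from the post)
skips = [32, 65, 181, 67, 77, 152, 50, 46, 230, 64]
attempts = sum(skips)   # 964
plays = len(skips)      # 10
p_album = 1 / 119       # one album out of 119, assuming an even shuffle

print(f"P(exactly {plays} plays) = {binom_pmf(plays, attempts, p_album):.3f}")  # 0.102
print(f"P({plays} or more plays) = {binom_upper_tail(plays, attempts, p_album):.3f}")
```

<p>Both figures sit comfortably above 0.05, so either way the shuffle looks innocent.</p>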
<p>And Whitey Ford Sings the Blues is still an awesome album.</p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/11/07/the-randomness-of-itunes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>
	</item>
		<item>
		<title>Once, back in primary school&#8230;</title>
		<link>http://thebernoullitrial.wordpress.com/2009/10/22/once-back-in-primary-school/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/10/22/once-back-in-primary-school/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 04:23:09 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[philosophy]]></category>
		<category><![CDATA[Hulk Hogan]]></category>
		<category><![CDATA[William Golding]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1149</guid>
		<description><![CDATA[&#8230; my friends and I found ourselves in a blazing row over whether professional wrestling was real or fake.  The adherents insisted that the combat was actual (it looks convincing), whereas the sceptics (myself included) were adamant that it was all theatre (how could the wrestlers possibly survive the impacts?).  Of course, being boys, rather [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>&#8230; my friends and I found ourselves in a blazing row over whether professional wrestling was real or fake.  The adherents insisted that the combat was actual (it looks convincing), whereas the sceptics (myself included) were adamant that it was all theatre (how could the wrestlers possibly survive the impacts?).  Of course, being boys, rather than calmly debate the relative merits, or seek an objective reference, we chose a Lord of the Flies decision-making stratagem of tribal violence and intimidation.  We debated the point using the time-honoured and gentlemanly tradition of casting the most slanderous aspersions on the probity of each other&#8217;s mother, and performing professional wrestling moves on each other, until a side relented.  Ultimately, the adherents secured victory through sheer force of numbers.  Consensus via mobocracy.</p>
<p>It was a defining moment in my personal development.  I was made aware that other people see the world differently to me; that people will cling doggedly to an irrational belief, no matter how much logic you throw at them.  In fact, some people <em>choose </em>to define themselves as a person <em>by </em>that belief, and can get quite angry, even violent, when that belief is challenged.</p>
<p>It got worse.  As I grew older I was shocked to realise my world was awash with these irrational beliefs and dominated by people enslaved to them.  And I&#8217;m not just talking about religion.  Everything from, &#8220;I&#8217;ve got a sure-fire gambling strategy&#8230;&#8221; to, &#8220;Oh, I won&#8217;t vaccinate my child because they do more harm than good&#8230;&#8221; kind of thinking.  I developed a kind of Cole Sear Sixth Sense&#8230; I began to see Stupid People everywhere.  Or rather, smart people who chose to be ignorant.</p>
<p>It sometimes saddens me, although perhaps I do understand it.  An irrational belief in something for which there is no evidence, such as a loving, protective God, or the infallibility of television entertainment, could be a necessary coping mechanism.  A coping mechanism we need to give us hope, and keep us going, in the face of a cold universe utterly indifferent to our existence.  But it&#8217;s a child-like faith in the magic benevolence of Santa Claus.  An illusion.  Can the human race ever truly be free as long as we keep subjugating ourselves to such silly notions?</p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/10/22/once-back-in-primary-school/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>
	</item>
		<item>
		<title>3G mobile data speeds in Sydney on the iPhone</title>
		<link>http://thebernoullitrial.wordpress.com/2009/10/11/3g-mobile-data-speeds-in-sydney-on-the-iphone/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/10/11/3g-mobile-data-speeds-in-sydney-on-the-iphone/#comments</comments>
		<pubDate>Sun, 11 Oct 2009 10:22:43 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[research]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[3G speeds]]></category>
		<category><![CDATA[Instat]]></category>
		<category><![CDATA[iPhone]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1134</guid>
		<description><![CDATA[If you&#8217;re interested in mobile Internet and the iPhone (and, really, who isn&#8217;t?), the Byteside ByteBlog has posted the results of its &#8220;Australian iPhone data test&#8221;.  The researchers measured 3G download speeds, upload speeds, and ping times, using the iPhone&#8217;s Speedtest.net application connected to four major mobile ISPs around Sydney.
We had concurrent access to four [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>If you&#8217;re interested in mobile Internet and the iPhone (and, really, who <em>isn&#8217;t</em>?), the Byteside ByteBlog has <a href="http://byteside.com/byteblog/2009/09/australian-iphone-data-test-which-network-is-best/" target="_blank">posted the results</a> of its &#8220;Australian iPhone data test&#8221;.  The researchers measured 3G download speeds, upload speeds, and ping times, using the iPhone&#8217;s Speedtest.net application connected to four major mobile ISPs around Sydney.</p>
<blockquote><p>We had concurrent access to four iPhone 3GS handsets, one on each of the four Australian networks — Telstra, Optus, Vodafone, and 3.  Travelling around the Sydney CBD and Sydney suburban areas, we ran close to 150 individual speed tests.  Tests ranged from Manly to Homebush, Annandale to North Sydney, and plenty in between.</p></blockquote>
<p>Byteside have kindly made the raw data from their study available for download.  To analyse the results in more detail, I thought it would be an interesting exercise to try out the <em>Instat</em> statistical analysis package.  Instat can be downloaded for free (for non-commercial use) from the <a href="http://www.rdg.ac.uk/ssc/software/instat/instat.html" target="_blank">Statistical Services Centre</a>.</p>
<p>Importing Byteside&#8217;s raw data (available as an Excel file) into Instat was fairly straightforward.  It just needed a bit of cleaning up.  To start proceedings, Instat&#8217;s <em>Summary Tables</em> was used to produce some descriptive statistics of download speeds.</p>
<p><strong>Table 1: Statistical summary of download speeds by ISP<br />
</strong></p>
<table style="text-align:left;" border="2" cellspacing="0" cellpadding="6">
<tbody>
<tr>
<td><strong>Mobile<br />
ISP</strong></td>
<td><strong>obs<br />
(no.)</strong></td>
<td style="text-align:right;"><strong>min<br />
(kbps)</strong></td>
<td style="text-align:right;"><strong>mean<br />
(kbps)</strong></td>
<td style="text-align:right;"><strong>median<br />
(kbps)</strong></td>
<td style="text-align:right;"><strong>max<br />
(kbps)</strong></td>
<td style="text-align:right;"><strong>st.dev<br />
(kbps)</strong></td>
</tr>
<tr>
<td>Optus</td>
<td>110</td>
<td style="text-align:right;">0</td>
<td style="text-align:right;">1637</td>
<td style="text-align:right;">1903</td>
<td style="text-align:right;">3654</td>
<td style="text-align:right;">1080</td>
</tr>
<tr>
<td>Telstra</td>
<td>149</td>
<td style="text-align:right;">0</td>
<td style="text-align:right;">2681</td>
<td style="text-align:right;">2416</td>
<td style="text-align:right;">6151</td>
<td style="text-align:right;">1554</td>
</tr>
<tr>
<td>Three</td>
<td>149</td>
<td style="text-align:right;">0</td>
<td style="text-align:right;">625</td>
<td style="text-align:right;">525</td>
<td style="text-align:right;">2290</td>
<td style="text-align:right;">471</td>
</tr>
<tr>
<td>Vodafone</td>
<td>151</td>
<td style="text-align:right;">0</td>
<td style="text-align:right;">1283</td>
<td style="text-align:right;">1125</td>
<td style="text-align:right;">3349</td>
<td style="text-align:right;">961</td>
</tr>
<tr>
<td><strong>TOTAL</strong></td>
<td><strong>559</strong></td>
<td style="text-align:right;"><strong>0</strong></td>
<td style="text-align:right;"><strong>1550</strong></td>
<td style="text-align:right;"><strong>1271</strong></td>
<td style="text-align:right;"><strong>6151</strong></td>
<td style="text-align:right;"><strong>1329</strong></td>
</tr>
</tbody>
</table>
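<p>As a quick sanity check on the table, the TOTAL row can be recovered from the four ISP rows: the overall mean is the observation-weighted average of the group means.  The sketch below is illustrative only; the figures are copied from Table 1 and the helper name <em>pooled_mean</em> is an assumption made for this example, not something Instat provides.</p>

```python
# Per-ISP rows copied from Table 1: (observations, mean download speed in kbps).
table_1 = {
    "Optus": (110, 1637),
    "Telstra": (149, 2681),
    "Three": (149, 625),
    "Vodafone": (151, 1283),
}


def pooled_mean(rows):
    """Return total observations and the observation-weighted mean of the group means."""
    total_obs = sum(n for n, _ in rows.values())
    weighted_sum = sum(n * mean for n, mean in rows.values())
    return total_obs, weighted_sum / total_obs


total_obs, overall_mean = pooled_mean(table_1)
print(total_obs, round(overall_mean))
```

<p>This gives 559 observations and a mean of about 1550 kbps, in line with the TOTAL row.  Weighting matters here because Optus has fewer observations than the other three networks.</p>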
<p>Telstra looks the clear winner in terms of average and maximum download speeds, followed by Optus, Vodafone and Three.  However, I don&#8217;t think a summary analysis can tell the whole story.  I wanted to have a look at the plots of the distributions and run a few tests.  Instat obliges with the first request by providing quite a comprehensive graphing capability.</p>
<p>Instat&#8217;s boxplot feature seems to cover all the basics (there&#8217;s also scope for plenty of customisation).  Below is a comparison of download speeds between the four carriers.</p>
<p><strong>Figure 1: Boxplot of mobile download speeds by ISP<br />
</strong></p>
<p><a href="http://thebernoullitrial.files.wordpress.com/2009/10/3g_download_speeds_boxplot.jpg"><img class="alignleft size-full wp-image-1148" title="3G_download_speeds_boxplot" src="http://thebernoullitrial.files.wordpress.com/2009/10/3g_download_speeds_boxplot.jpg?w=468&#038;h=265" alt="3G_download_speeds_boxplot" width="468" height="265" /></a></p>
<p>Now let&#8217;s have a look at the histograms&#8230;</p>
<p><strong>Figure 2: Histogram of mobile download speeds by ISP</strong></p>
<table style="text-align:left;" border="2" cellspacing="0" cellpadding="6">
<tbody>
<tr>
<td><strong>Fig. 2a: Optus<br />
<a href="http://thebernoullitrial.files.wordpress.com/2009/10/optus_download_hist.jpg"><img class="alignleft size-thumbnail wp-image-1140" title="optus_download_hist" src="http://thebernoullitrial.files.wordpress.com/2009/10/optus_download_hist.jpg?w=150&#038;h=101" alt="optus_download_hist" width="150" height="101" /></a><br />
</strong></td>
<td><strong>Fig. 2b: Telstra<br />
<a href="http://thebernoullitrial.files.wordpress.com/2009/10/telstra_download_hist.jpg"><img class="alignleft size-thumbnail wp-image-1143" title="telstra_download_hist" src="http://thebernoullitrial.files.wordpress.com/2009/10/telstra_download_hist.jpg?w=150&#038;h=100" alt="telstra_download_hist" width="150" height="100" /></a><br />
</strong></td>
</tr>
<tr>
<td><strong>Fig. 2c: Three<br />
<a href="http://thebernoullitrial.files.wordpress.com/2009/10/three_download_hist.jpg"><img class="alignleft size-thumbnail wp-image-1141" title="three_download_hist" src="http://thebernoullitrial.files.wordpress.com/2009/10/three_download_hist.jpg?w=150&#038;h=98" alt="three_download_hist" width="150" height="98" /></a><br />
</strong></td>
<td><strong>Fig. 2d: Vodafone<br />
<a href="http://thebernoullitrial.files.wordpress.com/2009/10/vodafone_download_hist.jpg"><img class="alignleft size-thumbnail wp-image-1142" title="vodafone_download_hist" src="http://thebernoullitrial.files.wordpress.com/2009/10/vodafone_download_hist.jpg?w=150&#038;h=100" alt="vodafone_download_hist" width="150" height="100" /></a><br />
</strong></td>
</tr>
</tbody>
</table>
<p>Telstra really does own the opposition, but not consistently so.  In fact the most frequent download speed for &#8220;The Big T&#8221; was less than 2Mbps.  All four carriers recorded their fair share of quite cruddy speeds, and all four recorded at least one instance where a download attempt didn&#8217;t work <em>at all</em>.  The reliability of 3G mobile broadband in this country (or Sydney, at least) has some way to go, apparently.</p>
<p>I also ran a couple of statistical tests.  As the data aren&#8217;t normally distributed, nor are the ISP download speed variances equal, I chose the non-parametric tests that Instat offers.  Firstly, to ask a somewhat redundant question, does network make a difference to download speed?</p>
<p><span style="color:#0000ff;">Kruskal-Wallis Test</span></p>
<p><span style="color:#0000ff;">Sample   n   Median   Ave rank    z<br />
1    110  1902.50    305.0      1.81<br />
2    149  2416.00    401.2     10.70<br />
3    149   525.00    163.2    -10.30<br />
4    151  1125.00    257.4     -2.01</span></p>
<p><span style="color:#0000ff;">H = 167.42 (adjusted for ties) with 3  d.f<br />
Probability &gt; 167.42  = 0</span></p>
<p>When you ask an obvious question, expect an obvious answer, I suppose.  We can comprehensively reject the null hypothesis that all the medians are equal.  Yes, download speeds depend on the network that you&#8217;re on.</p>
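<p>For readers who want to see where H comes from, here is a minimal sketch that rebuilds it from the sample sizes and average ranks printed in the Kruskal-Wallis output.  It assumes the textbook formula without a tie correction and uses the rounded average ranks, so it can only approximate the tie-adjusted figure Instat reports; the function name is made up for this example.</p>

```python
# (sample size, average rank) per network, read off the Kruskal-Wallis output.
samples = {
    "Optus": (110, 305.0),
    "Telstra": (149, 401.2),
    "Three": (149, 163.2),
    "Vodafone": (151, 257.4),
}


def kruskal_wallis_h(groups):
    """H statistic from group sizes and average ranks, with no tie correction."""
    n_total = sum(n for n, _ in groups.values())
    grand_mean_rank = (n_total + 1) / 2
    # Weighted squared distance of each group's average rank from the grand mean rank.
    spread = sum(n * (avg_rank - grand_mean_rank) ** 2 for n, avg_rank in groups.values())
    return 12 * spread / (n_total * (n_total + 1))


h = kruskal_wallis_h(samples)
CRITICAL_5_PERCENT_3_DF = 7.815  # chi-square critical value, 3 d.f., 5% level
print(round(h, 1), h > CRITICAL_5_PERCENT_3_DF)
```

<p>The result lands at roughly 167.4, close to the reported 167.42 and far beyond the 5% critical value for 3 degrees of freedom.</p>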
<p>Looking at the histograms above, the Optus and Vodafone distributions share some similarities.  Is there a statistically significant difference between these two carriers?  Again, the answer looks fairly obvious before the test is even run, but what the hell&#8230;</p>
<p><span style="color:#0000ff;">Two-sample test for independent data</span></p>
<p><span style="color:#0000ff;">Sample   n   Median   Rank sum<br />
optus_d 110  1902.50  16047.0<br />
voda_d 151  1125.00  18144.0</span></p>
<p><span style="color:#0000ff;">Mann-Whitney U = 9942.00    U&#8217; = 6668.00<br />
Wilcoxon T     = 16047.00</span></p>
<p><span style="color:#0000ff;">On HO: Mean (for U) = 8305.00<br />
Mean (for T) = 14410.00<br />
s.d. = 602.09  (adjusted for tied ranks)</span></p>
<p><span style="color:#0000ff;">Hence z = 2.72      Significance level is 0.33% for one-sided test<br />
Significance level is 0.66% for two-sided test</span></p>
<p>0.33% (p=0.0033, one-sided) is highly statistically significant.  So we must conclude that the Optus mobile data network does indeed provide faster download speeds on the iPhone than Vodafone in the Sydney areas covered in Byteside&#8217;s study.  That&#8217;s when the download does actually work, of course.</p>
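<p>The z value and significance level can be reproduced from the other figures in the output.  The sketch below assumes the usual normal approximation to the Mann-Whitney U statistic and takes the tie-adjusted standard deviation exactly as printed rather than recomputing it; the variable names are illustrative.</p>

```python
import math

n_optus, n_voda = 110, 151   # sample sizes from the test output
u_stat = 9942.0              # Mann-Whitney U reported for Optus
sd_tie_adjusted = 602.09     # standard deviation as printed (already tie-adjusted)

# Under H0 the expected value of U is half the number of Optus/Vodafone pairs.
mean_u = n_optus * n_voda / 2

# Normal approximation: standardise U, then take the upper-tail area.
z = (u_stat - mean_u) / sd_tie_adjusted
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

# The rank sum T and U differ only by a constant that depends on the Optus sample size.
wilcoxon_t = u_stat + n_optus * (n_optus + 1) / 2

print(mean_u, round(z, 2), round(100 * p_one_sided, 2), wilcoxon_t)
```

<p>This recovers the mean of 8305, z of about 2.72, a one-sided level of about 0.33%, and the Wilcoxon rank sum of 16047.</p>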
<p>So this post is really more a brief review of the Instat package than of mobile network download speeds in Sydney.  I really enjoyed using it for the small amount of statistical analysis presented in this post.  Instat was capable and quite fun to drive.  Although Instat doesn&#8217;t have the polish or bells and whistles of a package like SPSS, it does have an intuitive GUI and a useful tool-set, and it doesn&#8217;t come with the steep learning curve of R.  Free for non-commercial use, the price is right too.</p>
<p>Finally, in a conclusion in keeping with the overall theme of this post, Telstra is the obvious choice for mobile data carrier in terms of download speeds and reliability.  But you&#8217;ll certainly pay for the privilege.  Optus and Vodafone both represent good value alternatives.  Based on the results collected by Byteside, it&#8217;s hard to recommend Three.  It is also a particular concern that all four networks recorded nil or negligible speeds for a significant proportion of download attempts in the survey.</p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/10/11/3g-mobile-data-speeds-in-sydney-on-the-iphone/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/10/3g_download_speeds_boxplot.jpg" medium="image">
			<media:title type="html">3G_download_speeds_boxplot</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/10/optus_download_hist.jpg?w=150" medium="image">
			<media:title type="html">optus_download_hist</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/10/telstra_download_hist.jpg?w=150" medium="image">
			<media:title type="html">telstra_download_hist</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/10/three_download_hist.jpg?w=150" medium="image">
			<media:title type="html">three_download_hist</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/10/vodafone_download_hist.jpg?w=150" medium="image">
			<media:title type="html">vodafone_download_hist</media:title>
		</media:content>
	</item>
		<item>
		<title>Benford&#8217;s Law and Census data</title>
		<link>http://thebernoullitrial.wordpress.com/2009/09/19/benfords-law-and-census-data/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/09/19/benfords-law-and-census-data/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 12:05:15 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[research]]></category>
		<category><![CDATA[statistical concepts]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Benford's Law]]></category>
		<category><![CDATA[census]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=694</guid>
		<description><![CDATA[Would you like to conduct a simple statistical experiment?
Grab a list of measurements from a real-life source of data, such as heights of buildings, or lengths of rivers.  Now look at just the first digit of each of the measurements and record the frequency of the numbers 1 through to 9.  Intuitively, you&#8217;d expect that [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Would you like to conduct a simple statistical experiment?</p>
<p>Grab a list of measurements from a real-life source of data, such as heights of buildings, or lengths of rivers.  Now look at just the <em>first </em>digit of each of the measurements and record the frequency of the numbers 1 through to 9.  Intuitively, you&#8217;d expect that each leading digit occurs roughly 11% of the time (i.e. 1 chance in 9).  However, it is an interesting observation that the leading digit from such sources is often <em>not</em> uniformly distributed.  Surprisingly, a first digit of 1 tends to appear with a probability of about 30%, a leading digit of 2 tends to occur about 18% of the time, a 3 about 13%, and so on in a <em>logarithmically decreasing</em> pattern, with a leading digit of 9 often being observed less than 5% of the time.</p>
<p>Welcome to the bizarre world* of <a href="http://mathworld.wolfram.com/BenfordsLaw.html" target="_blank">Benford&#8217;s Law</a> [*not actually a bizarre world].</p>
<p>This phenomenon was first noted by the astronomer and mathematician Simon Newcomb in 1881.  The physicist Frank Benford re-stated the observation in 1938 and, in an odd twist of fate, it is after him that the law is named.  It is referred to as a &#8220;Law&#8221; but of course it won&#8217;t apply to <em>all </em>kinds of real-world lists of numbers.  Lottery results (they&#8217;re entirely random) or counts of fingers <span style="color:#888888;">(since we&#8217;re talking about digits)</span>, by way of example, should be, I hope, somewhat more uniformly distributed.</p>
<p>I thought it would be a fun exercise to test Benford&#8217;s Law using population data available from the <a href="http://abs.gov.au/" target="_blank">Australian Bureau of Statistics&#8217; website</a>.  I downloaded the latest Census population counts across  the 129 Statistical Local Areas (SLAs) in South Australia, and tallied the frequency of the leading digits.  The results of observed versus expected frequencies are summarised in Table 1 below.  For example, there were 3 SLAs that recorded a population with a leading digit of 9 (they were 934, 9205, and 9015).  If Benford&#8217;s Law holds there should be 6 such SLAs.  Note that one SLA had a population of zero, so was not included.</p>
<p><strong>Table 1: Frequency of leading digit &#8211; observed vs. expected under Benford&#8217;s Law<br />
</strong></p>
<table style="text-align:left;" border="2" cellspacing="0" cellpadding="6">
<tbody>
<tr>
<td><strong>Leading digit<br />
</strong></td>
<td><strong>No. times leading digit<br />
was observed</strong></td>
<td style="text-align:right;"><strong>No. times leading digit<br />
was expected* </strong></td>
</tr>
<tr>
<td>1</td>
<td>47 (36.7%)</td>
<td align="right">39 (30.1%)</td>
</tr>
<tr>
<td>2</td>
<td>36 (28.1%)</td>
<td align="right">23 (17.6%)</td>
</tr>
<tr>
<td>3</td>
<td>11 (8.6%)</td>
<td align="right">16 (12.5%)</td>
</tr>
<tr>
<td>4</td>
<td>10 (7.8%)</td>
<td align="right">12 (9.7%)</td>
</tr>
<tr>
<td>5</td>
<td>1 (0.8%)</td>
<td align="right">10 (7.9%)</td>
</tr>
<tr>
<td>6</td>
<td>6 (4.7%)</td>
<td align="right">9 (6.7%)</td>
</tr>
<tr>
<td>7</td>
<td>7 (5.5%)</td>
<td align="right">7 (5.8%)</td>
</tr>
<tr>
<td>8</td>
<td>7 (5.5%)</td>
<td align="right">7 (5.1%)</td>
</tr>
<tr>
<td>9</td>
<td>3 (2.3%)</td>
<td align="right">6 (4.6%)</td>
</tr>
<tr>
<td><strong>Total SLAs<br />
</strong></td>
<td><strong>128 (100%)</strong></td>
<td align="right"><strong>128 (100%)</strong></td>
</tr>
</tbody>
</table>
<p><em>*Expected numbers are rounded to nearest integer based on exact percentages predicted under Benford&#8217;s Law.</em></p>
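<p>The expected column follows from the standard statement of Benford&#8217;s Law, under which a leading digit d occurs with probability log10(1 + 1/d).  A minimal sketch of that calculation for the 128 non-empty SLAs is given below; the function and variable names are illustrative only.</p>

```python
import math


def benford_probability(digit):
    """Probability that the leading digit equals `digit` under Benford's Law."""
    return math.log10(1 + 1 / digit)


n_slas = 128  # SLAs with a non-zero population
expected = [round(n_slas * benford_probability(d)) for d in range(1, 10)]
percentages = [round(100 * benford_probability(d), 1) for d in range(1, 10)]

print(expected)
print(percentages)
print(sum(expected))
```

<p>The counts and percentages match the expected column of Table 1.  Because each count is rounded separately, the nine rounded counts add up to 129, one more than the 128 SLAs actually tallied.</p>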
<p>There are certainly some similarities between the two columns.  The observed leading digits 1 through to 9 of the SLA population counts do broadly decrease in a logarithmic pattern, as Benford&#8217;s Law predicts.  However, the observed vs. expected frequencies by row are only roughly comparable.  The observed frequencies appear to be weighted a little more towards the digits 1 (47 vs. 39) and 2 (36 vs. 23).  Also, the leading digit of 5 is a bit of a standout, with only one observed compared to ten expected.</p>
<p>Did the Census counts conform to Benford&#8217;s Law?  I wasn&#8217;t convinced, and decided to dig deeper with an appropriate statistical test.  Under the null hypothesis, both the observed and expected counts come from the same distribution (i.e. Benford-type logarithmic).  Under the alternative hypothesis (two-sided) they are different distributions.</p>
<p>Firing up the trusty statistical analysis package &#8220;R&#8221;, I entered the above matrix and ran <a href="http://en.wikipedia.org/wiki/Fisher%27s_exact_test" target="_blank">Fisher&#8217;s Exact Test</a>:</p>
<p><span style="color:#0000ff;">&gt; census &lt;- matrix(c(47,36,11,10,1,6,7,7,3,39,23,16,12,10,9,7,7,6), nrow=9)</span></p>
<p><span style="color:#0000ff;">&gt; fisher.test(census, workspace=2000000)</span></p>
<p>yielding the following output:</p>
<p><span style="color:#0000ff;">Fisher&#8217;s Exact Test for Count Data</span></p>
<p><span style="color:#0000ff;">data:  census<br />
p-value = 0.0791<br />
alternative hypothesis: two.sided</span></p>
<p>p=0.0791 is, I think, a bit on the low side, but technically not statistically significant at the standard p=0.05 level.  There is therefore not enough evidence to reject the null hypothesis in favour of the alternative.  I had to conclude that the observed frequencies of Census populations by SLA in SA are consistent with a Benford-type logarithmic distribution, at least as far as this test can tell.  Benford&#8217;s Law analysis is often used for fraud detection (for example, insurance claims and even the recent Iranian Presidential election), so it is a relief to know that the Australian Bureau of Statistics isn&#8217;t just making their data up at random!</p>
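<p>As a cross-check, a chi-square goodness-of-fit test is a common alternative when observed counts are compared against a known theoretical distribution, since it treats the Benford figures as fixed expectations and not as a second sample.  The sketch below is not the test run in this post; it is a hedged illustration that uses the observed counts from Table 1, unrounded Benford expectations, and the standard 5% critical value for 8 degrees of freedom.</p>

```python
import math

observed = [47, 36, 11, 10, 1, 6, 7, 7, 3]  # leading-digit counts from Table 1
n_slas = sum(observed)

# Unrounded Benford expectations for digits 1 to 9.
expected = [n_slas * math.log10(1 + 1 / d) for d in range(1, 10)]

# Pearson's statistic: squared gaps between observed and expected, scaled by expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_5_PERCENT_8_DF = 15.507  # chi-square critical value, 8 d.f., 5% level
print(round(chi_square, 1), chi_square > CRITICAL_5_PERCENT_8_DF)
```

<p>On these counts the statistic comes out near 22.4, above the 15.507 cut-off, with the rows for digits 2 and 5 contributing most of it.  So this alternative reads the data less kindly than the Fisher result, which is worth keeping in mind before leaning too hard on either test with only 128 areas.</p>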
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/09/19/benfords-law-and-census-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>
	</item>
		<item>
		<title>I&#8217;m number one, baby, so why try harder?</title>
		<link>http://thebernoullitrial.wordpress.com/2009/09/05/im-number-one-baby-so-why-try-harder/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/09/05/im-number-one-baby-so-why-try-harder/#comments</comments>
		<pubDate>Fri, 04 Sep 2009 23:23:07 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[how to lie with statistics]]></category>
		<category><![CDATA[surveys]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1090</guid>
		<description><![CDATA[The other day Whirlpool Discussion Forums member &#8220;billabong&#8221; posted that ISP aaNet is running a promotion, and is using the flyer pictured below.  Readers will see that aaNet have sourced two customer satisfaction surveys (Whirlpool Australian Broadband Survey 2008 and Australian PC Authority Best ISP Award 2008), making it clear that aaNet finished number &#8220;1&#8221; [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>The other day Whirlpool Discussion Forums member &#8220;billabong&#8221; <a href="http://forums.whirlpool.net.au/forum-replies.cfm?t=1272304&amp;r=20579001#r20579001" target="_blank">posted</a> that ISP aaNet is running a promotion, and is using the flyer pictured below.  Readers will see that aaNet have sourced two customer satisfaction surveys (<a href="http://whirlpool.net.au/survey/2008/" target="_blank"><em>Whirlpool Australian Broadband Survey 2008</em></a> and <a href="http://www.pcauthority.com.au/Awards2008/BestISP.aspx" target="_blank"><em>Australian PC Authority Best ISP Award 2008</em></a>), making it clear that aaNet finished number &#8220;1&#8221; in those polls.</p>
<p>Or did they?</p>
<p>As <em>billabong </em>points out, on the question of &#8220;Would you recommend your ISP to other people?&#8221; in the Whirlpool survey, aaNet actually finished<strong> 7th</strong> overall with 88.4% &#8220;Yes&#8221;.  In the &#8220;Value for Money&#8221; category of the PC Authority survey, aaNet also finished <strong>7th </strong>with 4 out of 6 stars.  And they dropped to only 3 out of 6 stars in the &#8220;Overall&#8221; class.</p>
<p>So not quite &#8220;Number 1&#8221;.</p>
<p>I don&#8217;t actually have a real problem with aaNet cherry picking survey results, putting themselves in the best possible light for marketing purposes.  It&#8217;s just par for the course in advertising.  I expect all companies will do it to some degree.  Consumers should <em>always </em>be wary of the various shenanigans that go on when it comes to marketing departments and data.  That said, there can still be a certain elegance about it.  The way aaNet have crudely slapped a blue &#8220;Number 1&#8221; ribbon over results that they actually finished 7th in leaves a sour taste in my mouth.</p>
<p>Poor form, aaNet.  Poor form.</p>
<p><a href="http://thebernoullitrial.files.wordpress.com/2009/09/aanetflyer.jpg"><img class="aligncenter size-full wp-image-1093" title="aaNetFlyer" src="http://thebernoullitrial.files.wordpress.com/2009/09/aanetflyer.jpg?w=468&#038;h=662" alt="aaNetFlyer" width="468" height="662" /></a></p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/09/05/im-number-one-baby-so-why-try-harder/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>

		<media:content url="http://thebernoullitrial.files.wordpress.com/2009/09/aanetflyer.jpg" medium="image">
			<media:title type="html">aaNetFlyer</media:title>
		</media:content>
	</item>
		<item>
		<title>Statistics experts label ISP filtering trials unscientific</title>
		<link>http://thebernoullitrial.wordpress.com/2009/08/22/statistics-experts-label-isp-filtering-trials-unscientific/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/08/22/statistics-experts-label-isp-filtering-trials-unscientific/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 00:47:40 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[philosophy]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Cleanfeed]]></category>
		<category><![CDATA[Internet censorship]]></category>
		<category><![CDATA[Internet filtering]]></category>
		<category><![CDATA[Professor Robert (Bob) F. Ling legend]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1081</guid>
		<description><![CDATA[Earlier this year I added my $AUD0.02 to the debate around the Australian government&#8217;s ill-conceived, and, in fact, ludicrous plan to compulsorily censor the internet (under the Orwellian moniker of Cleanfeed).  My arguments against it  were more ethical/ philosophical/ common sense, objecting that Cleanfeed:

was not needed
was not wanted
will not work
has no mandate
will be too [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Earlier this year I <a href="http://thebernoullitrial.wordpress.com/2009/01/06/saying-no-to-internet-censorship-in-australia/" target="_blank">added my $AUD0.02</a> to the debate around the Australian government&#8217;s ill-conceived, and, in fact, ludicrous plan to compulsorily censor the internet (under the Orwellian moniker of <em>Cleanfeed</em>).  My arguments against it  were more ethical/ philosophical/ common sense, objecting that Cleanfeed:</p>
<ul>
<li>was not needed</li>
<li>was not wanted</li>
<li>will not work</li>
<li>has no mandate</li>
<li>will be too expensive</li>
<li>will break things</li>
<li>will not scale</li>
<li>was not transparent</li>
<li>was vulnerable to scope creep</li>
</ul>
<p>All pretty sound arguments if you ask me* [*nobody asked me].  Enough to drop the project in its embryonic stages you would have thought.  But no, Cleanfeed trials were launched and now maintain an unstable orbit around the planet Stupid.  Viz this recent article in <em>ARN</em>:</p>
<p><a href="http://www.arnnet.com.au/article/312845/statistics_experts_label_isp_filtering_trials_unscientific" target="_blank"><strong>Statistics experts label ISP filtering trials unscientific: Trials for mandatory filtering would never be accepted in an academic statistics journal</strong></a></p>
<blockquote><p>The Federal Government’s ISP filter trials lack proper methodology and are not representative, according to experts in statistics and testing from two of Australia’s leading universities.</p>
<p>The criticisms come after two of the nine ISPs participating revealed only 15 of their customers, which in one case was 1 per cent of the total, chose to have their Internet filtered.</p>
<p>The vast majority of ISPs also used an opt-in system that requires users wanting to be filtered to request it.</p>
<p>“I would not have confidence in any of the results they find because of the way the sample has been constructed,” expert in statistics and senior lecturer at the Queensland University of Technology, Dr Daniel Johnson, said.</p></blockquote>
<p>So not only does Cleanfeed fail on ethical grounds, it now fails on hard scientific grounds.  As one of my statistical mentors, the legendary Professor Robert (Bob) F. Ling says, &#8220;When you find yourself in a hole, stop digging.&#8221;</p>
<p>&#8212;&#8212;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/08/22/statistics-experts-label-isp-filtering-trials-unscientific/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>
	</item>
		<item>
		<title>The &#8220;V-shaped recovery&#8221; continues</title>
		<link>http://thebernoullitrial.wordpress.com/2009/08/16/the-v-shaped-recovery-continues/</link>
		<comments>http://thebernoullitrial.wordpress.com/2009/08/16/the-v-shaped-recovery-continues/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 11:30:06 +0000</pubDate>
		<dc:creator>Stanley</dc:creator>
				<category><![CDATA[research]]></category>
		<category><![CDATA[stock market]]></category>
		<category><![CDATA[Global Financial Crisis]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://thebernoullitrial.wordpress.com/?p=1057</guid>
		<description><![CDATA[&#8230;or is it more like a bullet ricocheting off the floor, about to hit us in the cojones?
Sixteen days late, but tonight I finally got around to updating my &#8220;Stock Market Seismometer&#8221; (see separate tab above) for the period ending July 2009.  It was a relief to see the control chart has moved out of [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>&#8230;or is it more like a bullet ricocheting off the floor, about to hit us in the cojones?</p>
<p>Sixteen days late, but tonight I finally got around to updating my &#8220;Stock Market Seismometer&#8221; (see separate tab above) for the period ending July 2009.  It was a relief to see the control chart has moved out of &#8220;extremely over-sold&#8221; territory and into the &#8220;highly over-sold&#8221; range.  I had to go to all the trouble of changing the font colour from red to orange!  The index is now back to where it was in October last year, and continues its inexorable march upwards.</p>
<p>&#8212;&#8212;</p>
<p>NOT FINANCIAL ADVICE.   For academic interest only.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://thebernoullitrial.wordpress.com/2009/08/16/the-v-shaped-recovery-continues/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b21bd421228cc7d8804e3100a7a7bd90?s=96&#38;d=identicon" medium="image">
			<media:title type="html">Stan</media:title>
		</media:content>
	</item>
	</channel>
</rss>