Some statistics in the global warming debate

we can’t account for the lack of warming at the moment and it is a travesty that we can’t

US climate change scientist Kevin Trenberth, whose private emails are included in thousands of documents stolen by hackers and posted online

If you’re interested in statistics then I highly recommend that you add the Stats in the Wild blog to your daily must-read list, and if you’re interested in climate change science then I highly recommend you check out the recent post on the topic.  In fact, go and read it now and I’ll wait here until you get back…

Really good stuff, isn’t it?

Now I’m in broad agreement with Stats in the Wild on this one.  While it appears there may be a few rogue elements amongst the climatologist community (which obviously I don’t condone, but every family has its dodgy uncle), overall the science is still sound.  I’ve nailed my colours firmly to the mast of Anthropogenic (i.e. human-caused) Global Warming (AGW).  Personally I think that to deny AGW is to deny the power of academic peer review, a system that has so successfully underpinned the scientific method for centuries.  Yes, Earth has gone through significant periods of warming and cooling in the past, but the on-average increasing temperatures observed over the last 150 years or so, and particularly the rate of change, are almost certainly a result of increased greenhouse gases (carbon dioxide and methane, particularly) pouring into the atmosphere in ever greater quantities as human populations, industry and agriculture expand.

On the issue of recent lack of warming that has caused Kevin Trenberth such disquiet, Stats in the Wild points out the simple statistical explanation:

You can have a system that is, on the average increasing over the long term, while still observing very flat or even declining trends when we know the overall system is increasing. That doesn’t mean that the system isn’t increasing, it just means we’ve seen one realization of the random system that hasn’t increased entirely by chance.

Stats in the Wild illustrates the point with repeated computer simulations.  If you don’t have access to the “R” statistical analysis package you can produce a simplified version yourself using any spreadsheet.  The parameters in the example below are made up to illustrate a point.

Imagine that the average temperature of the Earth has increased steadily from 15°C to 17°C over a 100-year period, rising by the same precise amount every year.  Scientists’ ability to measure this temperature at any point in time is limited by sampling errors, instrument accuracy, and other sources of variability outside of their control.  Despite these shortcomings, assume that the scientists’ estimate is always pretty good – consistently within +/-3% of the true value.  So the sources of measurement variability result in estimates that bounce around randomly, but always within this 3% margin of error around the true value.  Graphically, what I’m talking about might look a bit like this:

The grey line in the graph above is the scientists’ estimated temperature, randomly distributed around the true value (yellow line) but within the margins of error (red lines).  Now if I emphasise the observed temperature, you can see that, to the scientists, whose measurement limitations mean they can never know the true value, the situation looks like this:

From the graph above you can see there are periods within the time series (crudely marked with orange lines) where random variation makes the trend appear to level off or even decrease.  If you were an observer at roughly year 65, for example, you’d be forgiven for thinking global warming was a right load of old cobblers, because it would appear that temperatures were stagnant, even getting cooler.  As Stats in the Wild concludes, you can have a system that is, on average, increasing over the long term while still observing very flat or even declining trends entirely by chance.

It’s to be expected.
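
If you’d rather script the simulation than build it in a spreadsheet, below is a minimal sketch in Python.  The parameters are the made-up ones from the example above and the random seed is arbitrary – run it a few times with different seeds and you’ll almost always find at least one decade-long stretch where the observed trend is flat or negative, despite the relentless underlying warming.

import numpy as np

rng = np.random.default_rng(42)

years = np.arange(100)
true_temp = np.linspace(15, 17, 100)        # steady underlying warming
noise = rng.uniform(-0.03, 0.03, size=100)  # +/-3% measurement error
observed = true_temp * (1 + noise)

# Hunt for decade-long windows where the *observed* trend is flat or negative
for start in range(90):
    decade = slice(start, start + 10)
    slope = np.polyfit(years[decade], observed[decade], 1)[0]
    if slope <= 0:
        print(f"Years {start}-{start + 9}: observed trend {slope:+.4f} degC/yr")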

——

Selected further reading:

Brave New Climate

Climatologists under pressure

Scepticism’s limits

——

3G mobile data speeds in Sydney on the iPhone

If you’re interested in mobile Internet and the iPhone (and, really, who isn’t?), the Byteside ByteBlog has posted the results of its “Australian iPhone data test”.  The researchers measured 3G download speeds, upload speeds, and ping times, using the iPhone’s Speedtest.net application connected to four major mobile ISPs around Sydney.

We had concurrent access to four iPhone 3GS handsets, one on each of the four Australian networks — Telstra, Optus, Vodafone, and 3.  Travelling around the Sydney CBD and Sydney suburban areas, we ran close to 150 individual speed tests.  Tests ranged from Manly to Homebush, Annandale to North Sydney, and plenty in between.

Byteside have kindly made the raw data from their study available for download.  To analyse the results in more detail, I thought it would be an interesting exercise to try out the Instat statistical analysis package.  Instat can be downloaded for free (for non-commercial use) from the Statistical Services Centre.

Importing Byteside’s raw data (available as an Excel file) into Instat was fairly straightforward – it just needed a bit of cleaning up.  To start proceedings, I used Instat’s Summary Tables feature to produce some descriptive statistics of download speeds.
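
As an aside, if you’d rather stay in code, the same summary takes a few lines of Python with pandas.  This is a sketch only – the file name and column names below are my assumptions, not Byteside’s actual ones:

import pandas as pd

# File and column names are assumed; match them to Byteside's spreadsheet
df = pd.read_excel("byteside_iphone_data.xls")
summary = df.groupby("isp")["download_kbps"].agg(
    ["count", "min", "mean", "median", "max", "std"])
print(summary)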

Table 1: Statistical summary of download speeds by ISP

Mobile ISP   obs     min     mean    median  max     st.dev
             (no.)   (kbps)  (kbps)  (kbps)  (kbps)  (kbps)
Optus        110     0       1637    1903    3654    1080
Telstra      149     0       2681    2416    6151    1554
Three        149     0       625     525     2290    471
Vodafone     151     0       1283    1125    3349    961
TOTAL        559     0       1550    1271    6151    1329

Telstra looks the clear winner in terms of average and maximum download speeds, followed by Optus, Vodafone and Three.  However, I don’t think a summary analysis can tell the whole story.  I wanted to have a look at the plots of the distributions and run a few tests.  Instat obliges with the first request by providing quite a comprehensive graphing capability.

Instat’s boxplot feature seems to cover all the basics (there’s also scope for plenty of customisation).  Below is a comparison of download speeds between the four carriers.

Figure 1: Boxplot of mobile download speeds by ISP


Now let’s have a look at the histograms…

Figure 2: Histogram of mobile download speeds by ISP

Fig. 2a: Optus
Fig. 2b: Telstra
Fig. 2c: Three
Fig. 2d: Vodafone

Telstra really does own the opposition, but not consistently so.  In fact the most frequent download speed for “The Big T” was less than 2Mbps.  All four carriers recorded their fair share of quite cruddy speeds, and all four recorded at least one instance where a download attempt didn’t work at all.  The reliability of 3G mobile broadband in this country (or Sydney, at least) has some way to go, apparently.

I also ran a couple of statistical tests.  As the data aren’t normally distributed, nor are the ISP download speed variances equal, I chose the non-parametric tests that Instat offers.  First, to ask a somewhat redundant question: does the network make a difference to download speed?

Kruskal-Wallis Test

Sample   n   Median   Ave rank    z
1    110  1902.50    305.0      1.81
2    149  2416.00    401.2     10.70
3    149   525.00    163.2    -10.30
4    151  1125.00    257.4     -2.01

H = 167.42 (adjusted for ties) with 3  d.f
Probability > 167.42  = 0

When you ask an obvious question, expect an obvious answer, I suppose.  We can comprehensively reject the null hypothesis that all the medians are equal.  Yes, download speeds depend on the network that you’re on.
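
For anyone without Instat, the same test is a one-liner in Python with scipy.  A minimal sketch, using dummy stand-in numbers – in practice you’d substitute the per-carrier download speed columns from the cleaned Byteside data:

from scipy import stats

# Dummy stand-in data; replace with the real download speeds (kbps) per carrier
optus    = [1903, 1500, 2100, 0, 2500]
telstra  = [2416, 3000, 1800, 5000, 2200]
three    = [525, 400, 700, 0, 610]
vodafone = [1125, 900, 1600, 1300, 0]

H, p = stats.kruskal(optus, telstra, three, vodafone)
print(f"H = {H:.2f}, p = {p:.4f}")  # a tiny p rejects the equal-medians null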

Looking at the histograms above, the Optus and Vodafone distributions share some similarities.  Is there a statistically significant difference between these two carriers?  Again, the answer looks fairly obvious before the test is even run, but what the hell…

Two-sample test for independent data

Sample   n   Median   Rank sum
optus_d 110  1902.50  16047.0
voda_d 151  1125.00  18144.0

Mann-Whitney U = 9942.00    U’ = 6668.00
Wilcoxon T     = 16047.00

On HO: Mean (for U) = 8305.00
Mean (for T) = 14410.00
s.d. = 602.09  (adjusted for tied ranks)

Hence z = 2.72      Significance level is 0.33% for one-sided test
Significance level is 0.66% for two-sided test

0.33% (p=0.0033, one-sided) is highly statistically significant.  So we must conclude that the Optus mobile data network does indeed provide faster download speeds on the iPhone than Vodafone in the Sydney areas covered in Byteside’s study.  That’s when the download does actually work, of course.
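
Again, for the curious, a scipy equivalent of the test above.  The same dummy stand-ins apply, with alternative="greater" asking whether Optus tends to be faster than Vodafone:

from scipy import stats

optus    = [1903, 1500, 2100, 0, 2500]  # stand-ins for the real data columns
vodafone = [1125, 900, 1600, 1300, 0]

U, p = stats.mannwhitneyu(optus, vodafone, alternative="greater")
print(f"U = {U:.1f}, one-sided p = {p:.4f}")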

So this post has really been more a brief review of the Instat package than an analysis of mobile network download speeds in Sydney.  I really enjoyed using Instat for the small amount of statistical analysis presented here: it was capable and quite fun to drive.  Although it doesn’t have the polish or bells and whistles of a package like SPSS, it does have an intuitive GUI and a useful tool-set, and it doesn’t come with R’s steep learning curve.  Free for non-commercial use, the price is right too.

Finally, in a conclusion in keeping with the overall theme of this post: Telstra is the obvious choice of mobile data carrier in terms of download speeds and reliability, but you’ll certainly pay for the privilege.  Optus and Vodafone both represent good-value alternatives.  Based on the results collected by Byteside, it’s hard to recommend Three.  It is also a particular concern that all four networks recorded nil or negligible speeds on a significant proportion of download attempts in the survey.

——

The “V-shaped recovery” continues

…or is it more like a bullet ricocheting off the floor, about to hit us in the cojones?

Sixteen days late, but tonight I finally got around to updating my “Stock Market Seismometer” (see separate tab above) for the period ending July 2009.  It was a relief to see the control chart has moved out of “extremely over-sold” territory and into the “highly over-sold” range.  I had to go to all the trouble of changing the font colour from red to orange!  The index is now back to where it was in October last year, and continues its inexorable march upwards.

NOT FINANCIAL ADVICE.  For academic interest only.

——

EasyData: It’s data, and it’s easy

I wanted to share with you a terrific online data enquiry tool recently posted by my State government called EasyData.  EasyData is part of a wider business portal, but its particular focus is to present the latest regional South Australian economic, social and environmental indicators down to the Local Government (i.e. council) area.  Data sources include the Australian Bureau of Statistics and other agencies from all three levels of government.

I find the EasyData interface intuitive and easy to navigate, with plenty of relevant, useful and interesting information to explore.  And it looks great.  The functionality is also there for information analysts like me who might want to export the data into our own reports, presentations and spreadsheets.

EasyData is certainly one of the better online regional profile products that I’ve come across, if not the best, and I’ve seen a few.  All the more remarkable given that, I understand, just a few key staff developed the whole thing in only a few months.  It’s a credit to those that put it all together (not me!).  Great effort.

Check out EasyData here: www.SouthAustralia.biz/EasyData

——

Can I get a Little MORE support around here?

In November last year, I blogged about the phone queue reporting and graphing page beta-released by my ISP, Internode.  The aim was to use the data presented on that page, with some basic queuing theory (Little’s Law), to determine the size of their helpdesk.  I theorised that a rough estimate for how many Internode support staff are on duty at any particular point in time could be given by:

Support staff on duty ≈ Calls in Queue × 12.5 / Wait Time (in minutes)

Looking at the hourly averages, I concluded that, on the Saturday of my analysis, Internode helpdesk had 8 or so people on hand to assist with customers’ technical problems.  I have been informed that my estimate for that period was surprisingly accurate.
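
For the record, below is a minimal sketch of that estimator in Python.  I’m assuming the 12.5 represents the average call-handling time in minutes, with wait times converted from mm:ss to minutes:

def staff_on_duty(avg_calls_queued, avg_wait_secs, handle_time_mins=12.5):
    # Little's Law: arrival rate = queue length / wait time,
    # so staff needed ~= arrival rate * average handling time
    return avg_calls_queued * handle_time_mins / (avg_wait_secs / 60)

# Example: the Saturday 10am-11am row below (0.3 calls queued, 26 second wait)
print(round(staff_on_duty(0.3, 26)))  # ~9, matching the table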

The graphs and hourly averages data were taken offline for a little while, but they’ve recently been reinstated.  I thought it would be timely and interesting to have another look and see what’s changed over the intervening months.  Last Saturday evening I analysed the hourly averages covering the period from 8pm Friday (17 July 2009) to 8pm Saturday (18 July 2009).  Note that Internode’s residential technical support helpdesk is staffed from 7am to midnight, 7 days a week.  I then applied the same methodology from “Can I get a Little support around here?” to estimate the number of support staff on duty (last column).

Table 1: Internode helpdesk phone queue – hourly averages

Time period          Avg. wait time   Avg. calls queued   Support staff on duty
                     (mins:secs)      (no.)               (est.)
Friday, 8pm-9pm      00:21            0.1                 4
9pm-10pm             00:22            0.1                 3
10pm-11pm            00:21            0.0                 not enough data
11pm-midnight        00:22            0.0                 not enough data
Saturday, 7am-8am    00:22            0.1                 3
8am-9am              00:30            0.2                 5
9am-10am             00:22            0.2                 7
10am-11am            00:26            0.3                 9
11am-noon            00:22            0.2                 7
noon-1pm             00:22            0.2                 7
1pm-2pm              00:23            0.2                 7
2pm-3pm              00:23            0.2                 7
3pm-4pm              00:58            0.7                 9
4pm-5pm              00:22            0.2                 7
5pm-6pm              00:23            0.2                 7
6pm-7pm              00:22            0.1                 3
7pm-8pm              00:21            0.1                 4

Looking through my small window of analysis, it appears that Internode have largely resolved any problems they were experiencing late last year/early this year in terms of extraordinarily long wait times.  Time spent in the phone queue has collapsed from around 10 minutes to less than 30 seconds.  However, this dramatic improvement doesn’t appear to be due to any significant increase in staff numbers.

——

The poll as a political weapon

When applied correctly, statistics is an elegant tool that can help us put a random and uncertain world into context.  When abused, it can help dark and mysterious powers further their own nefarious agendas.  In its most brutal form, statistics can be used as a weapon to club the thick-witted over the head.

No game is quite so brutal as politics.

It’s been an interesting few weeks of local politics here in South Australia.  South Australia has a fixed 4-year electoral cycle and our next election is due in March 2010.  The Liberal opposition party isn’t making any real headway against the Labor incumbents who dominate the political landscape.  To make matters worse, the State Liberal leader, Martin Hamilton-Smith (MHS), recently entangled himself in a “dodgy documents” scandal: he tried to embarrass the government with some “leaked” emails that turned out to be forgeries.  If the Liberal party are to present any kind of alternative government to the people, they need to quickly put this controversy behind them, present a united front, and build positive momentum over the remaining nine months leading into the formal campaign.  Leadership rumblings in the face of a looming election would be too much of a distraction.  In short, MHS’ position became untenable.  However, a trigger was needed to effect his removal.  That trigger was, of all things, a little statistic.

Last week, Mike Rann, the current Labor Premier of South Australia fired a warning shot of what was to come on the social networking site, twitter:

Some of the polling to be dribbled out over next few weeks will be of dubious provenance but Lib plotters hope it will spook/stampede MPs.

-Mike Rann, Premier of South Australia, on twitter, 5:43 PM, 27 June 2009

Sure enough, the very next day, polling of dubious provenance dribbled out of our local propaganda rag, the Sunday Mail.

A Sunday Mail poll of 483 Adelaide metropolitan voters put Labor on 64 per cent to the Liberals’ 36 percent on a two party-preferred basis.

-AdelaideNow, 28 June 2009

Bear in mind this “poll” had fewer than 500 respondents.  Even if everything had been above board, that would put the margin of error at about 4-5 percentage points.  Reputable polling companies typically canvass a larger sample (usually about 1,200 respondents) to reduce the margin of error.  The poll, however, contained multiple flaws.  For a start, it covered metropolitan Adelaide only, not the whole State.  Further, the Sunday Mail didn’t bother to explain which electorates were included, how the polling was conducted, or by whom.  They couldn’t even be bothered to present the results in a tabled summary for detailed scrutiny.  It’s what Darrell Huff would have described as a phoney statistic.  It’s what I would describe as horse shit.  The poll was biased, politically motivated and compromised from the outset.
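
For the record, those margin of error figures come from the standard worst-case formula for a sample proportion – nothing specific to this poll:

import math

def margin_of_error(n, p=0.5):
    # 95% margin of error for a sample proportion; p = 0.5 is the worst case
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(483):.1%}")   # ~4.5% for the Sunday Mail's sample
print(f"{margin_of_error(1200):.1%}")  # ~2.8% for a typical 1,200 sample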

But it had its desired effect.  There was a leadership spill.  Just as the “Lib plotters” had hoped, the MPs were spooked by the dodgy poll and support for MHS melted away.  Although he scraped back in by the narrowest of margins (11-10, with one abstention), it was a Pyrrhic victory.  Realising that the margin was too slender to lead effectively, MHS promptly quit.  There’ll be another leadership ballot tomorrow, which MHS won’t be contesting.  So that’s it, he’s gone.

All thanks to a little statistic.

Funnily enough, the powerbrokers have apparently decided that this whole dodgy polling strategy is a winner.  Just as it can be used to tear one leader down, it can be used to build up the next.  Mike Rann went on to post on twitter:

After Lib leader settled we’ll see campaign by one media to promote whoever wins. We might see odd dodgy poll plus Press Club ‘vision’ etc

So not the last we’ll see of the poll as a political weapon.  Interesting times ahead.

Disclaimer: I am not associated with any political party.  I simply have a keen interest in statistics and its application in the world.

——

Is the worst of the GFC really behind us?

The decision by 10 US banks this week to repay their TARP money and escape the clutches of the Obama remuneration busybodies more or less confirms that the financial crisis is over.

Alan Kohler, “Pricing out the crisis”, Business Spectator, 11 June 2009

Alan Kohler is an Australian business journalist for whom I have a lot of time and respect.  So it was interesting to read him confirm what my own crude Stock Market Seismometer has also been indicating over the last three months… perhaps, financially speaking, things are returning to a state of normality?  I’m not qualified to give financial advice, but I’m quietly optimistic that the worst of the Global Financial Crisis is behind us.

Now there’s only the swine flu pandemic currently sweeping Australia to worry about!

——