The Bristol Centre for Multilevel Modelling has released the beta version of Stat-JR. The software has two features that may make it attractive for teaching.

For those teaching advanced techniques, the software can sit on top of other packages (such as SPSS or Stata) and use their features within a single command language, so that students do not need to learn whole new packages in order to execute new techniques.

However, the (for me) really exciting bit of this software is the ‘ebook’ interface that it offers. It is possible to author ebooks with dynamic pages populated from datasets. The dynamic ebook page receives instructions from a reader, which it then executes, posting results back to the page. This makes it a useful tool for building teaching materials, since the Stat-JR ebook can sit on top of SPSS or other applications. Interactive learning materials can be updated with later releases of datasets or of statistics software, or with different examples suited to different discipline backgrounds, so that much less effort is needed to tailor materials to different audiences or take account of other changes.

From the CMLM message:

At present whilst the software is a beta release we are only distributing it in (renewable) 30-day limited licence form but it is our intention after a period to release fully when, as with our MLwiN software, Stat-JR will be free to UK academics with potentially a small one off fee to non-academics and non-UK users. Note that currently, as with MLwiN, Stat-JR is a Microsoft Windows only piece of software.
If you would like to test out the software and give us feedback then
for more details on the software, its documentation and how you can download it please visit http://www.bristol.ac.uk/cmm/research/estat/downloads/index.html and fill in a request form for a download.
Best wishes,

The Stat-JR team.

I find it useful to get students thinking about quantitative evidence by examining how ignorant we are of the order of magnitude of numbers that nevertheless feature in highly visible public debates. Most probably have some idea that the incarceration rate is higher in the US than the UK. But how many people does either jurisdiction lock up, and what ‘should’ the rate be?
Plenty of good data at


The US locks up more people per head than anywhere else, with an incarceration rate of 0.73%. It has 5% of the world’s population but about 25% of the world’s prisoners. With 2.27m prisoners it has almost as many as the combined total for Russia and China.
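For anyone who wants to check the arithmetic with a class, the figures above can be reproduced in a few lines. The population totals here are my own rough assumptions (circa 2012): a US population of about 311m, a world population of about 7 billion, and roughly 10m prisoners worldwide.

```python
# Rough sanity check of the incarceration figures quoted above.
# The totals below are assumptions for illustration, not official data.
us_prisoners = 2_270_000
us_population = 311_000_000        # assumed, circa 2012
world_population = 7_000_000_000   # assumed
world_prisoners = 10_100_000       # assumed

incarceration_rate = us_prisoners / us_population * 100
print(f"US incarceration rate: {incarceration_rate:.2f}%")   # roughly 0.73%
print(f"US share of world population: {us_population / world_population:.1%}")
print(f"US share of world prisoners: {us_prisoners / world_prisoners:.1%}")
```

Even with rough inputs, the orders of magnitude come out as the text describes: under 1% of the population incarcerated, but over a fifth of the world’s prisoners.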

And which part of the UK has fewer prisoners per head than any other? Step forward… Northern Ireland, with a rate of 99 per 100,000 population, compared to 155 for England & Wales and 157 for Scotland.

Daniel Kahneman’s Thinking, Fast and Slow has many relevant ideas for statistics teaching, regardless of how far one agrees with the details of ‘Prospect Theory’, behavioural economics or all of Kahneman’s arguments about the psychology of cognition.
The most important insight is his (to my mind convincing) experimental demonstrations of the manifold forms of the ‘What You See Is All There Is’ bias in cognition, together with a plausible account of its evolutionary origins. Quantitative, mostly statistical, evidence that goes beyond individual observation, together with effortful ‘System 2’ logical thought that follows axioms of probability or arithmetical calculation, are the only possible correctives to WYSIATI. This ought to be a stronger selling point for statistics. ‘Fear of Stats’ probably has an element of ‘discomfort of undermining cherished intuitions’ to it.
His account of regression to the mean and pilot instructors (picked up by Dilnot and Blastland in The Tiger That Isn’t) is probably a good point to start with when introducing statistics. I also like the way he introduces the idea of correlation after that of regression, reversing the order of most statistics texts.
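The pilot-instructor example is easy to simulate. The sketch below (my own illustrative numbers, not Kahneman’s) gives each trainee a fixed skill plus random luck on every flight: the worst performers on the first flight improve on the second with no instruction at all.

```python
import random

random.seed(1)

# Each trainee's observed score = stable skill + random luck.
n = 10_000
skill = [random.gauss(0, 1) for _ in range(n)]
flight1 = [s + random.gauss(0, 1) for s in skill]
flight2 = [s + random.gauss(0, 1) for s in skill]

# Pick out the bottom 10% on flight 1, then look at the same pilots on flight 2.
worst = sorted(range(n), key=lambda i: flight1[i])[: n // 10]
mean1 = sum(flight1[i] for i in worst) / len(worst)
mean2 = sum(flight2[i] for i in worst) / len(worst)
print(f"bottom decile, flight 1: {mean1:.2f}")  # well below average
print(f"same pilots, flight 2:  {mean2:.2f}")   # closer to the mean
```

The apparent ‘improvement’ is pure regression to the mean: the worst first-flight scores were partly bad luck, and the luck does not repeat.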

I’m currently putting together a course on numerical and statistical literacy aimed at all undergraduates at Edinburgh. This is a challenge, not least because it means that I cannot assume any particular substantive academic discipline as a context. I’ve also been going round in a few circles over what to go into first. Start with randomness, then the idea of a variable distribution, then correlation? Or start with description and the nature of variables? Text for the course will be Dilnot and Blastland’s ‘The Tiger That Isn’t’. The ‘learning outcomes’ I’ve come up with are below: all suggestions about additions, deletions or amendments gratefully received… Currently it is a very long list: probably too long, but what to leave out?

Understand what a variable is, what is meant by its distribution, and some of the ways in which the latter can be described and summarised.
Recognise and carry out simple manipulation of proportions, fractions, decimals, and percentages.
Understand rates and rates of change, and their expression by logarithms.
Use procedures of informal estimation to check the orders of magnitude of quantities used in reports (including academic or scientific output, policy documents or the mass media) and to avoid spurious accuracy.
Understand the meaning of randomness and of the independence of events or states.
Recognise a ‘Normal’ or Gaussian distribution.

Understand probability and risk as ways of measuring uncertainty.
Undertake simple calculations of cumulative and conditional probability.
Understand the distinction between absolute and relative risks, and perform simple calculations of risk using natural frequencies and Bayes rule.
Understand the concept of correlation and its distinction from causation.
Know how to read, interpret, produce and present data in the form of contingency tables.

Understand the difference between experiment and observation.
Understand the difference between observational and experimental control.
Understand what is meant by a random sample and sampling fluctuation.
Understand what a confidence interval is and how it is expressed.
Understand the distinction between statistical significance and substantive importance.
Understand the concept of regression to the mean and its implications.

Understand how data can be visualised, including bar charts, histograms, box plots, scatterplots and Venn diagrams.
Be able to use Excel to record, store, manipulate and present numerical data.
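As an illustration of the natural-frequencies outcome in the list above, a classic screening-test example can be worked through in a few lines. The prevalence, sensitivity and false-positive figures are invented for illustration, not drawn from the course.

```python
# Bayes' rule via natural frequencies, with invented screening figures:
# prevalence 1%, test sensitivity 80%, false-positive rate 10%.
population = 1000
have_condition = population * 0.01                       # 10 people
true_positives = have_condition * 0.80                   # 8 of them test positive
false_positives = (population - have_condition) * 0.10   # 99 healthy people also test positive

p_condition_given_positive = true_positives / (true_positives + false_positives)
print(f"P(condition | positive test) = {p_condition_given_positive:.1%}")  # about 7.5%
```

Counting people rather than multiplying probabilities makes the surprising answer visible: despite the 80% sensitivity, fewer than one in ten positive tests indicates the condition.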

The article below, written by the consumer affairs editor, Harry Wallop, is a typical trawl of the latest ONS Social Trends publication in order to present some statistics showing that the ‘British family is a museum piece’.

It is a useful illustration of how statistics which may be correct in themselves can be used to create a misleading impression. There are also some mistakes, such as the suggestion that ‘the number of married couples hit the lowest level, in real terms, since 1895’, a claim evidenced by the number of marriages. Here the author manages to confuse (presumably) the rate of marriage, the number of marriages and the number of married couples. All three are different things, and none of them tells us what the author presumably wants to get at: how likely it is that couples who live together are married.


Teaching inference from sample statistics to population parameters is always a challenge, as it is difficult to avoid talking in abstractions that students may struggle to grasp: the random nature of sample selection, the idea of repeated sampling, a sampling distribution, standard errors, and perhaps most difficult of all, the shift from the distribution of multiple sample estimates around a population parameter to the probability of a population parameter lying within a given distance of a single sample statistic.
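One way to make repeated sampling concrete is to simulate it. The sketch below (my own illustration, not part of the Auckland materials) draws many samples from a population with a known mean, computes a 95% confidence interval from each, and counts how often the interval covers the true value.

```python
import random
import statistics

random.seed(0)

# Population with a known mean; draw many samples and check CI coverage.
population_mean, population_sd, n = 50, 10, 100
trials = 2000
covered = 0
for _ in range(trials):
    sample = [random.gauss(population_mean, population_sd) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    # 1.96 is the normal-approximation critical value for 95% coverage.
    if mean - 1.96 * se <= population_mean <= mean + 1.96 * se:
        covered += 1

print(f"{covered / trials:.1%} of intervals covered the true mean")  # around 95%
```

Seeing the coverage rate emerge from thousands of repetitions is often more persuasive than the formal definition of a confidence interval.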

Chris Wild and colleagues at Auckland have developed visual approaches to understanding inference, working with schools. Their work has many potential applications in working with university students.

The webpages that introduce this work are at



It is available for download at the following URL:
The sites have a couple of demonstration videos and other support material. I’ve also uploaded a quick supplementary guide to YouTube and a quick PDF guide downloadable below:
The software works fine on a PC. I’ve also got it to work on a Mac, but the animation may need a bit of tweaking to work properly in the latter environment.
JM 2 5 12

Non-random samples

It is a well-worn teaching practice to hold up examples of results from large, non-random samples and contrast them with the greater accuracy obtained from much smaller random ones.

The standard but dated example is the 1936 US presidential election: the Literary Digest poll wrongly forecast a landslide for Landon on the basis of a sample of 2.4 million voters, while Gallup, using a much smaller random sample of 50,000, not only predicted the right election result (although it underestimated Roosevelt’s share of the vote at 56% instead of 62%) but also the result that the Digest’s methods would produce, before they had produced it! The episode hastened the demise of the Literary Digest and made George Gallup’s reputation.
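The Digest’s lesson is easy to demonstrate in miniature. In the sketch below (illustrative numbers only, not the 1936 data), 62% of a simulated electorate favour candidate A, but only 40% of the subgroup that a biased sampling frame reaches: the huge biased sample stays wrong while a small random one lands near the truth.

```python
import random

random.seed(2)

# 1 = supports candidate A. The electorate splits 62/38, but the
# (phone/car-owning style) frame the biased poll draws from splits 40/60.
electorate = [1] * 62_000 + [0] * 38_000
biased_frame = [1] * 40_000 + [0] * 60_000

big_biased = random.sample(biased_frame, 50_000)   # huge, but biased
small_random = random.sample(electorate, 1_000)    # small, but random

print(f"biased sample of 50,000: {sum(big_biased) / len(big_biased):.1%} for A")
print(f"random sample of 1,000:  {sum(small_random) / len(small_random):.1%} for A")
```

The point for students: sample size shrinks random error, but no amount of it shrinks the bias built into the frame.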

Info on the Literary Digest poll is at



The best account is in Freedman, Pisani & Purves Statistics (various editions).

Eighty-year-old US elections don’t grab the imagination. However, Agresti & Finlay, Statistical Methods for the Social Sciences (Pearson, 3rd ed., 1997), p. 7, draw attention to Shere Hite’s claim in Women and Love that 70% of women married at least five years had had an extramarital affair, based on a sample of 4,500 women.

However, 100,000 questionnaires were issued, so it is highly unlikely that the 4.5% returned constituted a random sample.

Agresti & Finlay take the matter no further, but it would be good to find some data based on a random sample. Enter David Atkins and colleagues, who use US General Social Survey data (collected through face-to-face interviews). Their article in the Journal of Family Psychology (2001, vol. 15, no. 4, pp. 735–749), ‘Understanding Infidelity: Correlates in a National Random Sample’, estimates the rate at nearer 5%!

The latter article is also a very readable report of a logistic regression, useful for teaching that procedure.
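For readers who want a feel for what a logistic regression actually fits, here is a minimal sketch on synthetic data (nothing to do with the Atkins et al. dataset): one predictor, a binary outcome, and coefficients estimated by gradient ascent on the log-likelihood.

```python
import math
import random

random.seed(3)

# Generate synthetic data from a known logistic model:
# P(y = 1 | x) = 1 / (1 + exp(-(a + b * x))) with a = -1.0, b = 1.5.
n = 2000
true_a, true_b = -1.0, 1.5
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < 1 / (1 + math.exp(-(true_a + true_b * xi))) else 0
     for xi in x]

# Fit by gradient ascent on the average log-likelihood.
a, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    grad_a = sum(yi - 1 / (1 + math.exp(-(a + b * xi)))
                 for xi, yi in zip(x, y)) / n
    grad_b = sum((yi - 1 / (1 + math.exp(-(a + b * xi)))) * xi
                 for xi, yi in zip(x, y)) / n
    a += lr * grad_a
    b += lr * grad_b

print(f"fitted intercept {a:.2f} (true {true_a}), slope {b:.2f} (true {true_b})")
```

The fitted coefficients land close to the true values, which is a useful way of showing students that the mysterious output of a statistics package is just the solution of a well-defined estimation problem.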