It is a well-worn teaching practice to hold up results from large, non-random samples and contrast them with the greater accuracy obtained from much smaller random ones.
The standard, if dated, example is the Literary Digest poll for the 1936 US Presidential Election, which wrongly forecast a landslide for Landon from a sample of 2.4 million voters. Gallup’s poll, based on a much smaller random sample of 50,000, not only predicted the right election result (although it underestimated Roosevelt’s share of the vote, at 56% instead of 62%) but also predicted the result that the Digest’s methods would produce, before the Digest had produced it! The event hastened the demise of the Literary Digest and made George Gallup’s reputation.
Info on the Literary Digest poll is at
The best account is in Freedman, Pisani & Purves Statistics (various editions).
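For classroom use, the contrast between a large biased sample and a small random one can be shown with a short simulation. The sketch below is illustrative only: the population size, the 62% true share and the two response probabilities are invented assumptions, not figures from the Digest or Gallup polls, and the function name compare_samples is hypothetical. It assumes that opponents of the winning candidate are twice as likely to end up in the large poll as supporters.

```python
import random


def compare_samples(seed=1936, population_size=200_000, true_share=0.62,
                    respond_if_supporter=0.15, respond_if_opponent=0.30,
                    random_sample_size=2_000):
    """Compare a large self-selected sample with a small simple random sample.

    All parameter values are illustrative assumptions, not historical data.
    """
    rng = random.Random(seed)

    # 1 = votes for the eventual winner, 0 = votes for the opponent.
    votes = [1 if rng.random() < true_share else 0
             for _ in range(population_size)]

    # Large biased sample: opponents are more likely to be included.
    big_biased = [
        v for v in votes
        if rng.random() < (respond_if_supporter if v else respond_if_opponent)
    ]

    # Small simple random sample drawn without replacement.
    small_random = rng.sample(votes, random_sample_size)

    return {
        "population_share": sum(votes) / len(votes),
        "biased_n": len(big_biased),
        "biased_share": sum(big_biased) / len(big_biased),
        "random_n": len(small_random),
        "random_share": sum(small_random) / len(small_random),
    }


result = compare_samples()
for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumptions the large sample is many times the size of the random one, yet its estimate sits well away from the population share, while the small random sample lands close to it. Increasing the size of the biased sample does not help, because the error comes from who is included, not from how many.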
Eighty-year-old US elections don’t grab the imagination, however. Agresti & Finlay, Statistical Methods for the Social Sciences (Pearson, 3rd ed. 1997), p. 7, draw attention to Shere Hite’s claim in Women and Love that 70% of women married at least 5 years had had an extramarital affair, based on a sample of 4,500 women.
However, 100,000 questionnaires were issued, so it is highly unlikely that the 4.5% who returned them were a random sample.
Agresti & Finlay take the matter no further, but it would be good to find some data based on a random sample. Enter David Atkins and colleagues, who use US General Social Survey data (collected through face-to-face interview). Their article in Journal of Family Psychology (2001, v.15, no. 4: 735-749), ‘Understanding Infidelity: Correlates in a National Random Sample’, estimates the rate at nearer 5%!
The latter article is also a very readable report of a logistic regression, useful for teaching that procedure.