We finally got around to reading Charles Wheelan’s 250+ page defense of frequentist methods in his 2013 book “Naked Statistics.” (Curiously, his book was published a year **after** Nate Silver’s best-selling tome “The Signal and the Noise,” a book that criticizes many of the statistical methods described in “Naked Statistics.”) Although “Naked Statistics” is “sparkling and intensely readable” (to quote from one of the many positive reviews of his book), our overall verdict is: go read Nate Silver’s book instead. Consider this clumsy hypothetical on pp. 127-128 of Wheelan’s book, the case of the missing marathon runners (edited by us for clarity):

“Suppose you live in a city that is hosting a marathon [and an International Sausage Festival] * * * [Marathon runners and sausage munchers] are randomly assigned to buses * * * Unfortunately, one of the buses [full of marathon runners] gets lost * * * As luck would have it, you stumble upon a broken-down bus near your home * * * This must be the missing bus! * * * Except you have one lingering doubt … the passengers on this bus are, well, very large. Based on a quick glance, you reckon that the average weight for this group of passengers has got to be over 220 pounds. There is no way a random group of marathon runners could all be this heavy.”

So, is the broken-down bus full of marathon runners or sausage munchers? Put another way, is the broken-down bus *the* missing bus?

The problem with this particular “shaggy-dog story” is that it’s not just one isolated or ill-chosen illustration in an otherwise well-written and thoughtful book. It’s Wheelan’s showcase: the silly and unrealistic example he returns to time and time again throughout his book in defense of traditional frequentist methods (statistical significance, confidence intervals, p-values, and so on), methods that have already been thoroughly discredited by many others.

Worse yet, the missing-bus example is symptomatic of two larger problems with Wheelan’s book. One is that he equates science with long-since discredited frequentist methods. The other is that he keeps talking about probability, and yet, there is not a single reference to Bayesian methods in his entire tome. What’s up with that?

What happened to Rev. Bayes?

Let’s return to Wheelan’s missing-bus hypothetical, shall we? In the real world, if a bus full of marathon runners were to really go missing, and if you were to find a broken-down bus, you would not need to engage in a time-consuming and tedious analysis of sample sizes and standard deviations — unless you had unlimited time and resources … or an NSF grant! Instead, you would immediately look for relevant clues and update your prior beliefs accordingly. The weight and physical condition of the passengers on the missing bus are two such clues, but so are the location of the bus and the attire of the passengers — are they all wearing sneakers or carrying gym bags, for example?
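To make the clue-by-clue updating described above concrete, here is a minimal sketch in Python. Every number in it is hypothetical (we made up the prior and the likelihood of each clue purely for illustration; none of them come from Wheelan’s book), and it assumes the clues are independent of one another given which bus it is, which is a simplification.

```python
def update(prior, p_clue_if_missing, p_clue_if_other):
    """Apply Bayes' rule once for a yes/no hypothesis.

    prior             -- current belief that this is the missing bus
    p_clue_if_missing -- chance of seeing the clue if it IS the missing bus
    p_clue_if_other   -- chance of seeing the clue if it is some other bus
    """
    numerator = prior * p_clue_if_missing
    return numerator / (numerator + (1 - prior) * p_clue_if_other)


# Hypothetical starting belief and clue likelihoods (illustrative only).
belief = 0.5
clues = [
    ("passengers look heavy", 0.05, 0.60),
    ("bus is near the marathon route", 0.70, 0.30),
    ("passengers are wearing sneakers", 0.90, 0.40),
]

for name, p_if_missing, p_if_other in clues:
    belief = update(belief, p_if_missing, p_if_other)
    print(f"after '{name}': P(missing bus) = {belief:.3f}")
```

With these made-up inputs, the heavy passengers push the belief down sharply, while the location and the sneakers pull it part of the way back up. The point of the sketch is only that each clue moves the one probability we actually care about.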

Fundamentally, Wheelan confuses two different probabilities. The relevant probability is not whether the average weight of all the passengers on the broken-down bus is within one or two standard deviations of some mythical mean. That is a clumsy, indirect, and easily manipulable method of performing the true task at hand, for the relevant probability we are trying to measure is whether the broken-down bus is the missing bus! In brief, where is the missing bus? Bayesian reasoning offers a more direct and reliable way of estimating this probability.
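The difference between the two probabilities in the paragraph above can be sketched in a few lines of Python. The weights, standard deviations, bus size, and prior below are all hypothetical values we chose for illustration, not figures from Wheelan’s book, and the sketch assumes the average weight on a bus is roughly normally distributed.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log of the normal density, used to avoid underflow for extreme values."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


# Hypothetical inputs (illustrative only).
n = 60                                     # passengers on the bus
observed_mean = 220.0                      # average weight we eyeballed, in pounds
runner_mean, runner_sd = 160.0, 30.0       # assumed weights of marathon runners
festival_mean, festival_sd = 210.0, 40.0   # assumed weights of festival-goers
prior_missing = 0.5                        # belief before looking at the passengers

# Indirect question: how many standard errors is the observed
# average from the runners' mean?
runner_se = runner_sd / math.sqrt(n)
z = (observed_mean - runner_mean) / runner_se

# Direct question: how probable is it that this IS the missing bus,
# given the observed average and the alternative explanation?
festival_se = festival_sd / math.sqrt(n)
log_odds = (
    math.log(prior_missing / (1 - prior_missing))
    + normal_logpdf(observed_mean, runner_mean, runner_se)
    - normal_logpdf(observed_mean, festival_mean, festival_se)
)
posterior = 1 / (1 + math.exp(-log_odds))

print(f"z-score under the runner hypothesis: {z:.2f}")
print(f"P(missing bus | observed average):   {posterior:.3g}")
```

With these particular made-up numbers both calculations point the same way, but only the second one answers the question we started with, and only the second one changes if the prior or the alternative explanation changes.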

Wheelan should know better. After all, he devotes considerable time and space to describing all the potential problems and pitfalls with frequentist methods, especially in Chapters 10 and 12 of his book. Yet in the end, his ill-fated faith in frequentism and linear regressions appears to be unshaken. Maybe next time, Professor Wheelan will consider writing another book: “The Naked Reverend” … Rev. Bayes, that is!

I haven’t read “Naked Statistics” and know nothing about it, but anyone who could say that Nate Silver discredits frequentist methods cannot know a thing about them. Not one thing! The tiny portion of Silver’s book that mutters with breathtaking incoherence about some imaginary Fisher who allegedly denies the existence of bias (when he’s actually the one who invented methods of experimental design to avoid and detect bias) is really and truly the worst thing I’ve ever seen anywhere on frequentist statistics. When he spoke to the ASA, he said journalists should be Bayesians in order to reveal their biases. Revealing your biases is swell, but why would you advocate multiplying those biases by frequentist likelihoods? Silver is obviously a smart guy who received some utterly off-the-wall info on frequentist methods. That’s no excuse for him, and I think he is morally obligated to correct some of the silliness he wrote. Or does being a data ‘superstar’ mean never having to bother to get your facts straight regarding statistics? And by the way, all of Silver’s examples that make sense are frequentist. (If you search my blog for Nate Silver, you’ll find a few things.) errorstatistics.com

Hi Deborah, thanks for your comment, as it makes me see “Signal & Noise” in a whole new light. Your points are well taken, though I still don’t see what good frequentist methods are for predicting one-off or single-probability events, like a sporting match or a law case. In the meantime, I will most definitely do a “Silver” search on your Error Statistics blog to get up to speed here.

You seem to think frequentist methods refer just to using relative frequency info, as in polling, in order to predict winners or average response, as in Silver’s work. That is one use of frequency info, and one of the reasons Wasserman’s review of Silver points out that Silver IS a frequentist. So tell me how you recommend predicting such events WITHOUT frequency information, including frequency information on the reliability or performance characteristics of methods. (Whether your prima facie unreplicable one-time instances even count as genuine scientific effects, and most would say no, is something I put to one side.)

But all of that was for prediction of particular events, scarcely the main task of statistical inference (testing, estimation, modeling). We’re not inferring events but underlying parameters, models, hypothesized mechanisms, etc. For this setting, it’s crucial to control and assess how well probed the claims, estimates, and models are. Earlier you’d said you agree with error statistics on this. If so, then you agree it’s essential to know how well probed various errors were before inferring evidence of their absence. This knowledge about probative capacity (in statistical settings) demands frequentist information about the error probability characteristics of tests and other methods. The statistical-causal hypothesis (e.g., hormone replacement therapy increases breast cancer in such and such pops) is not an event. It’s a hypothesis or claim that can be relevant for explanation or prediction. Here again, the reliance on data generation with good error statistical properties is crucial: blocking, randomization, recognition of biases, etc.