p-hacking primer

Via io9, John Bohannon explains how fake science works, and specifically the problem of “p-hacking”:

If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

Think of the measurements as lottery tickets. Each one has a small chance of paying off in the form of a “significant” result that we can spin a story around and sell to the media. The more tickets you buy, the more likely you are to win. We didn’t know exactly what would pan out—the headline could have been that chocolate improves sleep or lowers blood pressure—but we knew our chances of getting at least one “statistically significant” result were pretty good.

Whenever you hear that phrase, it means that some result has a small p value. The letter p seems to have totemic power, but it’s just a way to gauge the signal-to-noise ratio in the data. The conventional cutoff for being “significant” is 0.05, which means that there is just a 5 percent chance that your result is a random fluctuation. The more lottery tickets, the better your chances of getting a false positive. So how many tickets do you need to buy?

P(winning) = 1 – (1 – p)ⁿ

With our 18 measurements, we had a 60% chance of getting some “significant” result with p < 0.05. (The measurements weren’t independent, so it could be even higher.) The game was stacked in our favor.
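To see where the 60% figure comes from, here is a minimal Python sketch of the lottery-ticket formula above. It assumes what the formula assumes: n independent tests, every null hypothesis true, and a cutoff (alpha) of 0.05. Under a true null a p-value is uniformly distributed on (0, 1), so the Monte Carlo check simply draws uniform random numbers. The function names are hypothetical, chosen only for illustration, and the sketch covers only the independent case, not the correlated measurements Bohannon mentions.

```python
import math
import random


def prob_at_least_one_false_positive(alpha: float, n: int) -> float:
    """P(winning) = 1 - (1 - alpha)**n for n independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n


def tickets_needed(alpha: float, target: float) -> int:
    """Smallest n with 1 - (1 - alpha)**n >= target, assuming independent tests."""
    # Solve (1 - alpha)**n <= 1 - target for n, then round up to a whole ticket.
    return math.ceil(math.log1p(-target) / math.log1p(-alpha))


def simulate(alpha: float, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A "win" is any of the n independent p-values landing below alpha.
        if any(rng.random() < alpha for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # 18 measurements at p < 0.05: about a 60% chance of at least one "hit".
    print(f"{prob_at_least_one_false_positive(0.05, 18):.3f}")  # 0.603
    # Tickets needed for even odds of a spurious "significant" result.
    print(tickets_needed(0.05, 0.5))  # 14
    # The simulated rate should land close to the formula's value.
    print(f"{simulate(0.05, 18, 100_000):.3f}")
```

With a 0.05 cutoff, 14 independent measurements already give better-than-even odds of at least one spurious “significant” result; 18 push it to roughly 60%.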

You can also check out this fun visualization of the p-hacking problem. (Hat tip: Cliff Pickover.)

xkcd

About F. E. Guerra-Pujol

When I’m not blogging, I am a business law professor at the University of Central Florida.


3 Responses to p-hacking primer

kassbelaire says:

January 16, 2017 at 16:22

“The conventional cutoff for being “significant” is 0.05, which means that there is just a 5 percent chance that your result is a random fluctuation.”

Should be: “which means that there is just a 5 percent chance that results at least that extreme would result from random fluctuation.”

Pingback: *The elephant in the room: p-hacking and accounting research* | prior probability
Pingback: The p-hacking of the ChatGTP wolves | prior probability