Data dredging

Let’s proceed with our parade of fraudulent data practices, shall we? Next up is data dredging (a/k/a “p-hacking”), a more sophisticated (and less transparent) form of cherry picking. In the words of Wikipedia: “The process of data dredging involves automatically testing huge numbers of hypotheses about a single data set by exhaustively searching … for combinations of variables that might show a correlation ….” This form of data fraud occurs when researchers run many statistical tests on a single data set and then publish only those results that clear some threshold of statistical significance. Such ex post results, however, are often just spurious correlations. At the conventional 5% significance level, roughly one in twenty tests of a true null hypothesis will come out “significant” by chance alone, so a researcher who runs enough tests is all but guaranteed to find something. The lesson here is this: beware of so-called “statistically significant” results, especially when you don’t know how many tests were run to produce them. To guard against this form of data fraud (and reduce positive-results bias to boot), some journals and funding organizations now require researchers to preregister their clinical trials, stating in advance which hypotheses they are going to test.


About F. E. Guerra-Pujol

When I’m not blogging, I am a business law professor at the University of Central Florida.
This entry was posted in Uncategorized.

