What the world’s urban poor really want …

… relief from traffic congestion! Here is an excerpt from Michael Hobbes’s excellent essay “What I learned from the crippling gridlock in Dhaka, Bangladesh”:

I am in a tiny steel cage attached to a motorcycle, stuttering through traffic in Dhaka, Bangladesh. In the last ten minutes, we have moved forward maybe three feet, inch by inch, the driver wrenching the wheel left and right, wriggling deeper into the wedge between a delivery truck and a rickshaw in front of us.

Up ahead, the traffic is jammed so close together that pedestrians are climbing over pickup trucks and through empty rickshaws to cross the street. Two rows to my left is an ambulance, blue light spinning uselessly. The driver is in the road, smoking a cigarette, standing on his tiptoes, looking ahead for where the traffic clears. Every once in awhile he reaches into the open door to honk his horn.

This is what the streets here look like from seven o’clock in the morning until ten o’clock at night. If you’re rich, you experience it from the back seat of a car, the percussion muffled behind glass. If you’re poor, you’re in a rickshaw, breathing in the exhaust.

Me, I’m sitting in the back of a CNG, a three-wheeled motorcycle shaped like a slice of pie and covered with scrap metal. I’m here working on a human rights project related (inevitably) to the garment factories, but whenever I ask people in Dhaka what their main priority is, what they think international organizations should really be working on, they tell me about the traffic.

Illustration by Sophia Foster-Dimino
Posted in Uncategorized

Does sex improve athletic performance?

Credit for the diagram goes to David Yanofsky. Read more here: “All of the teams that banned sex at the World Cup have been eliminated … so have a lot of the teams that didn’t ban it. So let’s not draw any inferences.” What would a Bayesian say?

This isn’t a math test …

Two cheers for Team USA … next time in Russia!

Evidence of match-fixing at the World Cup?

In German: here. In English: here. We blogged about this possibility on 23 June. So, we need to update our priors …

 World Cup match-fixers?

Faking it

We can’t help but cheer for Pepe (although his little headbutt looks fake too!) … Also, you will find a plausible theory explaining why players have an incentive to fake injuries in Michael Gard’s excellent essay “Faking it: why football players feign injury.” Here is an excerpt:

The first thing to say is that feigning injury in football today has reached truly epidemic proportions. A Wall Street Journal article [mischievously titled “World Cup Flopping Rankings”] reported 132 minutes of “writhing time” in just 32 World Cup games. Of the 302 separate instances of players appearing to be very seriously hurt, 293 of them were up and playing within seconds. Just nine were actually injured.

Don’t all these fakers deserve a post-game red card, starting with Thomas Müller?

“Bayesian reasoning” postscript

Note: We recently concluded a five-part review of the main points in Howson & Urbach’s important paper “Bayesian reasoning in science” (see our various Bayesian blog posts from 25-28 June). We now wish to present our own thoughts in this postscript. Spoiler alert: we are going to apply the self-reference test to both frequentist and Bayesian methods!

Consider first, by way of example, the “independent samples t-test.” (Frequentists have a huge toolkit full of ad hoc statistical tools for evaluating the results of experiments, but we shall focus on the t-test since it is one of the most common tests of statistical significance.)

Now, instead of tossing one coin 20 times (see our previous posts from 27 & 28 June), let’s say you toss two coins, coin A and coin B, 20 times each (for a grand total of 40 coin tosses). Further suppose that coin A produced 11 heads in 20 trials, while coin B produced only 8 heads … Is the difference between these results just noise (i.e. within the range of what you would expect to find anytime you toss two fair coins 20 times each), or is it “statistically significant,” i.e. too large to be plausibly attributed to chance alone?
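To make the coin example concrete, here is a minimal Python sketch of a pooled (Student) independent-samples t-test on the two coins, encoding each toss as 1 for heads and 0 for tails. The function name `pooled_t` and the hard-coded critical value (about 2.024 for 38 degrees of freedom, taken from standard t tables) are our own assumptions. A t-test on 0/1 data is also only a rough approximation; a two-proportion or exact test would be the more usual choice for coin tosses.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(sample_a, sample_b):
    """Student's independent-samples t statistic with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance uses the n - 1 denominator, so multiply back
    # to recover each sample's sum of squared deviations
    ss_a = variance(sample_a) * (n_a - 1)
    ss_b = variance(sample_b) * (n_b - 1)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (fmean(sample_a) - fmean(sample_b)) / se
    return t, n_a + n_b - 2  # statistic and degrees of freedom


# Each toss encoded as 1 (heads) or 0 (tails)
coin_a = [1] * 11 + [0] * 9   # 11 heads in 20 tosses
coin_b = [1] * 8 + [0] * 12   # 8 heads in 20 tosses

t, df = pooled_t(coin_a, coin_b)
T_CRIT_38 = 2.024  # approx. two-sided 5% critical value for 38 df (from t tables)
print(f"t = {t:.3f} on {df} df; significant at 5%? {abs(t) > T_CRIT_38}")
```

With 11 versus 8 heads this gives t ≈ 0.936, well short of the critical value, so a frequentist would call the difference not significant at the 5% level.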

In other words, we want to know whether the difference in results between the two experiments (i.e. the number of heads produced by each coin) is “statistically significant” or not, i.e. whether it reflects a “real” difference between the coins used to generate our experimental data. “t-tests” (collecting independent samples and comparing their means) and “statistical significance” are thus standard statistical tools for evaluating the results of experiments. But are they “science”?

We leave the ‘science’ question open, for now. The main problem we have with “t-tests,” and with standard statistical methods generally, is their inability to pass the self-reference test (see, for example, our post from 14 May). Returning to our coin-toss example above, let’s say that you have finished conducting your (first-order) t-test experiment (i.e. tossing your coins and counting up the total number of heads generated by each coin) and that you have also finished evaluating the statistical significance of your test results. Now, shouldn’t you also conduct a second-order or higher-level experiment to measure the statistical significance of your statistically significant results?

This is not a frivolous or trivial question. After all, the whole purpose of “statistical significance” is to tell us something important about our first-order data (e.g. the results of our coin-toss experiments), but our statistical analysis will, in turn, generate a new set of second-order data, such as the sample sizes (e.g. the number of coin tosses), the size of the difference between the sample averages (e.g. the difference in the proportion of heads produced by each coin), and the standard deviations of the samples.

So, why can’t we test each one of these second-order data points for statistical significance? That is, why can’t we test the t-test itself?
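One hedged way to picture a “second-order” check is a simulation: repeat the whole two-coin experiment thousands of times with two fair coins and record how often the first-order t-test declares a “significant” difference anyway. The names below (`t_stat`, `false_positive_rate`) and the seed are illustrative, and the hard-coded critical value (about 2.024 for 38 degrees of freedom) is taken from standard t tables. Note that the simulated false-alarm rate is itself just another statistic, so the regress described above does not go away.

```python
import random
from math import sqrt

N = 20             # tosses per coin, as in the post
T_CRIT_38 = 2.024  # approx. two-sided 5% critical value of t with 2*N - 2 = 38 df


def t_stat(heads_a, heads_b):
    """Pooled two-sample t statistic for two runs of N tosses (1 = heads, 0 = tails)."""
    p_a, p_b = heads_a / N, heads_b / N
    # For a 0/1 sample, the sum of squared deviations from the mean is N*p*(1-p)
    ss = N * p_a * (1 - p_a) + N * p_b * (1 - p_b)
    se = sqrt(ss / (2 * N - 2) * (2 / N))
    if se == 0:
        # Both runs are all heads or all tails: equal means give t = 0,
        # unequal means give an unbounded t
        return 0.0 if p_a == p_b else float("inf")
    return (p_a - p_b) / se


def false_positive_rate(trials=10_000, seed=1):
    """Repeat the experiment with two FAIR coins and return the share of runs
    in which the first-order t-test still calls the difference significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads_a = sum(rng.random() < 0.5 for _ in range(N))
        heads_b = sum(rng.random() < 0.5 for _ in range(N))
        if abs(t_stat(heads_a, heads_b)) > T_CRIT_38:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"Share of 'significant' results with two fair coins: {rate:.3f}")
```

The rate should land in the neighborhood of the nominal 5%, though not exactly on it, because 20 tosses allow only a handful of distinct outcomes.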

* * *

The Bayesian approach to truth, by contrast, is not only completely open to self-criticism; it is also able to pass the self-reference test with flying colors. Just follow these two steps:

First, you need to assign some subjective prior probability to the truth of Bayes’s rule itself. It doesn’t matter what your prior is (you might be highly skeptical of inverse probabilities, or you might be a hardcore Bayesian through-and-through), so long as it is not completely dogmatic, i.e. neither 0 nor 1. For example, if you are not a Bayesian or if you simply distrust Bayesian methods, assign a low value to this prior (less than 0.5 but greater than 0). If you are a Bayesian, assign it a high value (greater than 0.5 but less than 1); or, if you are a good Bayesian, assign a value of 0.5.

Next, put Bayes’s rule to the test by using Bayesian methods to make predictions or to measure the truth of certain propositions and then “update” or revise your priors accordingly. If the Bayesian approach gives you good results, then … keep on using Bayesian methods. But if Bayesian methods fail to make good predictions or fail to bring you closer to the truth, you have effectively falsified the Bayesian approach … In that case, it’s time to look somewhere else for answers.
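The two steps above can be sketched with Bayes’s rule applied to the hypothesis H = “Bayesian methods are reliable.” The likelihoods (a reliable method makes a good prediction 80% of the time, an unreliable one 50%) and the track record of predictions are purely hypothetical numbers chosen for illustration.

```python
def update(prior, success, p_success_if_reliable=0.8, p_success_if_not=0.5):
    """One application of Bayes's rule to H = 'Bayesian methods are reliable'.

    The two likelihoods are illustrative assumptions, not values from the post.
    """
    like_h = p_success_if_reliable if success else 1 - p_success_if_reliable
    like_not_h = p_success_if_not if success else 1 - p_success_if_not
    evidence = like_h * prior + like_not_h * (1 - prior)
    return like_h * prior / evidence


def run(prior, outcomes):
    """Update the prior once per prediction outcome, in order."""
    for outcome in outcomes:
        prior = update(prior, outcome)
    return prior


record = [True, True, False, True, True]  # hypothetical track record of predictions
for start in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"prior {start:.1f} -> posterior {run(start, record):.3f}")
```

Every non-dogmatic prior moves upward after this mostly successful record, while priors of exactly 0 or 1 never move at all, which is why such values are ruled out in the first step.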

But here’s the rub. That “something else” should in principle be open to self-criticism. It should be subjected to the self-reference test. It should be falsifiable. The Bayesian approach has the virtue of meeting these conditions. Can frequentist methods meet them too?

True or false?

Is the offside rule in international football (Law 11) clear or confusing?

FYI: Here are the official guidelines for interpreting Law 11 (the offside rule in international football). Here is a useful PowerPoint presentation (consisting of 37 slides) explaining Law 11.

The problem of priors

Note: this is the fifth and final installment of our review of the paper “Bayesian reasoning in science” by Colin Howson and Peter Urbach.

We now come to the “main event”: the problem of priors.

That is, where are you supposed to get your Bayesian priors from? (To return to the coin-toss experiment we discussed in our previous post, for example, what prior should the experimenter assign to the probability of the coin being fair?) Aren’t everyone’s priors ultimately subjective and thus non-scientific? As Howson and Urbach acknowledge on p. 374 of their paper, “at some point … prior probabilities will have to be used which merely reflect [subjective] opinion.”

In the last part of their paper, Howson and Urbach describe several valiant attempts to generate “explicit, ‘objective’ rules for calculating priors” yet end up conceding that “there seems to be no way of ‘objectively’ defining prior probabilities.” So, what is to be done, then? How do Howson and Urbach, in particular, deal with the problem of priors?

In brief, the authors argue that subjectivity is good, that subjectivity is universal, and that subjectivity is irrelevant.

First, they argue that the subjective nature of Bayesian priors is a strength, not a weakness: “… our argument has all along been that this is really no weakness: it allows expert opinion due weight …” [Time out #1: what if the “experts” themselves disagree with each other? Whose prior wins out?]

Next, they take a direct swipe at Fisher and his sundry disciples, arguing that all science is inherently subjective [time out #2: really?] and that Bayesian methods are at least open and honest about their subjectivity. In the pull-no-punches words of Howson and Urbach:

[A subjective prior] is a candid admission of the personal element which is there in all scientific work. The inventors of ‘objective’ methodologies [i.e. Fisher and his ilk] … merely sweep the personal element under the carpet.

Lastly, they argue that the subjective nature of Bayesian priors is irrelevant, since persons with different prior beliefs should converge in their posterior beliefs as data and evidence accumulate (assuming those persons are good Bayesians, of course). [Time out #3: so why do we see so little convergence in so many different domains, such as politics, economics, and philosophy?]
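The convergence argument can be illustrated with the standard Beta-binomial model: two people hold different Beta priors about a coin’s probability of heads, see the same tosses, and update. The particular priors and the assumption that exactly half the tosses land heads are hypothetical.

```python
def posterior_mean(a, b, heads, tosses):
    """Posterior mean of a coin's heads-probability under a Beta(a, b) prior
    after observing `heads` in `tosses` (standard Beta-binomial conjugacy)."""
    return (a + heads) / (a + b + tosses)


skeptic = (1, 1)    # hypothetical flat prior
believer = (20, 2)  # hypothetical prior that strongly expects heads

gaps = []
for tosses in (0, 20, 200, 2000):
    heads = tosses // 2  # suppose the data come out exactly half heads
    m1 = posterior_mean(*skeptic, heads, tosses)
    m2 = posterior_mean(*believer, heads, tosses)
    gaps.append(abs(m1 - m2))
    print(f"{tosses:5d} tosses: {m1:.3f} vs {m2:.3f} (gap {gaps[-1]:.3f})")
```

The gap between the two posterior means shrinks from about 0.41 to under 0.01. This convergence relies on both people sharing the same model of the data and holding non-dogmatic priors; the sketch says nothing about why disagreement persists in the domains mentioned in time out #3.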

Are you persuaded by any of these arguments? Are they consistent with each other? Let us know what you think …

Keep calm and update your priors.

Lies, damned lies, and … statistics

Note: this is the fourth part of our review of the paper “Bayesian reasoning in science” by Colin Howson and Peter Urbach. (The fifth and final installment of our review shall appear on 28 June.)

Let us return to Howson and Urbach’s Bayesian paper today. After presenting Bayes’s rule and the Bayesian approach to truth on pp. 371-372 of their paper (see our Bayesian blog posts of 25-26 June for reference), Howson and Urbach concede that the Bayesian approach to truth “has been widely criticized because it is based on personal, hence subjective, probabilities [cf. the problem of priors we talked about in our post of 26 June titled “Beliefs are like gambles”]. Scientific inference, critics say, should be perfectly objective.” Howson and Urbach thus spend the rest of their paper comparing and contrasting the Bayesian approach to truth with its leading challenger, what they refer to as the “classical statistical inference” model, an alternative approach to truth associated with the work of such giants as R. A. Fisher, Jerzy Neyman, and Egon Pearson (all of whom Howson & Urbach lump together as “classical statisticians”).

In brief, Howson and Urbach begin the second part of their paper by noting that the “classical” or non-Bayesian approach to truth “has two principal parts, the first relating to the testing of hypothesis (using significance tests) and the second to estimating the values of unknown parameters.” (In this post, we shall focus on Howson and Urbach’s critique of Fisherian hypothesis testing and the related idea of “significance”.) The authors then take a simple example to illustrate the Fisherian approach: an experimenter tossing a coin 20 times and counting the number of times the coin lands “heads” in order to test whether the coin is fair or not. “There are 21 possibilities,” they write, “ranging from no heads and 20 tails to 20 heads and no tails.” But how does the experimenter in this simple example know whether the coin is fair, i.e. how does he actually “test” his hypothesis in this case? If he is a Fisherian, he must perform a secondary “significance test”; that is, he must now proceed to “test” his results from the 20 previous coin tosses (though not the coin itself).
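Here is a minimal sketch of a Fisherian significance test along the lines Howson and Urbach describe: an exact two-sided binomial test over the 21 possible outcomes of 20 tosses of a fair coin. The function name and the choice of the “sum every outcome no more likely than the observed one” rule for the two-sided p-value are our assumptions; other two-sided conventions exist.

```python
from math import comb


def binomial_p_value(heads, n=20):
    """Two-sided exact p-value for `heads` in `n` tosses of a fair coin.

    Sums the probabilities of every outcome (0..n heads, i.e. n + 1
    possibilities) that is no more likely than the observed one. With p = 0.5
    each outcome has probability comb(n, k) / 2**n, so the comparison can be
    done exactly on the integer coefficients.
    """
    observed = comb(n, heads)
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


for heads in (14, 15):
    p = binomial_p_value(heads)
    print(f"{heads} heads: p = {p:.4f}, significant at 0.05? {p < 0.05}")
```

Under these assumptions, 15 heads gives p ≈ 0.041 (significant at the 0.05 level) and 14 heads gives p ≈ 0.115 (not significant).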

You will find the splendid details of Howson and Urbach’s critique of significance testing on pp. 372-373 of their paper, but their main point, as we understand it, is this: whether the experimenter’s coin-toss results in the example above are “significant” in a statistical sense at some predetermined level (such as 0.05) tells us nothing about the actual coin being tested! Why? Because a significance test is not a direct test of truth; it is simply a secondary or subsidiary test of one’s experimental data. (By way of analogy, consider the difference between a historical or legal investigation into the actual contents of a document versus an investigation of the way in which that document was made.) There is thus no necessary or logical relation between the “significance” of a given statistical test and the truth of the hypothesis being tested.

Worse yet, Howson and Urbach note that significance results are easy to manipulate and are super-sensitive to experimental design. In particular, they present this additional critique of significance testing on p. 373 of their paper — the stopping-rule problem:

In our earlier example, it was assumed … that because the coin was tossed 20 times, all of the possible outcomes would exhibit [some combination of] 20 heads and/or tails. But these are the possible outcomes only if the experimenter has a premeditated plan to throw the coin 20 times. Had the plan been to stop the experiment when, say six heads appeared, he could have got just the result he did, but with a different list of unrealized, possible outcomes.

So what? Here’s what:

Because significance is calculated by reference to these [unrealized, possible] outcomes, a result could be significant if the experimenter had had one plan (or stopping rule in mind), but not significant if it was another.
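The stopping-rule problem can be made concrete with a hypothetical run that ends with 6 heads in 20 tosses, the 20th toss being a head. Under plan A (toss exactly 20 times) the unrealized outcomes are the binomial ones; under plan B (stop at the sixth head) they are the negative-binomial ones. The sketch uses one-sided p-values for the hypothesis that the coin is biased against heads; the function names and the one-sided choice are ours.

```python
from math import comb


def p_fixed_n(heads, n):
    """Plan A: toss exactly n times. One-sided p-value for 'too few heads':
    P(at most `heads` heads in n tosses of a fair coin)."""
    return sum(comb(n, k) for k in range(heads + 1)) / 2 ** n


def p_stop_at_heads(heads, n):
    """Plan B: toss until the `heads`-th head appears, which happened on toss n.
    One-sided p-value: P(needing n or more tosses)
    = P(at most heads - 1 heads in the first n - 1 tosses of a fair coin)."""
    return sum(comb(n - 1, k) for k in range(heads)) / 2 ** (n - 1)


# Same hypothetical data under both plans: 6 heads in 20 tosses, last toss a head
print(f"plan A (fixed 20 tosses): p = {p_fixed_n(6, 20):.4f}")
print(f"plan B (stop at 6 heads): p = {p_stop_at_heads(6, 20):.4f}")
```

Plan A gives p ≈ 0.058 and plan B gives p ≈ 0.032, so the very same sequence of tosses is “significant” at the 0.05 level under one intention and not under the other, as the passage above describes.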

In short, in the eloquent words of Howson and Urbach: “This dependence of significance tests … on the subjective, possibly unconscious intentions of the experimenter is an astonishing thing to discover at the heart of supposedly objective methodologies. It is also a most inappropriate thing to find in any methodology, for the plausibility, or cognitive value, of a hypothesis … should not depend on the experimenter’s mind.” (Ouch!)
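For contrast, here is how Bayes’s rule treats a hypothetical run of 6 heads in 20 tosses, the last toss being a head, under the two stopping rules. The rules change only a counting factor in the likelihood (comb(20, 6) versus comb(19, 5)), and that factor cancels when the posterior is normalized, so the posterior does not depend on the experimenter’s plan. The rival hypothesis (a coin with a 0.3 chance of heads) and the 0.5 prior on fairness are assumptions, which is precisely where the problem of priors re-enters.

```python
from math import comb

HEADS, TOSSES, P_BIASED = 6, 20, 0.3  # hypothetical data and rival hypothesis


def posterior_fair(likelihood_fair, likelihood_biased, prior_fair=0.5):
    """Bayes's rule for two rival hypotheses: fair coin vs. biased coin."""
    num = likelihood_fair * prior_fair
    return num / (num + likelihood_biased * (1 - prior_fair))


def lik(p, coeff):
    """Likelihood of the data given heads-probability p; `coeff` is the
    stopping-rule-dependent counting factor."""
    return coeff * p ** HEADS * (1 - p) ** (TOSSES - HEADS)


fixed_coeff = comb(TOSSES, HEADS)         # plan A: exactly 20 tosses
stop_coeff = comb(TOSSES - 1, HEADS - 1)  # plan B: stop at the 6th head

post_a = posterior_fair(lik(0.5, fixed_coeff), lik(P_BIASED, fixed_coeff))
post_b = posterior_fair(lik(0.5, stop_coeff), lik(P_BIASED, stop_coeff))
print(f"P(fair | data): plan A {post_a:.4f}, plan B {post_b:.4f}")
```

Both plans yield the same posterior probability that the coin is fair (about 0.16 under these assumptions).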

But hold on a minute … what about the problem of subjective priors (which we noted in our post “Beliefs are like gambles” below)? Does the subjective Bayesian approach to truth fare any better than standard Fisherian methods? Stay tuned …

Did Kurt Gödel really discover a loophole in the Constitution?

Our 2013 paper “Gödel’s loophole” considers two related questions: why have so few scholars taken Gödel’s alleged discovery seriously, and what was this possible logical contradiction in the Constitution? (Hint: it probably has something to do with recursion.) There is also some recent discussion of our thesis on Hacker News (Y Combinator) here. (Note: We will return to our review of “Bayesian reasoning in science” in our next post.)
