David Banks on Reproducible Research

Just got an email linking to "Reproducible Research: A Range of Response" by David Banks, in the new journal Statistics, Politics, and Policy 2(1); Banks is also the journal's editor. Interestingly, the commentary doesn't mention the journal's policy (if one exists) on the reproducibility of research submitted there.

Banks' writing is easy to read, though I had to use the dictionary not once, but twice (opprobrium, fecklessness)! Banks offers the following suggestions. My comments are interleaved.

1. Poor reproducibility is tacitly promoted by a system that rewards publication of positive results. But in an era of electronic media, which eliminates the competition for shelf-space, it should be perfectly possible to publish null results. It is good science and honest work to e-publish a paper that says “None of the following 1,000 genes in my study had any statistical relationship to cancer.” Our current research culture does not value this contribution as it should, but if such papers appear in respected e-journals, I think that may evolve.

Yes. The current research culture seems to promote confirmation bias with impunity, except in the rare cases when 'forensic statistics' detects a flaw and the flaw is publicised. But I wonder whether limited 'shelf space' is sometimes just an excuse to maintain exclusivity. If that is really why negative results aren't published, then no amount of cyberspace will be sufficient.

3. It would be worthwhile to create a continuous measure of reproducibility, and score a random sample of recent publications to determine the extent of the problem. We may be suffering from attention bias (which some people claim was Marc Hauser’s failing in his primatology interpretations). Perhaps we are so dazzled by the occasional scandal that we overlook the fact that most research is quite sufficiently solid (although surely not perfect). On the other hand, the dazzling scandals may distract us from the possibility that nearly all applied statistical research has pretty serious flaws. Without some sort of baseline, it will be difficult to assess progress.

Also a good suggestion. Here's a potential continuous measure of reproducibility: the time required to reproduce the results of a study, including the necessary 'forensic statistics', computing, and regeneration of figures and tables. I think penalizing long computing time is valid, because unusual computing requirements are a barrier to reproducibility. In my experience, though, the time spent on 'forensic statistics' has swamped the subsequent computing time. Of course, this measure is sensitive to the skill and knowledge of the forensic statistician and to the computing equipment, and it is labor intensive. But a maximum time to establish reproducibility could be imposed, say one or two hours. For reproducible research implemented with tools like Sweave, this metric might be reduced by orders of magnitude.
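
To make the Sweave point concrete, here is a minimal sketch of what such a document might look like. The file name (paper.Rnw) is hypothetical, and the analysis uses R's built-in cars data set purely for illustration; it is not taken from Banks' paper or any particular study. The idea is that the coefficient estimates and the figure are regenerated from code every time the document is built, so reproducing them means rerunning one command rather than reverse-engineering the analysis.

    \documentclass{article}
    \begin{document}

    <<analysis, echo=TRUE>>=
    ## Illustrative analysis: fit a linear model to a built-in data set
    fit <- lm(dist ~ speed, data = cars)
    summary(fit)$coefficients
    @

    <<scatterplot, fig=TRUE, echo=FALSE>>=
    ## The figure is redrawn from the raw data on every build
    plot(cars$speed, cars$dist,
         xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
    abline(fit)
    @

    \end{document}

Running Sweave("paper.Rnw") in R and then LaTeX on the result rebuilds the coefficient output and the figure from the raw data, which is exactly the step that otherwise dominates the 'forensic' part of the reproduction time.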