The Reproducibility Glitch

By Miguel Ramírez

We are in a reproducibility crisis.

Which means that Science as we know it is in crisis.

Tip: this article is full of references; click them to learn more!

Cartoon by Paul Blow (Nature)

Remember when those guys announced a revolutionary method for obtaining stem cells that ended up being a fraud? Do you know of a published article from your field that, despite appearing in a renowned journal, is dismissed by everybody as “false” behind the curtains? Are you really sure that you followed the protocol step by step, and that an uncontrolled variable is not responsible for your outstanding newly discovered phenotype? Can you tell me right now the statistical power of the latest analysis you ran in R or SPSS?

These four situations deal with different cases and reasons, but all underline the very same problem: we are in a reproducibility crisis. Which means that Science as we know it is in crisis. Reproducibility is part of the scientific method. It guarantees that under the same conditions (hello Materials and Methods section!) other researchers can replicate our work. It means our results are real, and that the money we used was well invested. Our ability as scientists to reproduce the results of others, or better said, our inability, is a growing concern that not everybody is paying attention to. Some voices suggest we are watching the coastline recede, a warning sign of an upcoming tidal wave.

I started this post with a hypothetical case where your fellow scientists distrust an article. Usually, questions about reproducibility are plain gossip, nosy enquiries that surface once the conference is over and the wine and cheese are flowing. But sometimes the problem transcends the specialised circles and reaches the general population. Last summer, the topic made it into headlines around the globe, as a retrospective project led by Brian Nosek found something really scary: barely more than one third of a selected cohort of 100 high-impact psychology articles could be successfully reproduced. You can guess the huge buzz this project generated even before the formal publication. Psychologists argue (today, the 4th of March, to be precise) that the problem is not that big, nor the study that reliable. Psychology has always been under heavy scrutiny in terms of reproducibility for a number of reasons, but it is not that cancer research is more trustworthy; it is just more expensive to re-test. A reproducibility study in economics recently found that field to be on somewhat firmer ground. But hey, with only 6 out of 10 papers being replicable.

A large number of reasons and factors shape this intricate issue, and it is just not possible to give a fair explanation of them all in this small and probably ignorant contribution. Despite the buzz that a fraud scandal unleashes, corruption and deliberate manipulation of results probably account for a minority of cases. How is it possible, then, that honest and nice scientists are publishing incompletely verified facts?

If I had to point at just one thing, I would bet on the Publish or Perish mentality. Nowadays, an average scientist designs an experiment, collects data, pieces a story together, and sends it for publication. And the problem is that we do not publish our work; we work for publishing. We have to, because we need to feed our CVs in order to get grants, which will give us the chance to work some more years for publishing. And so on. What a gargantuan Catch-22. This system makes our research biased. We are not pushing the edges of knowledge; we want to show a causal relation between factor A and effect B. A number of consequences emerge from this thinking, like the underrating of negative results. Without doubting the honesty of the majority of us, the “no conflict of interest” statement found at the end of the paper you are reading is probably a lie. There is always a conflict, except for those wealthy enough to do science as a hobby (please remember that most of the great scientists in history were actually not grant-digging fellows). This is just a bit of what Peter Lawrence (a renowned developmental biologist, in case that is not your field) defines as “an insidious corruption of the practice of research”. In his own words:

Have these political changes impacted on research itself? I think the answer is yes. It is instructive to look at how the change in the primary purposes of publication (from communication and record to producing tokens that will yield salaries and grants) has affected the way we structure our research. All of us have had to focus our research to produce enough papers to compete and survive. Thus, projects are published as soon as possible and many therefore resemble lab reports rather than fully rounded and completed stories. There are many reasons why projects may not be pursued to a point of clarification, a clarification that would benefit everyone, particularly the reader. Often the person responsible only has support for 2 or 3 years and has to leave; therefore, passing on the project to another person can cause authorship disputes. Also clearing up inconsistencies in research can take too much time; it may be more productive to publish what we have and move on. Consequently, it may be more effective in terms of the numbers of papers to start a new project for each person (for each potential first author). Thus, I think this emphasis on article numbers has helped make papers poorer in quality. And, even more significant, there is an effect on the choice of projects.

(More about the topic here)

But politics and bureaucracy aside, the experimental bias is still there. John Ioannidis caused a huge impact with an article whose title plainly claimed that most published research findings are false (read the article here). And he came to that conclusion just by looking at the statistics. We are obsessed with p-values and with writing as many asterisks as we can.

Consider an experiment you have just presented to your supervisor in an amazing and fruitful meeting. You are more than happy because you have reached significance at the 1% level (so your experimental bar has two asterisks over it). But do you know what you are actually claiming?

You designed the experiment by pitting a null hypothesis (there are no differences if my treatment is applied) against an alternative one (there are differences). Well, your beloved p-value tells you the probability of finding a difference as large as yours if the null hypothesis is true. Therefore, in our example, if you tried the non-effective treatment 100 times, about once you would find a difference as big as the one you reported. But what is the actual probability of the null hypothesis being false? You are not reporting that. The p-value on its own is sadly an overrated tool, as it does not give all the information. This is not intended to be a statistics course, but try to get used to the terminology: you need to know the differences between the p-value, the significance level, statistical power, and type I and II errors when reporting your data!
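To make the distinction concrete, here is a minimal simulation sketch in Python (my own illustration, not taken from the article or the cited studies; the group size of 20 and the 0.5 SD effect are invented numbers). It shows that a 1% significance threshold only controls how often a non-effective treatment produces a spurious “two-asterisk” result, while the power to detect a real effect is a separate quantity that the p-value alone never tells you.

```python
# Toy simulation: what alpha = 0.01 does and does not guarantee.
# Group size and effect size below are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 10_000   # hypothetical repeated experiments
n_per_group = 20         # hypothetical sample size per group
alpha = 0.01             # the "two asterisks" significance level

# Case 1: the null hypothesis is TRUE (treatment has no effect).
# By construction, roughly 1% of experiments still come out "significant".
false_positives = 0
for _ in range(n_experiments):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(0.0, 1.0, n_per_group)   # same distribution: no effect
    if stats.ttest_ind(control, treated).pvalue < alpha:
        false_positives += 1
print(f"False-positive rate under the null: {false_positives / n_experiments:.3f}")

# Case 2: the alternative is TRUE (a modest effect of 0.5 SD).
# The fraction of "significant" results here is the statistical power,
# which the p-value on its own never reports.
hits = 0
for _ in range(n_experiments):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(0.5, 1.0, n_per_group)   # true effect of 0.5 SD
    if stats.ttest_ind(control, treated).pvalue < alpha:
        hits += 1
print(f"Power to detect a 0.5 SD effect at alpha=0.01: {hits / n_experiments:.3f}")
```

With only 20 samples per group, the power at the 1% level comes out well below 50% in this toy setup, which is precisely the sort of information a row of asterisks never conveys.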

In terms not only of our PhDs but of actual contributions to society's welfare, the reproducibility crisis is especially painful. Many reasons can cause a clinical trial to fail despite extraordinary results in early phases and models, but non-replicable data (when tested at all) could be a hidden yet enormous contributor. This article points out that the majority of replication attempts on 67 articles found inconsistencies.

Taken from Prinz et al., Nat Rev Drug Discovery 2011

With so many fields, methodologies and projects, it is quite difficult to calculate the actual economic cost of the reproducibility crisis. Freedman et al., 2015, estimated that about 28 billion USD is thrown in the bin every year in the shape of irreproducible preclinical studies. That is half of the total money invested. That is almost what the NIH received in federal funds for the 2016 fiscal year. And almost twice the NASA budget.

Picture by Harry Campbell, ScienceNews

OK, there is a scary, huge and ugly elephant in the room (kudos to the real and lovely elephants). Can we do something about it? Well, we are plain people, plain postgrad researchers probably, very low in the food chain. But the spiral traps us all. Even the higher authorities, who are, if not partially responsible for the reproducibility problem, at least unaware of it, started out like us.

Get used to talking about it. I have used the example of the elephant because these things act like the alcoholic cousin at a family gathering. Not many people want to (respectfully) call your research into question. Not many people dare to publicly announce that they do not trust a published article. There are, however, some instances when the scientific community stands up and shoots the elephant (metaphorically, I like elephants!), but those involve quite evident and strident frauds (like the ones involving human cloning or stem cells; I will talk about this soon).

Never underestimate the feedback you may get. Try to remove as many biases as you can from your experiments (are your samples blinded before you acquire the data?). Learn to love negative results. Don't be afraid of failing. All these pieces of advice tell us to be humble and honest with ourselves: there are millions of us out there and the universe is full of yet-to-be-discovered wonders, but those reachable with our current technology and understanding are finite.

What about the big picture? Nature has been dedicating many articles to the topic recently. One of the latest additions details some problems that we need to face before the matter can be fixed. Although science is self-correcting (or should be, by its core principles), human factors are in play, and journals do not like to shout out loud about retractions. Journals are run by people with a limited amount of time, and may charge authors who want to rectify a published work.

But maybe there is hope. We are talking about this now. More and more people are. Admitting that there is a huge and smelly proboscidean in the living room is the first step towards solving the problem. But I love elephants, really.

Meanwhile, The Reproducibility Project: Cancer Biology aims to test the reproducibility of 37 (originally 50) carefully selected high impact articles from 2011-2012. Complete details about the project, including ongoing results, can be found here.

PS: Very recently, a paper from my field (circadian rhythms) published in 2007 was retracted. This was due to reproducibility issues found by a third party and by the authors themselves. Take a look at the retraction text. I cannot but applaud the honesty of admitting that the original work is currently not trustworthy.

PS2: Considering this is a huge topic with many branches, I am working on a second part of this article, this time about the evil side of things. Or what happens when scientists actually sell a lie.

**Thanks to Dr Enrique Turiégano Marcos (UAM, Madrid) for the inspiration, ideas and insights into the reproducibility issue, and Emily Farthing (UoS) for proofreading!**

 

This blog solely represents the views of the author and does not reflect the views or opinions of BSPS. Blog content and comments will be moderated and any offensive comments removed.