Ray H. Posted October 19, 2013

Here's an interesting article on the not-so-perfect current state of scientific research:

http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble

This links to a brief video touching on the same issues:

http://www.economist.com/blogs/babbage/2013/10/science-wrong
ribuck Posted October 19, 2013

It's natural that new discoveries are at the edge of current knowledge, where existing understanding is fuzzy. So it's not surprising that over half of the newest findings are later discarded. It's actually a fairly efficient way to do science: it wouldn't make economic sense for new research to be bulletproof.

So what happens is that new research findings emerge with uncertainty. There may be an experimental error, or the results may just be due to chance. Then, if the findings are interesting or important, others will seek to replicate them (or, more commonly, to strengthen and extend them, which has replication as a side-effect). Only when the findings are thoroughly replicated at a high degree of certainty (e.g. "five nines", or 99.999% certainty) do they become assimilated into science as "fact". It would be financially impossible to conduct every new experiment to a level of 99.999% certainty, since this costs hundreds of times as much as an experiment conducted to 95% certainty (a one-in-20 chance of error). A numerical sketch at the end of this post illustrates how replication gets you from one level to the other.

You can see the process of refinement by looking at research into the risks of harm from mobile phone radiation. These experiments have been conducted over many years. Occasionally one of them has pointed towards a physical risk, but then more extensive experiments have highlighted a weakness or exposed a statistical error. Over time, researchers learn from this process and can design tighter experiments. In the case of mobile phone radiation, the science is still not "settled". In other words, although there is currently no rigorous evidence that phone use is harmful, we still don't have complete enough knowledge to rule it out entirely.

This is good. This is a reasonable way for science to make progress. What is not so good is that the first tentative research results are often reported in the media as if they were settled fact, only to be retracted or debunked a year later. The problem here is in the quality of the initial media reporting.

The article you linked to seems like a fairly good overview, except that it gets off to a terrible start with its discussion of priming. Priming can affect human behavior, but it doesn't affect the underlying science of replicability. Studies of priming are well-replicated, and priming is well-understood. You could find, for example, that a reading in an experiment on drug effectiveness is about half-way between 103 and 104. A group of researchers funded by a drug company might be more likely to read the result as 104 (i.e. more favorable), and another group of researchers might be more likely to read it as 103 (because they are by nature cautious people). But this only matters in the initial exploratory experiments. By the time you get to five-nines certainty, the effects of priming have been well and truly averaged out, if not eliminated.
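To put rough numbers on the replication arithmetic above, here is a minimal sketch. It assumes each replication is an independent experiment with a 5% false-positive rate and no systematic bias, which is an idealization:

```python
# Chance that k independent "positive" results are all statistical flukes,
# assuming each experiment has a 5% false-positive rate (alpha) and no bias.
alpha = 0.05

for k in range(1, 5):
    chance_all_flukes = alpha ** k
    print(f"{k} independent positive result(s): "
          f"{chance_all_flukes:.4%} chance it's pure chance")

# 1 result:  5.0000%   (the usual significance threshold)
# 2 results: 0.2500%
# 3 results: 0.0125%   (about 1 in 8,000)
# 4 results: 0.0006%   -- past the "five nines" (99.999%) level
```

Under these idealized assumptions, four independent positive results push the chance of a pure fluke below 0.001%, the "five nines" level mentioned above.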
Wuzzums Posted October 19, 2013

Dr. Ben Goldacre wrote a whole book on the subject: http://en.wikipedia.org/wiki/Bad_Science_(book)

It also seems to me that this type of bad research is prevalent in fields where the actual principles are very vague, such as medicine. The medical system does not rely on science in the same way the field of physics does. We don't know everything that works, nor do we know why what works works; all we know is that it does, hence the phrase "evidence-based medicine". Add in the conflict of interest between drug companies and researchers, and things get foggier still. Treatments today haven't evolved much since the last century, just been tweaked a little here and there.

Furthermore, all science points towards the understanding that there are no such things as diseases, only ill people. Yet the research is focused on studying the disease, so a lot of aspects are basically lost in translation. The object of study is far too complex for the alternative to be viable, it seems to me. I don't know how we could go about fixing the research; I'm personally leaning towards throwing it all away and starting from a new perspective altogether.
Ray H. (Author) Posted October 19, 2013

"It's natural that new discoveries are at the edge of current knowledge, where existing understanding is fuzzy. [...] By the time you get to five-nines certainty, the effects of priming have been well and truly averaged out, if not eliminated."

You don't seem to have read the article beyond the first few paragraphs, since your points are all addressed there. Also, the author isn't claiming that priming hinders replicability. It's simply an example of research that has become controversial. And indeed it has: http://www.nature.com/news/replication-studies-bad-copy-1.10634
ribuck Posted October 20, 2013

"You don't seem to have read the article beyond the first few paragraphs"

I hate it when I put a lot of time into reading someone's link, and they say that.

"the author isn't claiming that priming hinders replicability"

OK, you are correct to call me out on that. My point is that there are certain priming experiments that are well-replicated, and others that cannot be replicated. Those that are well-replicated have been incorporated into a respectable scientific understanding of priming. Those that cannot be replicated are not worthy of an article in The Economist.

It's perfectly normal and proper that cutting-edge scientific experiments may or may not be replicable. These first experiments are the ones that potentially open up new areas of knowledge. Some of them will prove to be false leads, and some will prove to be goldmines.

If researchers cannot replicate an earlier experiment, they publish their results. Subsequent researchers will see this when they do bibliographic searches on the original research, and won't build their new experiments on top of the non-replicated work. Or, if they choose to do so, they won't be surprised if their own work turns out to be non-replicable.

The article uses a 5% threshold for statistical significance in its examples of false positives. The weakness of this threshold is of course understood and accepted, yet it is still an acceptable threshold for initial research that needs to be replicated. Only when, after replication, the threshold is near enough to zero does the new research become accepted as "fact": for example, a 0.001% chance that the results are due to statistical variance, which corresponds to a 99.999% certainty of correctness.

I say again that the weakness isn't with the majority of new research being unreplicable. The weakness is that publications like The Economist get over-excited about the significance of new research. As for the weakness of the 5% threshold, it's no scandal. ALL scientists already understand that. Even many lay people grasp it.
Ray H. (Author) Posted October 20, 2013

"It's perfectly normal and proper that cutting-edge scientific experiments may or may not be replicable. [...] The weakness is that publications like The Economist get over-excited about the significance of new research."

The article doesn't criticize statistical significance. Instead, it accepts 95% as the standard, then spells out how positive results are actually much less than 95% accurate due to other factors. It spells out the problem of negative results being under-reported. It points out the lack of enthusiasm for replicating research, for career reasons and for funding reasons. It spells out the problems of peer review. It spells out the issue of tacit knowledge and the less-than-forthcoming methodology used to obtain results, which prevents them from being replicated.

The bottom line:

1. Published studies focus on positive results, which are far less accurate than negative results (see the sketch below).
2. Studies are not being accurately judged in the peer review process.
3. Studies are not being replicated sufficiently, because A) there is little incentive to do so and B) methodology is not divulged.
4. Therefore, the research available is largely secretive, poorly analyzed, and unreplicated, rendering it less than trustworthy.
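A back-of-the-envelope sketch of that first point, showing how positive results at a 5% threshold can be much less than 95% accurate. All four numbers below are illustrative assumptions in the spirit of the article's example, not figures taken from it:

```python
# Back-of-the-envelope false-discovery arithmetic.
# All four numbers are illustrative assumptions.
n_hypotheses = 1000   # hypotheses tested
n_true = 100          # hypotheses that are actually true
power = 0.80          # chance a true effect is detected
alpha = 0.05          # significance threshold (false-positive rate)

true_positives = n_true * power                       # 80
false_positives = (n_hypotheses - n_true) * alpha     # 45
total_positives = true_positives + false_positives    # 125

print(f"Wrong 'positive' findings: {false_positives / total_positives:.0%}")
# -> 36%: more than a third of positive results are false, even before
#    bias, sloppy methods, or under-reported negatives enter the picture.
```

The driver is the base rate: when most tested hypotheses are false, the 5% of them that slip through as false positives can rival the true positives in number.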
ribuck Posted October 21, 2013

Ray, unfortunately we seem to be talking at cross-purposes. I'm not trying to debunk the article; I'm just saying that the article doesn't say anything really profound, beyond "some people get over-excited about low-quality research".

"1. Published studies focus on positive results, which are far less accurate than negative results."

As a generalization, that's true, and it will probably always be the case. Most people find positive results more interesting than negative results, so it's probably inevitable that researchers and journalists will focus on positive results. The scientific method has the statistical tools to accommodate this, so we can live with it.

"2. Studies are not being accurately judged in the peer review process."

Peer review has always had this problem, and probably always will. Peer review is still useful, though; it's just that it's a very weak process. If peer review allows through 80% of valid research and 79% of invalid research, it still has some filtering power (a sketch at the end of this post puts a number on this).

"3. Studies are not being replicated sufficiently, because A) there is little incentive to do so and B) methodology is not divulged."

You can't declare what is "sufficient" replication, because it's a cost-benefit decision. If a research finding is not very important, there's little incentive for others to try to replicate it. If it's important (such as the recent flawed experiment that seemed to show neutrinos exceeding the speed of light), then others will attempt to replicate it.

It's not a dealbreaker if the methodology is not divulged. If others can replicate the results using different experiments, those are good replications, and those researchers can divulge their methodology. Knowing the original methodology can sometimes provide a short-cut to identifying a flaw in the original experiment, but it's not essential to the scientific method.

"4. Therefore, the research available is largely secretive, poorly analyzed, and unreplicated, rendering it less than trustworthy."

That would be true if you had said "much of the research" instead of "the research" (which implies "all research"). Here's how I would phrase my bottom line: Much of the available research is insufficiently replicated and therefore unreliable. Such research may be interesting because it opens up promising directions for study, but it cannot be considered "new knowledge" until it has been consistently replicated.
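To quantify the peer-review example above, here is a minimal sketch. The 80%/79% pass rates are the illustrative figures from the post; the 50/50 starting mix of valid and invalid work is an added assumption:

```python
# How much does a filter that passes 80% of valid work and 79% of
# invalid work improve the mix? Pass rates are the illustrative
# figures above; the 50/50 starting mix is an added assumption.
pass_valid, pass_invalid = 0.80, 0.79
valid_before, invalid_before = 0.50, 0.50

valid_after = valid_before * pass_valid        # 0.400
invalid_after = invalid_before * pass_invalid  # 0.395
share_valid = valid_after / (valid_after + invalid_after)

print(f"Valid share before review: {valid_before:.1%}")  # 50.0%
print(f"Valid share after review:  {share_valid:.1%}")   # 50.3%
```

With those numbers, review nudges the valid share from 50% to about 50.3%: some filtering power, but very little, which rather underlines the point that it's a weak process.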
Ray H. (Author) Posted October 21, 2013

"That would be true if you had said 'much of the research' instead of 'the research' (which implies 'all research')."

I'll make this one point and then kindly leave you to your lack of concern over the issues raised by this article. In the sentence of mine that you are quoting, I said "the research is largely secretive"; the word "largely" limits the statement to less than all. I clearly did not mean all research.