Last month, a group of researchers representing labs from five institutions and the Center for Open Science published findings on the replication of 21 studies previously published in Nature and Science. Prior research had shown that only about half of social science studies could be replicated, and the authors sought additional evidence on reproducibility from highly prestigious journals, which could have higher (or lower) replication rates.
The authors found that 13 of the 21 studies (62%) produced a significant effect in the same direction as the original study. Since there is no standard indicator of replication, the authors used several indicators, including Bayesian approaches, all of which classified 12 to 14 of the studies as replicated. Even when a study replicated, the effect sizes in the replication were on average only about half of those reported in the original study, suggesting that not only false positives (reporting a significant effect that cannot be replicated) but also inflated effect sizes of true positives in the original studies may contribute to the inability to replicate.
One surprising finding is that researchers were able to predict which studies were more likely to replicate. With limited resources to replicate every scientific study, it is encouraging that journal reviewers and editors may be able to detect studies that are less likely to replicate. This project did not ascertain which aspects of a study the researchers used to make their predictions about replicability, and obtaining this information would help provide reviewers with more precise guidance on what to consider. For instance, a large effect from a relatively small experimental manipulation (e.g., viewing the picture of a sculpture, carrying heavy vs. light clipboard materials) may be surprising and interesting, but such a finding may also be an indication that the study is less likely to replicate.
Replication is crucial to the scientific enterprise. Failure to replicate is not a sign that science is broken but a healthy indication that science, by its very nature, doesn’t trust itself. Part of what drew me to the behavioral and social sciences was the perspective of not trusting intuition and common sense about why people do what they do, but instead testing these assumptions under the rigorous light of scientific inquiry. We want to conduct studies that can be replicated, and failing to replicate over a third of studies, as found in this project, is not acceptable. Nonetheless, failure to replicate offers important leads about uncontrolled differences between studies that may explain the discrepancy, and it serves as self-correcting feedback for our sciences.
The social and behavioral sciences are among the leaders in understanding the factors that influence replication and instituting reforms to improve replication. However, being a leader in these efforts can result in headlines such as “Researchers replicate just 13 of 21 social science experiments published in top journals” that imply that this problem is unique to the social and behavioral sciences. The NIH efforts in rigor and reproducibility were spurred not by replication failures in the social and behavioral sciences but by replication failures in basic biomedical studies.
In less than a decade, the social and behavioral sciences have reformed publication policies to foster replication. Many journals in the social and behavioral sciences have transparency policies and promote open access to data. Many of these journals have also adopted preregistration policies or incentives, which are critical to addressing publication bias. The replication problem is not unique to the social and behavioral sciences, but our sciences are leading the efforts to improve scientific replication. OBSSR will continue to support these and other efforts that facilitate the conduct and reporting of replicable research.