Psychological science is now facing an unprecedented crisis of reproducibility. The field is becoming aware of systematic problems embedded in research practices that have been widely employed by most academic journals. An emphasis on aesthetic rather than scientific standards has led to a publication bias for positive results, which, in turn, has encouraged questionable research practices (QRPs), such as p-hacking and HARKing. These processes have potentially created “null fields” in which many findings are mere false positives. This risk is especially large in fields where the prior probability of the hypotheses being true is low. In fact, a recent large-scale replication project reported that the reproducibility of the psychological literature is less than 40%. The psychology community is starting to respond to this crisis by recognizing the importance of pre-registered replication and by reforming the publication standards of many journals. In this paper, we provide an overview of these problems and their solutions.
Several studies on scientific replication and meta-analytic approaches have illustrated the problem of low reproducibility and small effect sizes in psychology and related fields. Herein, the author describes not only problems that underlie research fields generally (e.g., questionable research practices and misconduct) but also problems specific to cognitive psychology. The reproducibility and effect sizes of experimental studies have gathered little attention from researchers in cognitive psychology. In addition, the lack of cognitive studies of researchers themselves in cognitive psychology is related to the disregard of motivational factors in the reproducibility/replication problem. Based on an understanding of these issues, the author discusses how cognitive psychology can overcome them in the future.
The replicability and robustness of psychological research have been questioned in recent years. However, few researchers have addressed the issue in developmental science. In this article, I discuss the replicability and robustness of developmental science through two examples. First, I introduce experiments on infants and young children, in which several factors may affect the results and therefore make replication difficult; the ManyBabies project was launched to resolve this issue. Second, I introduce longitudinal research that examines long-term developmental trajectories. Because longitudinal research costs considerable time and money, robustness rather than replicability may be what can be assessed. Finally, I discuss the situation in domestic developmental science and propose that more effort should be made to improve reliability in developmental research.
Systems neuroscience is a field of science that bridges physiological phenomena at the level of neural circuits and psychological phenomena at the level of behavior. In this field, we measure neural activities, such as electrical signals from single neurons or BOLD (blood-oxygen-level dependent) signals from brain regions, and correlate these measurements with external stimuli or the behavior of animals and humans. In this review, I point out some concrete problems concerning reproducibility and transparency in systems neuroscience, and discuss how we should overcome them.
The reproducibility and reliability of research are fundamental tenets of science, including animal psychology. In the field of animal psychology, researchers have used a number of different species in various tasks and settings, so reproducibility demands additional consideration compared with human research. Furthermore, beyond appropriate statistical analysis and improved experimental design, a concrete theoretical background underlying each research question seems to be important for improving reproducibility not only between experiments in which the same species was used, but also where different species have been used. Because it is sometimes difficult to standardize tasks and settings among investigations in animal psychology, theoretical consideration should help improve the reproducibility of research, as well as the validity of the interpretation of the results obtained. Such efforts would also contribute to reducing the unnecessary use of animals from the perspective of animal welfare.
Although null hypothesis significance testing has been strongly criticized for decades, it has remained the dominant statistical method in the field of psychology. The non-reproducibility of findings in psychology can be attributed, at least partially, to an arbitrary threshold (i.e., .05) in null hypothesis significance testing and the overrepresentation of p-values. The present study surveyed papers from the Japanese Journal of Social Psychology and examined whether such overrepresentation also existed among psychology researchers in Japan. Effect size measures and p-values did not correspond well when p-values were around .05. Moreover, the frequency of p-values just below .05 was greater than expected. These results imply that the overrepresentation of p-values can produce unreliable and irreproducible results. Two types of remedies are discussed to alleviate the problems of the overrepresentation of p-values.
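An excess of p-values just below .05, as reported above, is consistent with practices such as optional stopping (repeatedly "peeking" at the data and collecting more until significance is reached). The following minimal Python simulation, which is an illustrative sketch and not drawn from the surveyed papers, shows how peeking inflates the false-positive rate of a two-sided z-test even when the null hypothesis is true:

```python
import math
import random

def false_positive_rate(looks, n_sims=4000, seed=0):
    """Share of null (mu = 0, sigma = 1) simulations declared 'significant'
    at any of the interim looks (sample sizes in `looks`)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = []
        for n in looks:
            while len(data) < n:
                data.append(rng.gauss(0.0, 1.0))
            # two-sided z-test with known sigma = 1
            z = (sum(data) / n) * math.sqrt(n)
            if abs(z) > 1.96:
                hits += 1
                break  # stop as soon as p < .05 is reached
    return hits / n_sims

single = false_positive_rate([50])             # one pre-planned test at n = 50
peeking = false_positive_rate([20, 30, 40, 50])  # test at n = 20, 30, 40, 50
print(f"one look at n=50:  {single:.3f}")   # stays near the nominal .05
print(f"peeking at 20-50:  {peeking:.3f}")  # clearly above .05
```

With only four interim looks, the actual Type I error rate roughly doubles, which is one mechanism by which a literature accumulates p-values clustered just under the threshold.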
Reporting reliability coefficients is an important procedure in articles describing the development of new psychological scales. However, it appears that Japanese psychology researchers have not yet arrived at a consensus regarding what constitutes a desirable magnitude for a reliability coefficient. In this study, I conducted a meta-analysis summarizing 65 test-retest correlations from 58 studies published in the Japanese Journal of Psychology, a highly ranked peer-reviewed psychological journal in Japan. The results of the meta-analysis, which used a random-effects model, showed that the mean test-retest correlation was ρ = .76 (95% CI = .70–.81). There was no significant relationship between the test-retest correlations and the alpha coefficients. The number of scale items correlated positively with the test-retest correlation coefficients. Researchers tended to mention problems regarding test-retest coefficients only when they were less than r = .50. The desirable use of reliability coefficients is also discussed.
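The random-effects pooling of test-retest correlations described above can be sketched as follows. This is a minimal DerSimonian-Laird implementation on Fisher-z-transformed correlations; the input (r, n) pairs are hypothetical examples, not the 65 coefficients analyzed in the study:

```python
from math import atanh, tanh, sqrt

def random_effects_pool(studies):
    """DerSimonian-Laird random-effects pooling of correlations.
    studies: list of (r, n) pairs; uses Fisher's z with variance 1/(n-3).
    Returns the pooled r and an approximate 95% CI."""
    z = [atanh(r) for r, n in studies]
    v = [1.0 / (n - 3) for r, n in studies]       # within-study variances
    w = [1.0 / vi for vi in v]                    # fixed-effect weights
    k = len(studies)
    z_fe = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fe) ** 2 for wi, zi in zip(w, z))  # heterogeneity Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]        # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    return tanh(z_re), (tanh(z_re - 1.96 * se), tanh(z_re + 1.96 * se))

# hypothetical test-retest correlations: (r, sample size)
r_pooled, ci = random_effects_pool([(0.70, 60), (0.78, 45), (0.82, 80), (0.65, 50)])
print(f"pooled r = {r_pooled:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The Fisher-z transform stabilizes the variance of correlation coefficients, and the tau-squared term allows the true test-retest reliability to vary across scales, which is the rationale for a random-effects rather than fixed-effect model here.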
There is currently an ongoing debate about reproducibility in social psychology. One reason for low reproducibility is the excessive use of questionable research practices such as “p-hacking”. We present two direct replication studies of social priming and embodied cognition that failed to replicate the original findings under conditions of high statistical power. However, a variety of p-hacking attempts made it possible to obtain some false-positive findings from the data of these two studies. We note that selectively reporting results and deriving hypotheses after the results are obtained may disguise the presence of p-hacking, and argue that the pre-registration of studies and the fair publishing of negative results could inhibit p-hacking.
Herein, I discuss some methodological aspects of the reproducibility of psychological data. Reduced reproducibility of psychological data can occur as a consequence of research misconduct, inadequate statistical methods, the effects of uncontrolled latent variables, and the low occurrence probability of the studied phenomena themselves. As such, the validity of data and their analyses may be subject to these sources of irreproducibility, and the expected level of reproducibility depends on the theoretical and methodological characteristics of the studied phenomena and related variables. Until recently, psychologists’ historical reluctance toward replication studies, derived from the demonstrational and anecdotal use of psychological data, has prevented them from considering issues related to reproducibility. Some possible alternatives for psychologists addressing reproducibility issues are also discussed.
The reproducibility of data has most frequently been discussed with respect to experimental research in social and cognitive psychology. This article addresses how the reliability and validity of an observational study can be confirmed and whether an observational study can be replicated in a natural setting, as well as in an experimental (or semi-experimental) setting, in developmental psychology. In a natural setting, the subject’s cultural and historical background, as well as immediate factors, including the physical environment and social context, may need to be considered to understand individual behaviors and social interactions between two or more people. Therefore, it can be quite difficult to perform a meaningful replication study. Researchers in the various domains of psychology should employ a range of measures, including alternative procedures other than replication studies, in order to ensure the reliability and validity of their findings.
Non-human primates live in a variety of habitats and exhibit diverse social systems. They vary in demographics (group sizes and age-sex class composition), as well as in social cohesiveness. Therefore, inter- and intra-specific variations in the behaviours of wild primates are commonly observed, resulting in difficulties with generalizing species- or group-specific behaviours. Despite this, studies investigating general patterns of primate behaviours and social systems often receive much attention in high-impact journals, with a disproportionate decrease in priority for descriptive and/or case studies, such as observations of predation events on primates, or anecdotal descriptions of unique behaviours. This seems to ignore the fact that most comprehensive models for primates, such as socio-ecological models, were formulated based on the long-term accumulation of simple descriptive studies and/or case studies. The general academic values for scientific publication need to be re-examined, with a suggested priority shift back towards the publication of basic scientific information, in order to contribute to further development in the field of primatological science.
From my academic experience of research design in a clinical study conducted in the interdisciplinary area of health psychology, social psychology, and clinical medicine, I discuss reproducibility and the “ideals” of research design in Japanese psychology research. To realize the “ideals” of research design in psychological research, which aims to reveal the universal nature of complex and probabilistic psychological phenomena, it will be necessary to establish a robust theory through considerable theoretical study, critical discussion, and experiments and surveys based on small hypotheses. It will then be necessary to conduct empirical studies using experimental designs after registration in a psychological research registry, in which researchers disclose their research design, including the estimated effect size and the calculated sample size, in advance.
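Disclosing an estimated effect size and a calculated sample size in advance, as proposed above, rests on a standard a priori power calculation. The sketch below uses the common normal approximation for a two-sample, two-sided comparison; it is an illustrative example, not a procedure taken from the article (an exact t-based calculation would add a participant or two per group):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample, two-sided test
    of standardized effect size d (Cohen's d), via the normal approximation:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# a pre-registration would state d, alpha, and power, then commit to n
print(n_per_group(0.5))  # medium effect -> 63 per group (normal approximation)
print(n_per_group(0.2))  # small effect requires a far larger sample
```

The calculation makes explicit why pre-registered designs discourage underpowered studies: halving the expected effect size roughly quadruples the required sample, so the estimated effect size must be justified before data collection.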
The recent controversy over statistical data analysis sheds light on a number of cases of abuse of statistical procedures. In this essay, some practical aspects of statistical analyses, mainly in agricultural research, are discussed. During the past century, eminent researchers, including K. Pearson, R. A. Fisher, J. Neyman, and E. S. Pearson, established the theoretical basis of modern mathematical statistics, e.g., experimental design, sampling distributions, and hypothesis testing. Some users in psychology, agronomy, and other fields might be liable to commit misconduct in statistical analysis. While they are of course responsible for what they have done, they must understand not only the proper use of statistical methodology but also the characteristics of each science.
Given the mass-media reports about the Open Science Collaboration (OSC, 2015), the news that 60% of published psychological findings in some limited areas could not be replicated may unduly reduce public trust in the whole field of psychology. However, from the perspective of science communication, we can use this reproducibility problem to promote public understanding of psychology. Moreover, we may be able to discuss the scientific method as it applies to other research areas, which will contribute significantly to the public understanding of science. We should not only refer to the dark side of the reproducibility problem but also provide future prospects as we pursue our research and science communication.
Reproducibility issues in the field of psychology have taken on a new dimension with advances in science and technology. In the process, important topics that had previously been discovered but not shared in the field of psychological science, e.g., p-hacking, are now becoming recognized. Moreover, promising methods for dealing with the situation, e.g., meta-analysis and research pre-registration, are being introduced into research practice. In this paper, reproducibility issues in the field of psychological science in Japan are discussed from the perspective of research integrity. Reproducibility itself is not always important in the psychological sciences, but the emergence of such issues provides a good opportunity for the scientific community to address this problem with research integrity.
Rather poor reproducibility, i.e., less than 30–40% success in replication, has been revealed not only in psychology but also in the life and medical sciences. This has been driven by the strong pressure for privatization and commercialization across academic fields. However, maintaining high standards of reproducibility is not the only way to realize genuine scientific methods. Several examples have suggested the effectiveness of a pluralistic set of methods in ethology, cultural anthropology, cognitive psychology, primatology, and other fields. These include qualitative rather than quantitative approaches, e.g., historical narrative, the collection of anecdotes, and anthropomorphism. Methods should be subordinate to the goals of scientific researchers, and not the other way around.