Japanese Journal of Biometrics
Online ISSN : 2185-6494
Print ISSN : 0918-4430
ISSN-L : 0918-4430
Current issue
Displaying 1-5 of 5 articles from this issue
Special Section : Statistical Analysis for Secondary Use of Medical Data
Review
  • Shintaro Hiro
    Article type: Review
    2025 Volume 46 Issue 2 Pages 63-74
    Published: November 30, 2025
    Released on J-STAGE: October 30, 2025
    JOURNAL FREE ACCESS

    This paper outlines the recent expansion in the use of medical real-world data (RWD) in the clinical development and post-marketing evaluation of drugs in the United States, the European Union, and Japan. We provide an overview of the knowledge researchers need to understand the capabilities and limitations of RWD compared with clinical trials. Recognizing the importance of daily clinical practice is essential for using RWD effectively, and it is crucial to formulate relevant research questions that these data can address. The paper covers three key areas: ① data sources of RWD, ② pharmacoepidemiology, and ③ pertinent laws for using RWD in Japan. Building on a common understanding of the difference between non-interventional studies using RWD and clinical trials, the paper discusses the challenges and future prospects of using medical RWD in Japan and suggests that statisticians prepare for these challenges by deepening their knowledge of daily medical records and of handling log-formatted unstructured data.

  • Tetsuji Ohyama
    Article type: Review
    2025 Volume 46 Issue 2 Pages 75-100
    Published: November 30, 2025
    Released on J-STAGE: October 30, 2025
    JOURNAL FREE ACCESS

    In recent years, the use of medical information databases has increased. Research using these databases requires defining outcomes, exposures, and confounding factors that align with the research objectives, based on the information available in the database. At that stage, validation is required to evaluate how well true cases can be identified. The presence or absence of disease is determined through chart review by multiple raters, and inter-rater reliability is evaluated. As the use of medical information databases grows, opportunities to conduct such reliability studies will increase. In this paper, we therefore review measurement reliability and the estimation of the intraclass correlation coefficient and the kappa coefficient, which are used as reliability indicators, with reference to recent research.
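
    As a rough illustration of the kappa coefficient mentioned in this abstract, the sketch below computes Cohen's kappa for two raters from its textbook definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the chart-review labels are hypothetical and are not taken from the paper, which may cover other estimators (for example, for more than two raters).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies,
    # summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical chart-review labels (1 = disease present, 0 = absent).
reviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reviewer_2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

    In this made-up example the raters agree on 8 of 10 charts, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.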

  • Kyoji Furukawa, Natsumi Kumano, Yurika Kawazoe, Akiyoshi Nakakura
    Article type: Review
    2025 Volume 46 Issue 2 Pages 101-134
    Published: November 30, 2025
    Released on J-STAGE: October 30, 2025
    JOURNAL FREE ACCESS

    Collecting a complete dataset with no missing or inaccurate measurements is ideal but, for a number of reasons, rarely achieved. Incompleteness in data can introduce bias and/or information loss when estimating the relationship between the factors of interest and the outcome, potentially reducing the quality and validity of research findings. In epidemiological observational studies, these sources of bias have become increasingly widespread with the growing use of real-world data, such as insurance claims databases and electronic medical records, which are not collected according to a study plan. Controlling for these sources of bias in statistical analyses will therefore become more important. This article focuses on two important forms of data incompleteness, missing data and measurement error, and discusses statistical approaches to address them.
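
    To make the bias from measurement error concrete, the following sketch simulates the classical additive error model, in which an error-prone exposure W = X + U attenuates a regression slope by the reliability ratio var(X) / (var(X) + var(U)). The simulated data, the variable names, and the assumption that the reliability ratio is known are illustrative only; the sketch is not the method described in the paper.

```python
import random


def ols_slope(xs, ys):
    """Slope of the simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000
true_slope = 2.0

# True exposure X, outcome Y, and an error-prone measurement W = X + U.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
w = [xi + rng.gauss(0, 1) for xi in x]

slope_with_true_x = ols_slope(x, y)  # close to the true slope
slope_naive = ols_slope(w, y)        # attenuated toward zero

# Assumed known here: var(X) / (var(X) + var(U)) = 1 / (1 + 1).
reliability = 0.5
slope_corrected = slope_naive / reliability

print(slope_with_true_x, slope_naive, slope_corrected)
```

    In practice the reliability ratio is rarely known and has to be estimated, for example from validation or replicate measurements, which adds uncertainty that this sketch ignores.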

  • Yasunobu Nohara
    Article type: Review
    2025 Volume 46 Issue 2 Pages 135-151
    Published: November 30, 2025
    Released on J-STAGE: October 30, 2025
    JOURNAL FREE ACCESS

    In recent years, artificial intelligence (AI) has become deeply embedded in our daily lives, and machine learning, one of its key components, has gained increasing attention. Although machine learning can achieve high predictive accuracy, it has not been widely adopted in data analysis because its results are difficult to interpret. However, this barrier is beginning to break down with the advent of Explainable AI (XAI) technologies. Among the various types of machine learning algorithms, tree-based methods, such as decision trees and ensemble trees, are particularly notable. For tabular data commonly found in medical records, ensemble tree methods often outperform other approaches in both accuracy and scalability. This paper focuses on building machine learning models using ensemble trees and interpreting them with SHAP (SHapley Additive exPlanations), a widely used XAI technique. Ensemble tree models can be constructed almost automatically once the data are properly prepared. By using SHAP summary plots and dependence plots, we can gain insights into the overall structure of the data without requiring domain-specific expertise. Although these results may include confounding factors, this approach can still be valuable for uncovering potential medical knowledge.
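
    The idea behind Shapley-based attribution can be sketched by computing exact Shapley values for a single prediction by brute force. This is not the SHAP library and not the tree-specific algorithm used for ensemble trees: the toy model, the feature values, and the simplification of replacing "absent" features with a single baseline value are all assumptions made for illustration, and the enumeration is exponential in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_model(features):
    """Hypothetical risk score with an age-by-smoking interaction."""
    age, bmi, smoker = features
    return 0.02 * age + 0.1 * bmi + 0.5 * smoker * (age > 60)


x = [70, 30, 1]         # hypothetical patient
baseline = [50, 22, 0]  # hypothetical reference patient
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["age", "bmi", "smoker"], phi)))
```

    The attributions sum to the difference between the patient's prediction and the baseline prediction, and the interaction term is split between age and smoking status, which is the additive property that summary and dependence plots build on.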
