2022 Volume 27 Issue 2 Pages 49-59
In recent years, the use of health care databases has been increasing worldwide, and Real World Data (RWD) is expected to be used effectively for clinical research in Japan in the near future. However, database studies that use accumulated existing data, such as electronic medical records, Diagnosis Procedure Combinations (DPCs), and health insurance claims, require an extremely heavy data preprocessing workload before statistical analysis is possible. So far, little literature has described the challenges of RWD preprocessing from an academic point of view. In this review paper, the challenges of database studies are classified into three categories: (1) data content, (2) data structure, and (3) large-volume data handling. We then investigated existing preprocessing research and introduce it systematically. Most data preprocessing research aimed to improve the quality and reliability of the database itself by supplementing the data content required for each clinical study. Very little research has had the primary purpose of solving problems related to data structure and large-volume data processing. As the use of RWD for clinical research increases, the importance of the data preprocessing field will be recognized. In the future, we expect to see more research focused on RWD, which can support the growth of clinical research that uses RWD.