Data-Driven Clinical Pharmacy Research: Utilizing Machine Learning and Medical Big Data

Shungo Imai

doi:10.1248/bpb.b24-00492

Abstract

In conducting clinical pharmacy research, we often face the limitations of conventional statistical methods and single-center observational studies. To overcome these issues, we have conducted data-driven research using machine learning methods and medical big data. Decision tree analysis, one of the typical machine learning methods, has a flowchart-like structure that allows users to easily and quantitatively evaluate the occurrence percentage of events due to the combination of multiple factors by answering related questions with Yes or No. Using this feature, we first developed a risk prediction model for acute kidney injury caused by vancomycin, a condition we frequently encounter in clinical practice. Additionally, by changing the prediction target from a binary variable (i.e., presence or absence of adverse drug reactions) to a continuous variable (i.e., drug dosage), we built a model to estimate the initial dose of vancomycin required to reach the optimal blood level recommended by guidelines. We found its accuracy to be better than that of conventional dose-setting algorithms. Moreover, employing Japanese medical big data such as the claims database helped us overcome the major limitations of conventional clinical pharmacy research, such as institutional bias caused by single-center studies. We demonstrated that the combined use of machine learning and medical big data could generate high-quality evidence by leveraging the strengths of each approach. Data-driven clinical pharmacy research using machine learning and medical big data has enabled researchers to surpass the limitations of conventional research and produce clinically valuable findings.

INTRODUCTION

In conducting clinical pharmacy research from the pharmacist’s perspective, we often face limitations of conventional statistical methods and single-center observational studies. These limitations have repeatedly hindered our efforts to improve patient outcomes. To overcome this problem, we have focused on data-driven research using machine learning methods and medical big data. In this review, we discuss an analytical approach using the decision tree model,^1,2) one of the typical machine learning methods, and the analysis of medical big data from a clinical pharmacy perspective.

1. EXPLORING ANALYTIC APPROACHES FOR CLINICAL PHARMACY RESEARCH BY USING MACHINE LEARNING METHODS

We have encountered many cases of adverse drug reactions (ADRs) in patients treated according to our prescription recommendations and dosing settings. To establish a methodology for avoiding such ADRs, we developed a risk prediction model to identify high-risk cases. As the model construction approach, we focused on the decision tree model.^1,2) In the past, risk factors for ADRs have often been analyzed using multivariate analysis, such as the logistic regression model. Although this method can assess the contribution of risk factors to outcomes as odds ratios, it is often difficult to interpret how to manage patients with multiple risk factors and to translate results presented as odds ratios into incidence proportions of events. The decision tree model has a flowchart-like structure that allows users to easily and quantitatively evaluate the occurrence percentage of events due to the combination of multiple factors by answering related questions with Yes or No. This model has been used in fields such as marketing to predict the behavior of specific customers, for example, their churn risk (i.e., likelihood of leaving). We hypothesized that by replacing churn risk with ADR risk, we could construct a model usable by medical professionals in clinical practice. Hence, we first developed a risk prediction model for acute kidney injury caused by vancomycin (an anti-methicillin-resistant Staphylococcus aureus agent), which we frequently encounter in clinical practice.^3,4) The constructed model was clinically valid, and its accuracy was favorable. We further advanced this approach by conducting multicenter studies and increasing the number of targeted drugs.^5,6) The risk prediction model for acute kidney injury at the time of initiating vancomycin administration, developed through a multicenter study, is shown in Fig. 1.⁶⁾ This simple flowchart allows users to easily estimate the risk of side effects, for example, by determining that patients with concomitant use of vasopressors have a high risk of developing acute kidney injury. Then, we reasoned that changing the prediction target from a binary variable (i.e., presence or absence of ADRs) to a continuous variable (i.e., drug dosage) would enable the construction of a novel and highly accurate drug dose-setting algorithm. We built a model to estimate the initial dose of vancomycin needed to reach the optimal blood level recommended by the guidelines and found its accuracy to be better than that of conventional dose-setting algorithms.^7,8) As shown in Fig. 2, by following the flowchart, medical professionals can estimate the initial daily dose of vancomycin considering multiple factors. We expect that our model will complement the existing dose-setting algorithm of the therapeutic drug monitoring guideline.⁹⁾ Furthermore, we have attempted to apply various machine learning methods, such as neural networks, to clinical pharmacy research.¹⁰⁾ These methods are widely applicable to various drugs and are expected to be further developed.
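To illustrate how a decision tree turns Yes/No questions into incidence proportions, the following minimal Python sketch selects the single question that best separates patients with and without an event, using the Gini impurity criterion of CART-type trees, and then reports the event proportion in each branch. The patient records, the field names (`vasopressor`, `tazpipc`, `aki`), and the helper functions are hypothetical and are not taken from the cited studies; the sketch shows the general technique only, not the published model.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_question(patients, outcome, questions):
    """Return the Yes/No question whose split gives the lowest weighted Gini impurity."""
    def weighted_gini(q):
        yes = [p[outcome] for p in patients if p[q]]
        no = [p[outcome] for p in patients if not p[q]]
        return (len(yes) * gini(yes) + len(no) * gini(no)) / len(patients)
    return min(questions, key=weighted_gini)


def incidence(patients, outcome):
    """Proportion of patients in a subgroup who experienced the event."""
    return sum(p[outcome] for p in patients) / len(patients)


# Hypothetical toy cohort of 20 patients (NOT real data).
patients = (
    [{"vasopressor": True, "tazpipc": True, "aki": 1}] * 2
    + [{"vasopressor": True, "tazpipc": False, "aki": 1}] * 1
    + [{"vasopressor": True, "tazpipc": False, "aki": 0}] * 2
    + [{"vasopressor": False, "tazpipc": True, "aki": 1}] * 1
    + [{"vasopressor": False, "tazpipc": True, "aki": 0}] * 4
    + [{"vasopressor": False, "tazpipc": False, "aki": 1}] * 1
    + [{"vasopressor": False, "tazpipc": False, "aki": 0}] * 9
)

root = best_question(patients, "aki", ["vasopressor", "tazpipc"])
yes_group = [p for p in patients if p[root]]
no_group = [p for p in patients if not p[root]]

print("First question:", root)
print("Incidence if Yes: {:.0%}".format(incidence(yes_group, "aki")))
print("Incidence if No:  {:.0%}".format(incidence(no_group, "aki")))
```

Unlike an odds ratio, each branch here is read directly as "the proportion of patients in this subgroup who developed the event." A full tree would repeat the same selection step within each branch until a stopping rule is met.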

Fig. 1. The Decision Tree Model for the Prediction of Vancomycin-Associated Acute Kidney Injury

A flowchart was proposed for predicting the risk of VCM-induced AKI at the time of administering the standard dose when initiating VCM therapy. Target patients are “aged 18 years and over” and “started standard dose of VCM.” Subgroups of the flowchart were categorized by the proportion of VCM-induced AKI as follows: (1) low-risk group (<10%), (2) intermediate-risk group (10–25%), and (3) high-risk group (>25%). Abbreviations: AKI, acute kidney injury; VCM, vancomycin; TAZ/PIPC, tazobactam/piperacillin; BMI, body mass index (Cited ref. 6).

Fig. 2. Algorithm for Initial Dose Settings of Vancomycin on the Basis of the Decision Tree Model for Achieving a Target Area under the Concentration Curve of 400–600 mg·h/L

Using the CART algorithm, the training data (n = 661) were split into subgroups. Three independent variables (eGFR, age, and BMI) were selected as the predictive factors of the DT model. Each box shows the number of cases and the corrected daily VCM dose (average value). DT: decision tree, CART: classification and regression tree, eGFR: estimated glomerular filtration rate, BMI: body mass index (Cited ref. 8).
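When the prediction target is continuous, a CART-type regression tree chooses the cut point that minimizes the within-subgroup squared error and predicts the subgroup mean. The minimal Python sketch below shows one such split on a single variable. The records, the field names (`egfr`, `dose`), and the dose values are hypothetical illustrations only; they are not the data of the cited study and must not be read as dosing guidance.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(records, feature, target):
    """Find the cut point on `feature` that minimizes the total within-leaf SSE of `target`."""
    xs = sorted({r[feature] for r in records})
    # Candidate cut points are midpoints between neighboring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def total_sse(cut):
        left = [r[target] for r in records if r[feature] < cut]
        right = [r[target] for r in records if r[feature] >= cut]
        return sse(left) + sse(right)

    return min(candidates, key=total_sse)


# Hypothetical toy records (NOT real patient data, NOT dosing advice).
records = [
    {"egfr": 30, "dose": 1000},
    {"egfr": 40, "dose": 1000},
    {"egfr": 50, "dose": 1250},
    {"egfr": 80, "dose": 2000},
    {"egfr": 90, "dose": 2000},
    {"egfr": 100, "dose": 2250},
]

cut = best_threshold(records, "egfr", "dose")
low = [r["dose"] for r in records if r["egfr"] < cut]
high = [r["dose"] for r in records if r["egfr"] >= cut]

print("Split at eGFR <", cut)
print("Mean dose below cut:", round(mean(low)), "(n = {})".format(len(low)))
print("Mean dose at or above cut:", round(mean(high)), "(n = {})".format(len(high)))
```

Each resulting subgroup corresponds to one box of a flowchart, labeled with its number of cases and mean value. Applying the same step recursively to further variables would yield a multi-level tree.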

2. OVERCOMING LIMITATIONS BY USING JAPANESE MEDICAL BIG DATA

Many previous clinical pharmacy studies have used data from the medical institutions where the investigators worked. However, the limitations of single-center studies (i.e., those using patient data from the investigator’s own institution), such as institutional bias and the difficulty of obtaining a sufficient sample size, have been a major issue. To address this, we utilized Japanese medical big data.^11–23) In our pharmacy practice, we often encounter patients with chronic kidney disease (CKD) who were prescribed non-steroidal anti-inflammatory drugs (NSAIDs) by clinical departments or hospitals other than the one that diagnosed CKD. In general, NSAIDs may aggravate CKD and should be avoided. Based on this experience, we hypothesized that visits to multiple departments and hospitals contribute to high-risk prescriptions. Using the health insurance claims database, we found that visits to multiple medical institutions triggered high-risk prescriptions, such as NSAIDs for patients with CKD, as well as the combination of renin-angiotensin system inhibitors, diuretics, and NSAIDs (i.e., Triple Whammy).^14,15) Additionally, we focused on the periodic blood tests required after initiating medications, as indicated by the package insert, and found that these tests are insufficiently conducted in Japan.^12,18,23) For example, liver function tests after benzbromarone initiation are often neglected¹²⁾ (Fig. 3). Typically, it is difficult to explore prescription issues within a single medical institution because of bias in the prescription trends of specific physicians or clinical departments. By combining the strengths of medical big data with the perspective of clinical pharmacists, we obtained important findings regarding patient safety.

Fig. 3. Proportions of Patients Who Underwent Liver Function Testing

A periodic liver function test was defined as one or more liver function tests performed on days 1–90 and 91–180. Days 1–90, 91–180, and 1–180 reflected the proportion of patients in whom liver function tests were implemented at least once in each period after the start of benzbromarone (Cited ref. 12).

Next, we focused on the Japanese Adverse Drug Event Report (JADER) database, built by the Pharmaceuticals and Medical Devices Agency, and explored novel approaches to its usage. We discovered that the JADER database can detect vaccine adverse events earlier than package insert revisions.²²⁾ This is the first study to demonstrate that JADER can be used for safety monitoring of vaccines in Japan, which lacks a vaccine-specific safety monitoring system such as the Vaccine Adverse Event Reporting System in the United States. Additionally, by focusing on immune-related adverse events associated with immune checkpoint inhibitors (ICIs), we found that the risk of Eaton–Lambert syndrome, a rare side effect, may be higher in patients with small-cell lung cancer.²⁰⁾ This study successfully verified the hypothesis that ICIs can elicit autoimmune complications in patients with small-cell lung cancer. Thus, this research demonstrated the potential value of utilizing JADER for validating hypotheses related to rare ADRs that are difficult to evaluate using other data sources, such as claims databases. By understanding the advantages and disadvantages of the various types of medical big data and selecting among them appropriately for each research question, clinically valuable findings can be obtained.

3. DEVELOPMENTS IN CLINICAL PHARMACY RESEARCH THROUGH THE COMBINATION OF APPROACHES OF MACHINE LEARNING AND MEDICAL BIG DATA

We hypothesized that the combined use of machine learning and medical big data would create high-quality evidence by leveraging the strengths of each approach. To test this, we focused on daptomycin-induced musculoskeletal toxicity and constructed a predictive model for this ADR. Although daptomycin-induced musculoskeletal toxicity is a significant clinical problem, it is usually difficult to obtain sufficient numbers of cases for analysis at a single medical institution. We recruited the largest number of patients using a large database of Japanese electronic medical records. By applying decision tree analysis to this database, we found that patients with concomitant use of hydrophobic statins and high baseline creatine phosphokinase values were at high risk for daptomycin-induced musculoskeletal toxicity²⁴⁾ (Fig. 4). Furthermore, by extending this approach to other ADRs, we demonstrated that it is possible to generate findings that overcome the limitations of both conventional statistical methods and single-center studies.^25,26) In the future, further development is expected through the practice of reverse translational research, in which findings obtained in data-driven clinical pharmacy research are validated in basic research.

Fig. 4. Decision Tree Model That Assessed the Risk of CPK Elevation Based on a Combination of Risk Factors during Daptomycin Therapy

The decision tree model predicted CPK elevation, defined as a value more than twice the baseline and >200 IU/L. Abbreviation: CPK, creatine phosphokinase. Baseline CPK values >82.0 IU/L and >115.0 IU/L have an upper limit of 200 IU/L because patients with a baseline CPK value >200 IU/L were excluded from this study (Cited and modified ref. 24).
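The outcome definition in this caption combines a relative and an absolute criterion, which can be written as a short Python function. The function name, the argument names, and the choice to compare a single follow-up value are assumptions made for illustration; the caption does not state which follow-up measurement was used.

```python
def cpk_elevation(baseline_cpk, follow_up_cpk):
    """Return True if the follow-up CPK is more than twice baseline AND above 200 IU/L.

    Patients with baseline CPK above 200 IU/L are outside the scope of the
    definition (they were excluded), so this sketch rejects such input.
    """
    if baseline_cpk > 200:
        raise ValueError("baseline CPK > 200 IU/L: patient would have been excluded")
    return follow_up_cpk > 2 * baseline_cpk and follow_up_cpk > 200


print(cpk_elevation(80, 250))   # more than twice baseline and above 200
print(cpk_elevation(80, 190))   # more than twice baseline but not above 200
print(cpk_elevation(150, 250))  # above 200 but not more than twice baseline
```

Requiring both conditions avoids labeling a small absolute rise from a low baseline, or a modest relative rise from a high baseline, as an event.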

4. RESEARCH SUPPORT FOR PHARMACISTS/PHYSICIANS USING APPROACHES OF MACHINE LEARNING AND MEDICAL BIG DATA

This data-driven research approach can be applied to a wide range of clinical pharmacy research.^27–33) In this review, we discuss two cases. The first was performed in collaboration with a pharmacist specializing in palliative care. Using a decision tree analysis, we found that young women were at a high risk of oxycodone-induced nausea.²⁷⁾ In the second case, by utilizing medical big data to ensure a large sample size, we revealed that contraindicated prescriptions of olanzapine for diabetic patients were often issued not for schizophrenia but for antiemetic purposes in cancer chemotherapy.²⁸⁾

5. LIMITATIONS OF APPROACHES OF MACHINE LEARNING AND MEDICAL BIG DATA

Although data-driven approaches are useful, several issues should be noted when conducting such research. Overfitting is a common problem in machine learning models.³⁴⁾ This is a phenomenon in which the model fits the training data too closely, resulting in markedly poorer performance on data it has not seen. Therefore, after building a model, it is important to evaluate its validity using external validation data. To evaluate the model, it is useful to employ several performance metrics, such as the area under the receiver operating characteristic curve (AUC-ROC), sensitivity (the fraction of truly positive samples classified as positive), specificity (the fraction of truly negative samples classified as negative), positive predictive value (the fraction of positively classified samples that are indeed positive), and negative predictive value (the fraction of negatively classified samples that are indeed negative).³⁵⁾ With regard to medical big data analyses, it should be kept in mind that the source is insurance claims data. For example, the accuracy of the diagnostic names has not been verified in many cases. This is because medical professionals sometimes record disease names for insurance claim purposes that do not reflect the patient’s true condition. In fact, in the aforementioned benzbromarone study,¹²⁾ a limitation remains that the disease name of liver impairment has not been validated. Therefore, we could not perform a safety analysis in that study. In addition, the database contains only prescribing data; that is, actual use of the drugs cannot be evaluated. Both researchers conducting such studies and medical professionals utilizing their findings should accurately understand these limitations.
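The performance metrics listed above can be computed directly from predictions on a validation set. The following minimal Python sketch derives sensitivity, specificity, and the positive and negative predictive values from the confusion matrix, and computes the AUC-ROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half). The labels, scores, and the 0.5 classification threshold are hypothetical illustrations, not results from any cited study.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV from binary labels and predictions.

    Assumes every denominator is non-zero (both classes present and both predicted).
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: true outcomes and model-predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics)
print("AUC-ROC:", round(auc, 3))
```

Computing these values on the training data and again on external validation data, and comparing the two, is one simple way to see the drop in performance that overfitting produces.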

6. CONCLUSION AND FUTURE PERSPECTIVES

We demonstrated that data-driven clinical pharmacy research using machine learning and medical big data enables us to overcome the limitations of conventional research and create clinically valuable findings. With the construction of a nationwide healthcare information platform and the spread of easy-to-use machine learning software, data-driven research is expected to become even more common. This indicates that an era is emerging in which researchers and medical professionals, such as pharmacists and physicians, will conduct data-driven research independently. However, appropriate implementation of research requires literacy in handling and interpreting data. Especially in the early stages, training by a skilled person is essential. Therefore, we need to train personnel who can leverage both clinical experience and data analysis to perform research projects effectively.

Acknowledgments

I thank Prof. Satoko Hori (Keio University), Prof. Mitsuru Sugawara (Hokkaido University), and Prof. Ken Iseki (Hokkaido University). I thank all the members of the Division of Drug Informatics, Keio University Faculty of Pharmacy; the Laboratory of Pharmacokinetics, Faculty of Pharmaceutical Sciences, Hokkaido University; the Laboratory of Clinical Pharmaceutics & Therapeutics, Faculty of Pharmaceutical Sciences, Hokkaido University; and the Department of Pharmacy, Hokkaido University Hospital. I thank all the collaborators in our studies. The studies in this review were supported in part by grants from the Japan Society for the Promotion of Science (JSPS) for Early-Career Scientists (KAKENHI: 20K16035), Research Activity Start-up (KAKENHI: 19K23791), and Encouragement of Scientists (KAKENHI: 18H00430). The studies in this review were also supported in part by the Naomi Hoshino Memorial Grant for Pharmaceutical Initiatives, 2019 and The Research Foundation for Pharmaceutical Sciences, 2021.

Conflict of Interest

The author declares no conflict of interest.

Notes

This review of the author’s work was written by the author upon receiving the 2024 Pharmaceutical Society of Japan Award for Young Scientists.

REFERENCES
