Translational and Regulatory Sciences
Online ISSN : 2434-4974
Short Communication
Characterization of SOAP data on medication history in Japan using text mining
Shimako TANAKA, Maiki SAKAMOTO, Jun YAMATO, Shun KUMAGAI, Masaki KOGAWA, Satoshi MIYATA, Takao YAMORI, Eriko NAKATANI, Takashi OKURA

2024 Volume 6 Issue 3 Pages 60-67

Abstract

Textual information on medication history, specifically the SOAP data created by pharmacists, accumulates subjective (S), objective (O), pharmacological assessment (A), and planning (P) data for patients with various diseases. Subjective data are crucial for evaluating drug safety, and SOAP data may be helpful for this evaluation; however, to the best of our knowledge, the characteristics of these data have not been studied systematically. After obtaining approval from the Teikyo University Medical Research Ethics Committee (approval number: 21-138-3), we analyzed 86,809 SOAP entries from 7,472 patients who visited Sundrug Co. Pharmacies and used the KH Coder (Higuchi, official package) to identify the characteristics, commonalities, and relevance of SOAP data on medication history by text mining. Morphological analysis and a co-occurrence network diagram of the frequently occurring words extracted revealed distinct connections and patterns across SOAP constituents: S data described patient symptoms and behavior, forming a network centered on symptoms and the physical condition of the patients; O data focused on prescriptions, drug names, and dosages, forming a network between these items and persons involved in administering the medication; A data indicated drug effects and side effects, forming a network with connections to pharmacists’ actions and evaluations; and P data formed a network between words describing patient guidance, side effect monitoring, and collaboration with other professionals. This analysis revealed the unique characteristics and commonalities among the SOAP data components.

Highlights

Textual information on medication history, specifically the SOAP data created by pharmacists, accumulates subjective (S), objective (O), pharmacological assessment (A), and planning (P) data for patients with various diseases. Subjective data are crucial for evaluating drug safety, and SOAP data may be helpful for this evaluation. We analyzed 86,809 SOAP entries from 7,472 patients who visited Sundrug Co. Pharmacies and used the KH Coder to identify the characteristics, commonalities, and relevance of SOAP data on medication history using text mining. This analysis revealed the unique characteristics and commonalities among the SOAP data components.

Introduction

Drug safety evaluation methods using real-world data (RWD), which are collected based on medical practices in clinical settings, have attracted increasing attention. RWD are expected to be used to evaluate the efficacy and safety of drugs and to conduct efficient clinical trials because they contain a wealth of clinical data, such as patient symptoms and treatment details. Guidance on their use is being established in Japan, the U.S., and Europe [1,2,3,4,5].

The FDA Adverse Event Reporting System (FAERS) (FDA) and JADER (PMDA), which are spontaneous reporting systems, and MID-NET (PMDA), a medical database based on electronic medical records and receipt information, exist as basic RWD-derived databases for adverse event evaluation used in the postmarketing surveillance of pharmaceutical products [6,7,8]. However, FAERS and JADER mainly report serious adverse drug reactions, and there are various biases, including duplication of information due to the reporting of the same cases by physicians and pharmaceutical companies and the subjectivity of the reporter. In addition, MID-NET mainly targets hospitalized patients and those in the acute phase of treatment but does not cover patients with chronic diseases, and there is little data on subjective side effects.

In drug safety evaluation, patient subjective data are critical in adverse event reporting, and recently, the importance of subjective assessment by patients themselves, that is, patient-reported outcomes (PROs), has been recognized in addition to outcome evaluation by healthcare providers [9]. PROs allow for the assessment of subjective patient data. PROs are considered highly valuable in investigating health conditions and symptoms where the gap between patient and physician perceptions is large. More recently, PROs have been actively utilized when direct patient assessment provides an equal or more meaningful assessment [10]. PROs have been measured using the MOS 36-item short-form health survey [11] and EuroQol EQ-5D [12] as comprehensive scales; visual analog scale scores [13] for pain have been used as disease- and symptom-specific scales; and other indices, such as the Western Ontario and McMaster Universities Osteoarthritis Index [14], have been used.

Reports on text mining methods for evaluating textual information in medical data, especially electronic medical records, have been published. Laar et al. [15] extracted data from the electronic medical records of a 122-patient population and used text-mining tools to evaluate nivolumab and pembrolizumab (both immune checkpoint inhibitors) and the combination of dabrafenib plus trametinib to retrospectively examine the benefits and risks of postoperative adjuvant therapy for melanoma. Limsomwong et al. [16] developed and evaluated an algorithm to identify cancer patients who may benefit from palliative care using text-mining data from 14,363 cancer patients aged 18 years and older. Goh et al. [17] used electronic medical record information from 43,216 hospital admissions to extract psychosocial factors and predict readmission risk. Shiba et al. [18] used text mining of nursing records to identify the background factors related to postoperative pain. Although there has been an increase in the number of reports on text mining of medical data, owing to the characteristics of the target RWD database, the information contained in the text is often from the perspective of healthcare professionals and contains words related to surgical procedures and procedure names. There are still few evaluations based on PROs. Therefore, a database that covers the entire patient population and accumulates both subjective and objective patient information is necessary as a basis for evaluating drug safety to complement PROs. Here, we focused on “medication history” as the essential information for assessing drug safety on the basis of PROs.

Medication history is a type of RWD, a medical record created by a pharmacist at each patient visit that includes information on drug preparation, drug instructions, basic patient information, symptoms, and diseases [19]. Medication history is a database of information on (1) various diseases, (2) the patient’s condition, and (3) the patient’s medication use. Medication history has both generality and sufficiency that are not found in other medical data, such as: (1) covering a wide variety of diseases, age groups, and the majority of patients, including those at home and in aged care facilities; (2) including subjective data of patients obtained through dialogue, allowing assessment of physical condition and side effects; (3) recording drug administration status and lifestyle habits from a pharmacological perspective; and (4) allowing continuous data collection from a single patient [3].

One of the critical features of medication history is the recording of patient information in SOAP format, which mainly consists of information obtained from prescriptions and dialogue with the patient, and is divided into the following four categories: subjective (S), objective (O), assessment (A), and planning (P) data. The S data describe the symptoms, opinions, and impressions directly reported by the patient or family members on behalf of the patient. In contrast, the O data describe objective information, including changes observed by the healthcare provider, symptoms, behavior, test values, and measurements. The A data assess the meaning of the S and O data from the perspective of the patient’s problems, record the judgments made, and support consideration of what direction to take in the future. The P data describe the specific steps to be taken in the direction considered in the A data. The P data may be further divided into an Education Plan (including information provision and guidance), a Care Plan (inquiries to the prescriber and changes in dispensing methods), and an Observational Plan (actions to be taken by the pharmacist on the patient’s next visit). When creating SOAP notes, pharmacists generate a large amount of textual data daily to record the obtained information and apply it to drug therapy. Thus, in addition to primary patient and prescription information, medication history is a unique source of information that combines the patient’s subjective information, objective data focused on the drug, and assessment and planning from the pharmacist’s perspective, thereby reflecting the perspectives of both the patient and the healthcare professional.

Previously, we focused on medication history and tested whether prescription information in medication history could be used to assess the risk of adverse events caused by drugs in light of known events [20]. We found that medication history could be used to conduct a drug safety assessment of the increased risk of developing dementia owing to anticholinergic drugs, as in a large-scale clinical trial, indicating the potential use of medication history as a database for drug safety assessment. However, the data items used in the analysis were limited to patient prescription information, and to our knowledge, there are no reports evaluating the characteristics of SOAP items that contain an enormous amount of textual information. Therefore, we aimed to evaluate the SOAP items using medication history data used in the risk analysis of anticholinergic drugs for dementia development [20].

In the present study, we partnered with an insurance company pharmacy group to collect and text-mine electronic medication history data, and analyzed the characteristics, commonalities, and relevance of SOAP data in medication histories to determine the potential of medication history as primary information for assessing new drug safety.

Materials and Methods

Data source and study design

After obtaining approval from the Teikyo University Ethical Review Board for Medical and Health Research Involving Human Subjects (approval no. 21-138-3), we conducted a retrospective study using data from the medication history database of patients who visited 15 branches of Sundrug Co. Pharmacies, a Japanese insurance company pharmacy group. The database contains information on all drugs prescribed during the study period (generic name, date of dispensing, amount dispensed, number of tablets or total amount of liquid formulations, specifications for tablets and concentrations for liquid formulations, and route of administration), existing medications, physical condition changes, concomitant medications, medical history, visits to other departments, side effects, preferences, and generics. The data included the SOAP items as the main points of drug administration guidance, including the intention and existence of a medication registry.

The data were used following the Japanese Personal Information Protection Law and the Ethical Guidelines for Medical and Health Research Involving Human Subjects.

Study participants

The study period of the medication history database was from January 1, 2013, to March 31, 2022. The medications administered to the patients were approved for manufacturing and marketing in Japan. Overall, the participants took eight drugs for overactive bladder (OAB; solifenacin, oxybutynin, imidafenacin, tolterodine, fesoterodine, propiverine, mirabegron, and vibegron) and four drugs for dementia (donepezil hydrochloride, galantamine hydrobromide, rivastigmine, and memantine hydrochloride). The medication history of patients taking either brand name or generic drugs (98 drugs for OAB and 361 drugs for dementia) was included.

Analytical methods

Patients’ SOAP data were extracted from their medication histories. The analysis was conducted using KH Coder (Higuchi, version 3.01, official package) for text mining, and the Monkin Negative Expression Checker for KH Coder was used as a plug-in. Japanese words are not separated by white spaces, and word identification is essential for text mining. The morphological analysis tool used for word identification, conjugation processing, and parts-of-speech identification was MeCab, which is an open-source morphological analysis engine (a joint research unit project of the Graduate School of Informatics, Kyoto University, and the Nippon Telegraph and Telephone Corporation Basic Research Laboratories for Communication Science). The dictionary tools for retrieving words from the textual information of each SOAP item were the “Manbyo Dictionary Data”, “Hyakuyaku Dictionary Data”, and “ComeJisyo” (from the Nara Institute of Science and Technology), which were compiled to classify compound words and technical terms [21, 22].

Morphological analysis using MeCab was used to break down each entry in the SOAP database into morphemes, identify parts of speech, and extract words, excluding particles and auxiliary verbs. In addition, blank data in the SOAP items were omitted, and “OD tablet” and “Reiwa era” were forcibly extracted from the O data only, followed by data cleansing. Furthermore, a network diagram (co-occurrence network diagram) was created based on the similarity of connections and appearance patterns between the words in the sentences of each SOAP item. The size of the circle surrounding each word indicates its frequency of use, the line connecting the two circles indicates the tendency for the two words to appear in the same sentence, and the thickness of the line indicates the strength of the similarity between the words. In addition, a correspondence analysis chart was created to visualize the characteristics and relationships between each SOAP item. Each analysis was performed on Japanese text, and English translations are attached to the results.
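The co-occurrence step described above can be illustrated with a minimal Python sketch. It is not the KH Coder implementation: the tokenized sentences, the part-of-speech labels, and the function names are hypothetical, and the Jaccard coefficient is assumed here as the similarity measure, because the text states only that line thickness reflects the strength of similarity. Word frequency is counted as the number of sentences containing the word.

```python
from collections import Counter
from itertools import combinations

# Hypothetical pre-tokenized sentences as (word, part of speech) pairs,
# standing in for morphological-analysis output. Not taken from the study data.
SENTENCES = [
    [("toilet", "noun"), ("wa", "particle"), ("number of times", "noun"), ("increase", "verb")],
    [("dizziness", "noun"), ("ga", "particle"), ("aru", "auxiliary")],
    [("toilet", "noun"), ("dizziness", "noun")],
    [],  # blank entry, omitted during cleansing
]

# Parts of speech excluded from extraction (assumed label names).
EXCLUDED_POS = {"particle", "auxiliary"}


def content_words(sentence):
    """Return the set of words left after dropping excluded parts of speech."""
    return {word for word, pos in sentence if pos not in EXCLUDED_POS}


def cooccurrence(sentences):
    """Return sentence-level word frequencies and Jaccard similarity per word pair."""
    # Drop sentences that are empty after filtering.
    word_sets = [ws for ws in map(content_words, sentences) if ws]
    # Number of sentences in which each word appears.
    freq = Counter(w for ws in word_sets for w in ws)
    # Number of sentences in which both words of a pair appear.
    pair_counts = Counter(
        pair for ws in word_sets for pair in combinations(sorted(ws), 2)
    )
    # Jaccard coefficient: shared sentences / sentences containing either word.
    jaccard = {
        (a, b): n / (freq[a] + freq[b] - n) for (a, b), n in pair_counts.items()
    }
    return freq, jaccard


freq, jaccard = cooccurrence(SENTENCES)
```

In this sketch, `freq` would determine the circle size and `jaccard` the line thickness for each pair of words drawn in the network.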

Results and Discussion

This study aimed to analyze the characteristics, commonalities, and associations of SOAP data in medication histories by collecting and text-mining electronic medication history data from insurance company pharmacies.

Data of patients prescribed OAB and dementia drugs were extracted from the medication history data of 15 insurance pharmacies in the Sundrug Co. group, and a total of 89,229 SOAP data components were obtained from the medication history of 7,472 patients [median age 75 years (interquartile range: 65–80), 3,336 males, and 4,136 females].

Up to 20 of the most frequently occurring words found in the SOAP data are listed for each item (Table 1). Next, a co-occurrence network diagram was created based on the frequently occurring words. The co-occurrence network is a method that allows us to identify words that appear frequently in medication history and search for themes in the data and the intent of the person who wrote the information based on the connections between words [21, 22].
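As a minimal sketch of how such a per-item frequency list could be produced, the following Python example counts extracted words for each SOAP item and keeps the most frequent ones. The word lists and the function name `top_words` are hypothetical and serve only as an illustration; they are not the study data or the KH Coder procedure.

```python
from collections import Counter

# Hypothetical extracted words per SOAP item (illustrative only, not study data).
WORDS_BY_ITEM = {
    "S": ["medicine", "toilet", "medicine", "dizziness", "toilet", "medicine"],
    "O": ["prescription", "change", "prescription"],
}


def top_words(words_by_item, n=20):
    """Return the n most frequent words, with counts, for each SOAP item."""
    return {
        item: Counter(words).most_common(n)
        for item, words in words_by_item.items()
    }


# n=2 keeps the toy output short; n=20 would correspond to a top-20 list.
top = top_words(WORDS_BY_ITEM, n=2)
```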

Table 1. Top 20 most frequent words in SOAP data.

| S information: Words | Frequency | O information: Words | Frequency | A information: Words | Frequency | P information: Words | Frequency |
|---|---|---|---|---|---|---|---|
| Medicine | 32,945 | Prescription | 29,304 | Necessary | 32,673 | Confirmation | 75,891 |
| Take medicine | 22,812 | Maintain | 26,617 | Maintain | 29,405 | Guidance | 75,558 |
| Confirmation | 22,394 | OD tablet | 24,254 | Take medicine | 24,668 | Take medicine | 57,501 |
| Take | 18,202 | Change | 23,093 | Caution | 19,825 | Symptoms | 40,681 |
| Blood pressure | 14,956 | Delete | 16,899 | Confirmation | 17,255 | Next time | 38,107 |
| Symptoms | 12,276 | New | 14,581 | Side Effects | 12,834 | Caution | 25,039 |
| Toilet | 12,248 | Dosing | 13,523 | Symptoms | 11,280 | Follow-up | 24,905 |
| Particularly | 11,643 | Tablets | 11,793 | Prescription | 10,641 | Consultation | 19,632 |
| Condition | 9,574 | Generic name | 11,064 | Problem (deny) | 9,729 | Doctor | 18,790 |
| This time | 9,393 | Add | 9,977 | Medication | 8,637 | Maintain | 18,738 |
| Say | 9,378 | Donepezil hydrochloride | 9,540 | Guidance | 8,357 | Side Effects | 17,438 |
| Change (deny) | 8,268 | Confirmation | 9,090 | Condition | 6,105 | Medication | 15,662 |
| Number of times | 8,231 | Month | 9,052 | Decision | 5,799 | Blood pressure | 14,878 |
| Problem (deny) | 7,590 | Patient | 7,931 | Explanation | 5,588 | Explanation | 13,334 |
| A little | 7,144 | Mirabegron | 6,750 | Blood pressure | 5,582 | Dizziness | 13,111 |
| Family | 6,455 | Memantine hydrochloride | 5,045 | Good | 5,455 | Take | 12,951 |
| Think | 6,395 | Amlodipine Besilate | 4,505 | Possible | 5,293 | In case | 12,651 |
| Calm down | 5,786 | Alternative | 4,397 | Compliance | 5,082 | Continuously | 11,947 |
| Doctor | 5,663 | Medical use | 4,293 | Use | 4,771 | Become symptomatic | 11,665 |
| Dizziness | 5,143 | Blood pressure | 3,851 | Consider | 4,753 | Milligrams | 10,914 |

The S data included words describing drug administration and the patient’s physical condition, symptoms, and behaviors. Many of the extracted words described patients’ complaints and actions. Among the 20 most frequently occurring words, more than half described the patient’s physical condition, symptoms, and actions, indicating that the S data constituted much of the patient’s subjective data (Fig. 1).

Fig. 1. Co-occurrence network of S data with words extracted using KH Coder.

The O data included many words that described changes from a previous prescription, such as “maintain”, “change”, “delete”, and “new”, as well as “donepezil hydrochloride”, “memantine hydrochloride”, and “mirabegron”, which are the names of drugs for dementia and OAB, “OD tablet”, which is a formulation, and “patient”, which is a description of medication administration. The co-occurrence network of the O data centered on prescription drug names and dosages, forming one network related to prescription content and another related to the persons involved in administering the medication (Fig. 2). These objective data were mainly obtained from prescriptions. Most information was related to prescriptions and drugs, indicating that the O data were mostly specific to drug therapy.

Fig. 2. Co-occurrence network of O data with words extracted using KH Coder.

The A data included the descriptive words “good” and “compliance” regarding the patient’s medication status, and “necessary”, “maintain”, “side effects”, and “decision”, which indicate the assessment by the pharmacist. The co-occurrence network of the A data formed a network related to pharmacological evaluation by the pharmacist (dementia, OAB, evaluation, medication management, and adverse drug reaction groups) and a network regarding the pharmacist’s actions (guidance group) (Fig. 3). The emergence of multiple groups (conditions, drug effects, and side effects) for pharmacological assessment indicated that pharmacists analyzed the S and O data, conducted assessments from multiple perspectives, made decisions, and considered future directions.

Fig. 3. Co-occurrence network of A data with words extracted using KH Coder.

The P data included the descriptive words “guidance” and “caution” indicating pharmacist guidance, “next time”, “follow-up”, and “maintain” indicating ongoing medication planning by the pharmacist, and “consultation” and “doctor” indicating collaboration of the pharmacist with other healthcare professionals (Fig. 4). The Ministry of Health, Labour and Welfare Ordinance stipulates the points to be noted in continuous pharmacological management and guidance as items noted in medication history [15]. Therefore, descriptive terms indicating guidance from pharmacists and their collaboration with other healthcare professionals appeared at the top of the list.

Fig. 4. Co-occurrence network of P data with words extracted using KH Coder.

Correspondence analysis was subsequently conducted to compare each SOAP item and examine their commonalities and characteristics (Fig. 5). Correspondence analysis visually displays the relationships between items by placing data with weak characteristics near the origin and data with strong characteristics far from the origin.
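The link between distance from the origin and the strength of a word's characteristics can be illustrated with a short Python sketch. In correspondence analysis, the full-dimensional distance of a row point from the origin equals the chi-square distance between that row's profile and the average profile; a two-dimensional chart shows only a projection of that distance. The count table and the function name below are hypothetical and are not taken from the study.

```python
import math

# Hypothetical word-by-item count table (columns: S, O, A, P).
# Values are illustrative only and are not taken from the study.
COUNTS = {
    "toilet": [90, 2, 5, 3],              # concentrated in S
    "prescription": [5, 80, 10, 5],       # concentrated in O
    "confirmation": [250, 250, 250, 250], # spread evenly across items
}


def distance_from_origin(counts):
    """Chi-square distance of each word's profile from the average profile."""
    total = sum(sum(row) for row in counts.values())
    n_cols = len(next(iter(counts.values())))
    # Column masses: share of all counts falling in each SOAP item.
    col_mass = [
        sum(row[j] for row in counts.values()) / total for j in range(n_cols)
    ]
    distances = {}
    for word, row in counts.items():
        row_total = sum(row)
        # Row profile: how this word's occurrences are distributed over items.
        profile = [x / row_total for x in row]
        distances[word] = math.sqrt(
            sum((p - c) ** 2 / c for p, c in zip(profile, col_mass))
        )
    return distances


dist = distance_from_origin(COUNTS)
```

With these toy counts, the evenly distributed word lies close to the origin, whereas the words concentrated in a single item lie far from it.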

Fig. 5. Correspondence analysis: characteristics of the SOAP data.

In the S data, words related to the patient’s main complaint, such as “think” and “change”, and words related to OAB, such as “toilet”, were placed away from the origin. By contrast, in the O data, words related to the prescription contents, such as “prescription”, “change”, and “medication”, were placed away from the origin. In the A data, we found words related to pharmacological evaluation by pharmacists and evaluation of adverse drug reactions, such as “dry mouth”, “bleeding”, “improvement”, and “worsening”, whereas in the P data, we found “guidance”, “consultation”, “next time”, “maintain”, and other words that indicate the pharmacists’ guidance, continuous medication planning, and collaboration with other healthcare professionals. The A and P data were not well characterized but were highly similar in terms of textual information. This similarity may result from the use of common words and phrases among pharmacists. The close distance between the S and O data may be because the treatment plan (i.e., P data) was written based on the assessment made using the S and O data. Therefore, pharmacists could classify and record the collected data according to each item in their medication history.

The limitations of this study are as follows. First, we used the medication history from a single insurance company’s pharmacy as the dataset, and future integration and comparison of data from multiple companies are warranted. Second, although a medical dictionary was introduced to standardize variations in notation among pharmacists, the analysis used the medication histories of patients taking drugs for OAB and dementia, and the results need to be replicated with various diseases and patient populations. Finally, we did not evaluate the performance of the analytical method, including its ability to detect and discriminate adverse events, which is necessary for drug safety evaluation. Further analysis is needed to assess the performance of the method.

Unlike previous qualitative reports on text mining of medical information, the present study used a large sample size and analyzed a large amount of text data. Furthermore, the introduction of a medical dictionary made it possible to conduct analyses without the need for complex data cleaning. Thus, we used a quantitative research perspective to address the perceived weaknesses of qualitative text mining methods, which are considered arbitrary and unscientific. Because the SOAP data regarding medication history showed characteristics consistent with each item, namely subjective patient data (S data), objective data specific to drug therapy (O data), pharmacological assessment (A data), and planning (P data), both quantitative analysis of medication information covering a wide variety of patients (those at home or in institutions as well as those who visited a clinic) and qualitative analysis of SOAP data are warranted. In the future, multivariate analysis covering a wide range of patients, including home-based healthcare patients and facility residents, is expected to become a new method for avoiding risks from adverse drug reactions by combining quantitative analysis of medication information and qualitative analysis based on text mining of SOAP data.

Conclusion

In this study, the characteristics of SOAP data regarding medication history were evaluated by text mining. S data included subjective information, such as patient complaints and behaviors; O data included objective information specific to drug therapy, such as prescriptions and medication management; A data included pharmacological evaluations, including drug efficacy and side effects; and P data included guidance and planning by the pharmacists. The SOAP data on medication history can be used to evaluate drug safety using medical information that includes both subjective and objective patient data.

Funding

This work was supported by Sundrug Co., Ltd. The sponsor had no role in the study design; collection, analysis, and interpretation of data; writing of the report; or decision to submit the article for publication.

Conflict of Interest

TO received a research grant from Sundrug Co., Ltd. (Tokyo, Japan). ST, MS, JY, SK, MK, SM, TY, and EN declare no conflict of interest.

Acknowledgments

We are grateful to Prof. Koichi Higuchi for advice regarding this analysis. We thank Robin James Storer, Ph.D., from Edanz (https://jp.edanz.com/ac) for editing the draft of this manuscript.

References
 
© 2024 Catalyst Unit

This article is licensed under a Creative Commons [Attribution-NonCommercial-NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/