2021 Volume 85 Issue 9 Pages 1416-1425
With the rapid development of artificial intelligence (AI) and machine learning (ML), as well as the arrival of the big data era, technological innovations have occurred in the field of cardiovascular medicine. First, the diagnosis of cardiovascular diseases (CVDs) is highly dependent on assistive examinations, the interpretation of which is time consuming and often limited by the knowledge level and clinical experience of doctors; however, AI could be used to automatically interpret the images obtained in auxiliary examinations. Second, some of the predictions of the incidence and prognosis of CVDs are limited in clinical practice by the use of traditional prediction models, but there may be occasions when AI-based prediction models perform well by using ML algorithms. Third, AI has been used to assist precise classification of CVDs by integrating a variety of medical data from patients, which helps better characterize the subgroups of heterogeneous diseases. To help clinicians better understand the applications of AI in CVDs, this review summarizes studies relating to AI-based diagnosis, prediction, and classification of CVDs. Finally, we discuss the challenges of applying AI to cardiovascular medicine.
Cardiovascular diseases (CVDs) are the leading cause of death in humans, currently accounting for approximately one-third of all deaths worldwide.1 Although considerable progress has been made in the management of CVDs in recent years, many challenges remain in clinical practice. First, the diagnosis of CVDs is highly dependent on electrocardiograms (ECGs) and/or cardiovascular imaging, the interpretation of which is time consuming and requires experience.2 Second, most currently used prediction models of CVDs are based on traditional statistical methods, limiting prediction performance.3 Third, the “one-size-fits-all” management concept in the clinic is not helpful for prognosis because it ignores the heterogeneity of CVDs.4
Artificial intelligence (AI) is regarded as a revolutionary frontier technology, and it has become a global research interest in the field of medicine, especially in cardiovascular medicine. AI could help clinicians overcome the aforementioned 3 challenges by automatically interpreting ECGs and/or cardiovascular imaging results, building more powerful prediction models, and characterizing subgroups of CVDs.
In this review we provide an overview of the applications of machine learning (ML)-based AI in the diagnosis, prediction, and classification of CVDs. The objective of this review is to help practicing clinicians better understand 3 key questions: (1) what are AI, ML, and algorithms; (2) what tasks can ML-based AI perform in cardiovascular clinical practice; and (3) what is the general workflow to conduct a study associated with AI-based diagnosis, prediction, or classification in CVDs?
AI is a field of computer science that aims to mimic human thought processes, learning capacity, and knowledge storage.5 ML, one of the methods to realize AI, usually refers to the process by which a system obtains information from data through algorithms. ML can be roughly divided into supervised and unsupervised learning, which differ in 2 main ways. First, supervised learning uses data that have been tagged with 1 or more labels, such as properties, characteristics, or classifications, whereas unsupervised learning uses data that have not been tagged.5 Second, supervised learning is focused on classification, which involves assigning an observation to one of several categories (e.g., classifying an ECG as atrial fibrillation (AF), sinus rhythm, or other), and prediction, which involves estimating an unknown variable (e.g., predicting whether a patient will die in 5 years). In contrast, unsupervised learning is focused on discovering underlying patterns and relationships in an unlabeled dataset.3 A typical example of unsupervised learning is clustering analysis, which involves subgrouping objects based on their similarity. ML is based on algorithms, such as decision trees, random forests (RF), support vector machines (SVM), neural networks (NN), and deep learning (DL). These algorithms have been reviewed in detail elsewhere.6
To clearly illustrate the role of ML-based AI in CVDs, in this review we summarize the applications of AI from 3 aspects: diagnosis, prediction, and classification.
The diagnosis of most CVDs is highly dependent on ECG and/or cardiovascular imaging examinations. However, the interpretation of medical images is time consuming and inter-rater variations may not be negligible. With the application of AI, image interpretation could be automated, saving clinicians much time, improving the detection rate, and reducing the rate of misdiagnosis and missed diagnosis. In this section, due to space limitations, we only focus on ECGs (Table 1).
Study | Objectives | Sample size | Algorithm/software | ML model performance |
---|---|---|---|---|
Cai et al8 | Detect AF | 16,557 12-lead ECGs | NN | Sensitivity 0.9919, specificity 0.9944, accuracy 0.9935 |
Attia et al9 | Detect AF under sinus rhythm | 649,931 12-lead ECGs | CNN | AUC 0.87, sensitivity 0.79, specificity 0.795, accuracy 0.794 |
Wasserlauf et al10 | Detect AF based on ECGs using smartwatch | 7,500 patients | CNN | Sensitivity 0.975 |
Tison et al11 | Detect AF based on ECGs using smartwatch | 9,750 patients | DNN | Sensitivity 0.980, specificity 0.902 |
Hannun et al12 | Classify cardiac rhythms into 12 rhythm classes | 91,232 single-lead ECGs | DNN | AUC 0.97, F1 0.837 |
Ribeiro et al13 | Recognize 6 types of heart rhythm abnormalities | 2 million 12-lead ECGs | DNN | F1 >0.8, specificity >0.99 |
Kwon et al14 | Detect left ventricular hypertrophy | 21,286 patients with 12-lead ECGs | DNN+CNN | Internal AUC 0.880, external AUC 0.868 |
Attia et al15 | Detect cardiac contractile dysfunction | 44,959 patients with 12-lead ECGs | CNN | AUC 0.93, sensitivity 0.863, specificity 0.857, accuracy 0.857 |
Attia et al16 | Detect cardiac contractile dysfunction | 16,056 patients with 12-lead ECGs | CNN | AUC 0.918 |
Noseworthy et al17 | Detect cardiac contractile dysfunction | 97,829 patients with 12-lead ECGs | CNN | AUC >0.93 |
Sengupta et al18 | Detect abnormal myocardial relaxation | 188 patients with 12-lead ECGs | RF | AUC 0.91 |
Ko et al19 | Diagnose HCM | 12-lead ECGs of 3,060 HCM patients and 63,941 controls | CNN | AUC 0.96, sensitivity 0.87, specificity 0.90 |
Kwon et al20 | Diagnose HF | 55,163 12-lead ECGs of 22,765 patients | DNN | Internal AUC 0.843, external AUC 0.889 |
Kwon et al21 | Diagnose mitral regurgitation | 70,709 12-lead ECGs of 38,241 patients | CNN | Internal AUC 0.816, external AUC 0.877 |
Kwon et al22 | Diagnose aortic stenosis | 56,689 12-lead ECGs of 43,051 patients | MLP+CNN | Internal AUC 0.884, external AUC 0.861 |
Kwon et al23 | Diagnose pulmonary hypertension | 70,709 12-lead ECGs of 38,241 patients | DNN+CNN | Internal AUC 0.859, external AUC 0.902 |
AF, atrial fibrillation; AUC, area under the curve; CNN, convolutional neural network; CVDs, cardiovascular diseases; DCNN, deep convolutional neural network; DNN, deep neural network; ECGs, electrocardiograms; NN, neural network; HCM, hypertrophic cardiomyopathy; HF, heart failure; MLP, multilayer perceptron; RF, random forest.
As a cheap and non-invasive clinical tool, the ECG plays an important role in the diagnosis of CVDs. Although computer-aided ECG interpretation has been widely used in clinical practice, there is still considerable possibility of misinterpretation.7 In recent years, progress in computer algorithms and the use of big data have significantly improved the accuracy of automatic ECG interpretation. To date, ECGs have been used as inputs to construct ML models to mainly perform 4 tasks: automatic recognition of heart rhythm, detection of cardiac structural abnormalities, detection of cardiac functional abnormalities, and detection of CVDs.
Recognition of Heart Rhythm
AI has been successfully used to recognize heart rhythms, especially AF. There has been increased interest in detecting AF due to its increasing incidence, as well as the possibility of preventing AF-related strokes. With 16,557 annotated 12-lead ECGs, an NN model was trained to diagnose AF, achieving an overall accuracy >0.99.8 To further explore the ability of AI to detect AF during sinus rhythm, a convolutional neural network (CNN) was trained with 649,931 annotated 12-lead ECGs from 180,922 patients with sinus rhythm, and the resulting model was able to diagnose AF with an area under the receiver operating characteristic curve (AUC) of 0.87.9 Furthermore, NN models that can detect AF have been embedded in smartwatches, and this technology may benefit thousands of people due to the widespread use of smartwatches.10,11
Importantly, AI could be used to recognize other types of heart rhythms in addition to AF. Using 91,232 single-lead ECGs from 53,549 patients who used a single-lead ambulatory ECG monitoring device, Hannun et al developed a deep neural network (DNN) model that was able to classify 12 cardiac rhythm classes, including 10 arrhythmias, sinus rhythm, and noise.12 The mean F1 score (i.e., the harmonic mean of the positive predictive value and sensitivity) for the DNN exceeded that of average cardiologists (0.837 vs. 0.780).12 Similarly, another DNN model was trained with >2 million labeled 12-lead ECGs, and the model was able to recognize 6 types of heart rhythm abnormalities, with F1 scores >0.80.13
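To make the F1 score defined above concrete, the following Python sketch computes it from confusion-matrix counts. The function name and the example counts are hypothetical illustrations and are not taken from the cited studies; for a multi-class task, a per-class F1 computed in this way can then be averaged across classes.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of positive predictive value (PPV) and sensitivity."""
    if tp == 0:
        return 0.0  # no true positives: PPV and sensitivity are both 0
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    sensitivity = tp / (tp + fn)  # sensitivity (recall)
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical counts for one rhythm class: 80 true positives,
# 20 false positives, and 20 false negatives.
example_f1 = f1_score(tp=80, fp=20, fn=20)
print(round(example_f1, 3))
```

Because the harmonic mean is pulled toward the smaller of its 2 inputs, a model cannot reach a high F1 score by maximizing sensitivity at the expense of positive predictive value, or vice versa.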
Detection of Cardiac Structural or Functional Abnormalities
Apart from heart rhythm recognition, AI has also been used to detect abnormalities in cardiac structure or function. An ensemble neural network (ENN) model was constructed with ECGs to diagnose left ventricular hypertrophy, with the model significantly outperforming the cardiologist during internal validation.14 This result was confirmed in external validation (sensitivity: 0.454 vs. 0.284). The datasets used for the internal validation and ML model training were derived from the same hospital and cohort, whereas the external validation involved an independent dataset derived from another hospital and cohort.14 To detect cardiac contractile dysfunction based on ECG data alone, Attia et al trained a CNN with 44,959 annotated 12-lead ECGs; this model performed well in screening patients with cardiac contractile dysfunction, with an AUC of 0.93.15 Subsequently, the ability of this CNN to detect cardiac contractile dysfunction was validated in a prospective cohort of 16,056 patients in a single cardiovascular center, achieving an AUC of 0.918.16 Furthermore, it was shown that this CNN could be applied to patients of different races and ethnicities.17 Similarly, to detect abnormal myocardial relaxation, Sengupta et al developed an RF model.18 However, unlike the study of Attia et al,15 which directly used raw 12-lead ECGs, the ECGs used in the study of Sengupta et al were processed using the continuous wavelet transform before being used to train the RF model.18 Such preprocessing amplified the ECG signal and is a useful way to reduce the sample size needed for training.18
Detection of CVDs
In addition to classifying cardiac rhythms and evaluating cardiac structure and function, AI has been used to directly diagnose CVDs, such as hypertrophic cardiomyopathy,19 heart failure (HF),20 mitral regurgitation,21 aortic stenosis,22 and pulmonary hypertension.23 The outstanding performance of the ML models in diagnosing these diseases based on ECGs indicates that many CVDs may cause subtle abnormalities in ECGs that cannot be easily recognized by human eyes, but can be detected by AI.
Workflow
The workflow of an ML-based diagnosis study using ECGs usually involves the collection of ECGs, preprocessing, model construction, and assessment. To enhance the generalizability of ML models, it is better to use ECG machines that are widely used in clinics nationwide or worldwide (Figure 1A). ECG preprocessing primarily includes noise removal and conversion to a suitable representation, such as 1-dimensional signals (suitable for CNN9,10,14–17,19,21–23), important ECG features (suitable for DNN11–14,20,23), and wavelets (suitable for RF;18 Figure 1B). Next, ML models can be constructed with supervised algorithms to perform classification tasks, such as classifying the input ECG as AF or non-AF. The most commonly used algorithm is DL, especially CNN (Figure 1C). Finally, model performance is assessed by receiver operating characteristic (ROC) curve analysis and indices such as the AUC, accuracy, sensitivity, specificity, and F1 score (Figure 1D).
Figure 1. Workflow for a machine learning (ML)-based diagnostic study using electrocardiograms (ECGs). (A) ECG collection. (B) ECG preprocessing, which mainly includes noise removal and conversion to a suitable representation, such as (a) 1-dimensional signals, (b) important ECG features, and (c) wavelets. (C) Model construction using supervised algorithms, mainly deep learning. (D) Model assessment. AF, atrial fibrillation; AUC, area under the curve; ROC, receiver operating characteristic.
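The assessment step of the diagnostic workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited study: the function names, the 0.5 decision threshold, and the 8 example scores are all hypothetical. The AUC is computed here as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one-half.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise "wins" of positives over negatives; a tie counts as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(y == 1 and s >= threshold for y, s in zip(labels, scores))
    fn = sum(y == 1 and s < threshold for y, s in zip(labels, scores))
    tn = sum(y == 0 and s < threshold for y, s in zip(labels, scores))
    fp = sum(y == 0 and s >= threshold for y, s in zip(labels, scores))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs for 8 ECGs (1 = AF, 0 = non-AF).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(auc, sens, spec)
```

Unlike sensitivity and specificity, the AUC does not depend on a single decision threshold, which is one reason it is reported so widely in Table 1.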
Apart from performing diagnostic tasks, AI performs well in prediction tasks, including predictions of incidence and prognosis (see Table 2).
Diseases, events, or interventions at baseline | Predicted disease or event | Time from baseline | Sample size | Parameters used to construct ML models | Algorithm | AUC | Reference |
---|---|---|---|---|---|---|---|
Incidence prediction | | | | | | | |
No CVDs | Death, stroke, CHD, CVDs, HF, and AF | 12 years | 6,814 subjects, 66.6% for training (3-fold cross-validation), 33.3% for testing | 20 variables from imaging, non-invasive tests, questionnaires, and biomarker panels | RF | 0.84 for death, 0.75 for stroke, 0.80 for CHD, 0.80 for CVD, 0.84 for HF, and 0.75 for AF | 24 |
No CVDs | CVDs | 13 years | Training: 3,230 subjects (2-fold cross-validation); internal validation: 3,229 subjects; external validation: 1,348 subjects | 9 variables (age, sex, ethnicity, TC, HDL-C, SBP, treatment for hypertension, diabetes, and smoking) | SVM | Internal AUC: 0.94; external AUC: 0.95 | 26 |
No VF | VF | 30 s | 27 cases and 28 controls (10-fold cross-validation) | 4 variables from 120-s ECG signals | ANN | 0.99 | 27 |
Prognosis prediction | | | | | | | |
CAD | Death | 5 years | 10,030 patients (10-fold cross-validation) | 19 clinical and 35 CCTA parameters | Logit-Boost | 0.79 | 28 |
HFpEF | Death and hospitalization | 3 years | 1,767 patients (5-fold cross-validation) | 86 clinical, laboratory, and ECG variables | RF | 0.72 for death, 0.76 for hospitalization | 29 |
Hypertension | Composite end point events | 33 months | 508 young patients with hypertension (10-fold cross-validation) | 11 clinical, laboratory, and echocardiographic variables | XGBoost | 0.757 | 30 |
OHCA | In-hospital death | – | 39,566 patients; 90% for training, 10% for testing | 46 clinical and laboratory parameters | GBM | 0.87 | 32 |
OHCA | Poor functional outcome | 180 days | 932 patients; 90% for training, 10% for testing (5-fold cross-validation) | 54 clinical and laboratory parameters | ANN | 0.891 | 33 |
CRT | Death | 1, 2, 3, 4, and 5 years | Training: 1,510 patients (10-fold cross-validation); testing: 158 patients | 33 pre-implant clinical variables | RF | 0.768, 0.793, 0.785, 0.776, 0.803 for 1-, 2-, 3-, 4-, and 5-year mortality prediction, respectively | 34 |
PCI | In-hospital death | – | 11,709 patients with 14,349 PCIs (8-fold cross-validation) | 52 admission variables | RF | 0.92 | 35 |
PCI | HF readmission | 30 days | 11,709 patients with 14,349 PCIs (8-fold cross-validation) | 358 discharge variables | RF | 0.90 | 35 |
PCI | Death | 180 days | 11,709 patients with 14,349 PCIs (8-fold cross-validation) | 358 discharge variables | RF | 0.87 | 35 |
ANN, artificial neural network; CAD, coronary artery disease; CCTA, coronary computed tomography angiography; CHD, coronary heart disease; CRT, cardiac resynchronization therapy; GBM, gradient boosting machine; HDL-C, high-density lipoprotein cholesterol; HFpEF, heart failure with preserved ejection fraction; OHCA, out-of-hospital cardiac arrest; PCI, percutaneous coronary intervention; SBP, systolic blood pressure; SVM, support vector machine; TC, total cholesterol; VF, ventricular fibrillation. Other abbreviations as in Table 1.
Most CVDs have a subclinical phase, during which the patients exhibit no clinical symptoms. Importantly, disease progression can be slowed down or even prevented if interventions occur during this phase. Therefore, the prediction of the incidence of CVDs is of great significance for the asymptomatic population.
To explore the ability of ML models to predict the incidence of CVDs, Ambale-Venkatesh et al evaluated 6,814 participants who were initially free of CVDs from the Multi-Ethnic Study of Atherosclerosis (MESA).24 Baseline data for 735 variables, including imaging, non-invasive tests, questionnaires, and biomarkers, were collected for these participants, and the RF technique was used to identify the top 20 variables for each of the 6 outcomes of death, stroke, coronary artery disease (CAD), CVDs, HF, and AF. These 20 variables were then used to construct 6 RF models with 3-fold cross-validation to predict the incidence of each of the 6 outcomes over 12 years. The concordance indices, a generalization of the AUC and a useful parameter to evaluate the performance of a predictive model,25 were not less than 0.75 for all these RF models.24 To further compare the prediction abilities of ML models and the traditional American College of Cardiology and American Heart Association (ACC/AHA) Risk Calculator, Kakadiaris et al trained SVM models using the MESA cohort and 2-fold cross-validation.26 The SVM models, constructed with the same 9 traditional risk variables used by the ACC/AHA Risk Calculator, performed better than the ACC/AHA Risk Calculator (AUC 0.94 vs. 0.72), and this was verified in an external validation cohort (AUC 0.95 vs. 0.71).26
In addition to predicting long-term events, ML models can also predict the incidence of cardiovascular events in the short term. For example, an NN model was constructed with 4 QRS complex shape features from baseline 120-s ECGs of 27 ventricular fibrillation cases and 28 controls.27 This model was trained with 10-fold cross-validation and was able to predict the incidence of ventricular fibrillation within 30 s, with an AUC of 0.99. Although 30 s is a short period, such a warning could be important for saving patients’ lives in the clinic.27
Prognosis Prediction
Prediction of the prognosis of CVDs is critical in clinical practice because an accurate prognosis prediction model could inform clinicians of each patient’s prognosis, helping with decision making, the use of disease management programs, and in discussing end-of-life preferences.
ML models have been used to predict the prognosis of chronic CVDs, like HF, CAD, and hypertension. The classification algorithm Logit-Boost was used to predict 5-year mortality of patients with CAD.28 Before ML model building, features were selected using information gain ranking, and only those variables helpful in predicting outcomes (information gain >0) were selected for model building. Although data on 25 clinical plus 44 coronary computed tomography angiography (CCTA) parameters were collected, only 19 clinical and 35 CCTA variables were selected for model building. The prediction model was trained and validated using 10-fold cross-validation. This ML model (AUC 0.79) outperformed the Framingham risk score (AUC 0.61) and CCTA severity scores (AUC 0.62–0.64).28
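The information gain criterion used for feature selection above can be illustrated with a short Python sketch. This is a toy example under stated assumptions: the variable names and data are hypothetical, the candidate variables are binary (continuous variables would first need to be discretized), and the small tolerance used in place of an exact comparison with 0 is an implementation choice to absorb floating-point rounding.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in outcome entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the outcome within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: outcome (1 = died) and 2 binary candidate variables.
died = [1, 1, 0, 0, 1, 1, 0, 0]
features = {
    "diabetes": [1, 1, 0, 0, 1, 1, 0, 1],   # outcome differs between groups
    "coin_flip": [1, 0, 1, 0, 1, 0, 1, 0],  # outcome is 50% in both groups
}

# Keep only the variables with information gain > 0 (tolerance for rounding).
selected = [
    name for name, values in features.items()
    if information_gain(values, died) > 1e-12
]
print(selected)
```

In this toy example the second variable splits the patients into 2 groups with identical mortality, so knowing its value does not reduce uncertainty about the outcome and it is discarded.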
Similarly, an ML model was constructed to predict death and hospitalization in 3 years for patients with HF with preserved ejection fraction (HFpEF) based on data at discharge. The model used 86 clinical, laboratory, and ECG variables from 1,767 patients, and the RF algorithm and 5-fold cross-validation. This model achieved an AUC of 0.72 and 0.76 for death and hospitalization, respectively.29
To achieve accurate prognosis prediction for young patients with hypertension, Wu et al collected 58 variables at baseline and the 33-month follow-up for 508 patients.30 Then, features were selected using recursive feature elimination, with only 11 variables finally selected to build the prediction model. A classifier algorithm called extreme gradient boosting, as well as 10-fold cross-validation, was used to train and validate the ML model. This model did well in predicting composite endpoint events and achieved an AUC of 0.757, higher than the recalibrated Framingham risk score model (AUC 0.529).30
Similarly, prognosis prediction for patients with acute CVDs can be achieved using ML models. Out-of-hospital cardiac arrest (OHCA) is an acute cardiovascular event with over 300,000 cases among adults in the US.31 The prediction of adverse events for patients after OHCA is critical because it could inform clinicians and the patients’ families of the prognosis and then guide intervention. ML models have been used to predict short- and long-term outcomes for this population. To predict the in-hospital mortality of OHCA patients, Nanayakkara et al collected 43 clinical and laboratory parameters from 39,566 OHCA patients within the first 24 h after OHCA.32 Then, 90% of the total data set was used to train the prediction model using a classifier algorithm called gradient boosting machine, after which the model was tested with the remaining 10% of data and showed great ability to predict in-hospital mortality, with an AUC of 0.87.32 To further predict the long-term functional outcome of OHCA patients, a dataset composed of 54 clinical and laboratory parameters at admission and 180-day follow-up records for 932 OHCA patients was collected.33 The outcome prediction model was trained with 90% of the data with 5-fold cross-validation, with the remaining 10% of data used for testing. An NN algorithm was used during model development, and the model performed well in predicting poor functional outcomes, comprising dependence, coma, or vegetative state, and death within 180 days after OHCA (AUC=0.891).33
ML models can also be used to predict prognosis for patients after cardiovascular interventions, such as cardiac resynchronization therapy (CRT) and percutaneous coronary intervention (PCI). To build ML models capable of predicting long-term prognosis after CRT, a database of 1,510 patients undergoing CRT implantation was used.34 A total of 33 pre-implantation clinical variables were collected to train an RF model, and 10-fold cross-validation was performed. The model was then tested on an independent cohort of 158 patients and did well in predicting 1-, 2-, 3-, 4-, and 5-year mortality, with an AUC >0.75.34 To predict the short- and long-term prognosis after PCI, Zack et al built RF prediction models by analyzing 11,709 patients with 14,349 PCIs.35 In all, 52 clinical parameters at admission were used to predict in-hospital mortality, whereas 358 variables at discharge were used to predict 30-day HF readmission and all-cause death. Eight-fold cross-validation was used in the RF models, and all 3 models achieved an AUC >0.85.35
The studies mentioned above show that ML models may be superior to standard linear regression models29,32,33,35 and currently used clinical risk scoring systems28,30,32,34 in performing prediction tasks. This is mainly because ML is not only able to incorporate a larger number of variables, but it can also analyze the possibly complex interactions and nonlinear effects of the variables.36
Workflow
The workflow of an ML-based prediction study usually involves the collection of raw data, feature selection, dataset splitting, and model building. There are mainly 2 types of raw data: variables available at baseline, and whether a subject experiences the targeted CVDs or events based on follow-up records (Figure 2A). Feature selection is used to select more informative and non-redundant variables from the available variables, and the selected variables are used to construct ML models (Figure 2B). The whole dataset is typically divided into a training set and a testing set. The former is used to develop the ML model, whereas the latter is used to assess its generalizability. The training set is usually randomly divided into several equal-sized groups; at each iteration, one of the groups is used as a validation set, whereas the other groups are used as training sets. This process, called “cross-validation”, is a useful way to avoid overfitting. Briefly, overfitting indicates that models perform well on the training set but poorly on unseen datasets. Details regarding the reasons for overfitting and how to avoid it have been reviewed elsewhere.37 Sometimes an external dataset is used to further test a model’s generalizability (Figure 2C).
Figure 2. Workflow for the construction of a machine learning (ML)-based prediction model. (A) Raw data are collected at baseline and during follow-up. (B) More informative and non-redundant variables are selected from the available baseline variables to construct ML models. (C) The internal dataset is divided into a training set and a testing set (e.g., 4 : 1, as in the figure), which are used to develop the ML model and assess its generalizability, respectively. Furthermore, cross-validation is usually used to enhance the model’s performance. The example of 4-fold cross-validation is shown in the figure, in which the training set is divided into 4 equal-sized groups, with one of the groups used as a validation set and the other 3 groups used as training sets at each iteration. In some studies, an external dataset is used to independently assess the model’s performance. (D) Classifier algorithms are used to build ML models and then assess their performance. AUC, area under the curve; ECG, electrocardiogram.
The prediction of incidence or prognosis is actually a classification task, so classifier algorithms, such as RF, NN, and gradient boosting, are usually required in the development of prediction ML models. Once the prediction model is built, indices like the AUC, sensitivity, and specificity are calculated to quantify the performance of the ML model (Figure 2D).
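The dataset splitting described in this workflow (a 4 : 1 training/testing split followed by 4-fold cross-validation of the training set) can be sketched as follows. This is a minimal illustration that only manipulates subject indices; the function names, the cohort size of 100, and the random seed are hypothetical, and no model is actually fitted.

```python
import random


def train_test_split(indices, test_fraction, seed=0):
    """Shuffle subject indices and hold out a fraction as the testing set."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(train_indices, k):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [train_indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical cohort of 100 subjects, split 4:1, then 4-fold cross-validation.
train, test = train_test_split(range(100), test_fraction=0.2)
for fold_number, (training, validation) in enumerate(k_fold(train, k=4), start=1):
    # A model would be fitted on `training` and scored on `validation` here.
    print(fold_number, len(training), len(validation))
```

The testing set is never touched during cross-validation, so it can still provide an estimate of how the final model performs on unseen subjects.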
Most CVDs are heterogeneous,38 which means patients with the same CVD may have distinct etiologies, clinical characteristics, auxiliary examination results, outcomes, and therapeutic responses. Therefore, there is an urgent need to integrate data from different sources to make the classification of a disease more accurate. Such accurate classification could guide risk stratification, prognostic prediction, and even the choice of treatment, so it is discussed in this section.
HF With Preserved Ejection Fraction (HFpEF)
HFpEF is an acknowledged phenotypically heterogeneous disease with a high prevalence and no proven useful medical therapies.39 Accurate classification of HFpEF may be a critical step for the design of clinical trials and the development of useful therapies for specific HFpEF subgroups. To this end, several research groups have used AI to identify phenotypically distinct HFpEF categories. For example, Shah et al prospectively collected 67 phenotypic variables from 397 HFpEF patients, generated a correlation matrix of phenotypic variables, and filtered out variables that were correlated at a correlation coefficient of >0.6, leaving 46 continuous variables for the final clustering analyses.40 Three clusters were determined using the 46 identified variables. Surprisingly, the 3 subgroups differed significantly not only in clinical characteristics, but also in survival. These results were validated in another prospective cohort of 107 HFpEF patients.40 Using different HFpEF cohorts but similar study strategies, Segar et al also identified 3 mutually exclusive subgroups of HFpEF patients with distinct clinical characteristics and long-term outcomes.41 Hedman et al used 32 echocardiographic and 11 clinical and laboratory variables to perform ML-based clustering and identified 6 phenotype-based groups.42 Importantly, the results of that study revealed differential characteristics and outcomes, as well as different levels of inflammatory and cardiovascular plasma proteins across the newly identified subgroups.42 In another study, instead of inputting several different types of medical data, Przewlocka-Kosmala et al used only resting and postexercise echocardiographic parameters and divided HFpEF patients into 2 subgroups.43 One of the subgroups was characterized by a relatively isolated impairment of left ventricular systolic reserve and a better prognosis, whereas the other showed abnormal longitudinal deformation, ventricular-arterial coupling, and cardiac output responses to exercise.43 All the studies described above demonstrated the feasibility of ML-based clustering analysis to define HFpEF subgroups with different clinical characteristics and prognoses, but further studies are required to determine whether these subgroups respond differently to specific therapies and whether there are optimal therapeutic targets for each of the subgroups (Table 3).
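The correlation-based filtering step described above for the study of Shah et al can be illustrated with a short Python sketch. This is one plausible reading under stated assumptions rather than the authors' actual procedure: variables are examined in order and kept only if their absolute Pearson correlation with every variable already kept does not exceed 0.6. The greedy order, the use of the absolute value, and the variable names and measurements are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(variables, threshold=0.6):
    """Greedily keep variables whose |r| with every kept variable is <= threshold."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical measurements for 6 patients.
variables = {
    "weight_kg": [60, 72, 85, 90, 101, 110],
    "bmi": [22, 25, 29, 31, 34, 38],         # strongly correlated with weight
    "heart_rate": [70, 64, 81, 66, 78, 69],  # weakly correlated with weight
}
kept_variables = drop_correlated(variables)
print(kept_variables)
```

Removing near-duplicate variables in this way prevents a single underlying trait, here body size, from being counted several times when similarity between patients is computed.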
Disease | Sample size | Parameters used in unsupervised ML methods | No. subgroups | Differences among subgroups | Reference |
---|---|---|---|---|---|
HFpEF | Discovery cohort: 397; validation cohort: 107 | 46 clinical, laboratory, ECG, and echocardiographic parameters | 3 | Clinical characteristics, cardiac structure/function, invasive hemodynamics, and outcomes | 40 |
HFpEF | Discovery cohort: 654; internal validation cohort: 1,113; external validation cohort 216 | 61 clinical, laboratory, ECG, and echocardiographic parameters | 3 | Clinical characteristics and long-term outcomes | 41 |
HFpEF | 320 | 32 echocardiographic and 11 clinical/laboratory parameters | 6 | Clinical characteristics and outcomes, as well as concentrations of inflammatory and cardiovascular plasma proteins | 42 |
HFpEF | 177 | 8 resting and post-exercise echocardiographic parameters | 2 | Left ventricular systolic reserve and prognosis | 43 |
PAH | Discovery cohort: 281; validation cohort: 104 | Circulating proteomic panel of 48 cytokines, chemokines, and factors | 4 | Blood proteomic immune profiles, clinical risk, and long-term outcomes | 44 |
PMR | 122 | 64 clinical and echocardiographic variables | 3 | Clinical characteristics, prognosis, and therapeutic response to surgery (mitral valve repair or replacement) | 45 |
AC | Discovery cohort: 60; validation cohort: 92 | 18 parameters derived from pathological images of explanted AC hearts | 4 | Genetic background, echocardiographic and ECG parameters | 46 |
HF | 1,106 | 50 clinical, laboratory, ECG, and echocardiographic parameters | 4 | Clinical characteristics, biomarker values, ventricular structure/function, and therapeutic response to CRT | 47 |
AC, arrhythmogenic cardiomyopathy; ECG, electrocardiography; PAH, pulmonary arterial hypertension; PMR, primary mitral regurgitation. Other abbreviations as in Tables 1,2.
Unsupervised clustering analysis has also been applied in other CVDs besides HFpEF. Sweatt et al used unsupervised ML to classify pulmonary arterial hypertension (PAH) patients into 4 clusters based on blood proteomic profiles that included 48 inflammation- or autoimmunity-related molecules.44 These 4 PAH clusters were distinct in terms of proteomic immune profiles, clinical risk, and long-term outcomes. That study was valuable because it identified possible immunotherapy targets for PAH.44
Primary mitral regurgitation (PMR) is another heterogeneous clinical disease, with considerable differences in prognosis among patients after valve surgery. To identify phenotypically distinct categories of PMR patients, Pimor et al performed unsupervised clustering analysis using 64 clinical and echocardiographic variables of PMR patients before valve surgery.45 These patients were then classified into 3 phenotypes that differed markedly in terms of clinical characteristics and post-surgery prognosis. The ML model could be used to guide cardiac surgeons to identify the high-risk subgroup, and these patients could be carefully monitored and may even be treated earlier.45
Arrhythmogenic cardiomyopathy (AC) is an inherited cardiomyopathy that is heterogeneous in the overall distribution of fibrofatty infiltration in the heart. We have previously used unsupervised clustering to classify AC patients into 4 subgroups based on 18 parameters derived from pathological images of 60 explanted AC hearts, and these 4 subgroups had distinct genetic backgrounds, echocardiographic variables, and ECG parameters.46 That study established a novel pathological classification with distinct genotypes indicating different potential mechanisms in the pathogenesis of AC.46
HF is a heterogeneous clinical syndrome with a substantial proportion of patients who do not respond to CRT. To identify patients who are likely to respond to CRT, Cikes et al used unsupervised ML to categorize 1,106 HF patients who were randomized to either receive CRT or not.47 Fifty baseline clinical and echocardiographic variables were used in the ML method, and 4 phenogroups were identified. Surprisingly, a comparison of the HF-free survival rate after treatment in each phenogroup showed that 2 of the 4 phenogroups were likely to benefit from CRT.47 This finding may help cardiologists identify the patients who are most likely to respond to CRT (Table 3).
Workflow
Most CVDs are heterogeneous (Figure 3A). To classify a heterogeneous population into several homogeneous subgroups, information is collected for the available variables, such as clinical characteristics, cardiac imaging, ECGs, laboratory tests, and even pathological images. Then, dimensionality reduction, including feature selection and feature projection, is performed. Feature selection uses algorithms to select the features that are most valuable for classification, and is critical for improving algorithm performance by removing redundant features (Figure 3B). Feature projection involves projecting the selected features into a 2-dimensional space, which helps visualization (Figure 3C). After dimensionality reduction, unsupervised ML is used to define homogeneous subgroups (Figure 3D,E). Finally, a comparison among the different subgroups is performed (Figure 3F). Unsupervised learning was used for clustering analysis in most of the relevant studies, and the assignment of patients to clusters is based on the similarity of patients’ input data.
Figure 3. Workflow to conduct a classification study of cardiovascular diseases (CVDs) using machine learning (ML). (A) Most CVDs are heterogeneous. (B,C) Dimensionality reduction consists of 2 important processes, namely feature selection (B) and feature projection (C). (D) Unsupervised ML. (E) Homogeneous subgroups. (F) Comparisons among different subgroups. ECG, electrocardiogram.
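The classification workflow described above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the method of any study reviewed here. Everything in it is an assumption made for illustration: the 9 synthetic patients and their variable names, the correlation threshold of 0.95 for removing redundant features, principal component analysis (computed by power iteration) as the feature projection, and k-means with farthest-point seeding and 3 clusters as the unsupervised algorithm.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def select_features(rows, max_abs_corr=0.95):
    """Feature selection: drop constant columns and near-duplicates of kept columns."""
    cols = [list(c) for c in zip(*rows)]
    kept = []
    for j, col in enumerate(cols):
        if statistics.pstdev(col) == 0:
            continue  # a constant variable cannot separate patients
        if all(abs(pearson(col, cols[k])) < max_abs_corr for k in kept):
            kept.append(j)
    return kept

def standardize(rows):
    """Z-score every column so variables with different units are comparable."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def project_2d(z, iters=200, seed=0):
    """Feature projection: first 2 principal components, for plotting only."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    axes = []
    for _ in range(2):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):  # power iteration towards the leading eigenvector
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflate so the next pass finds the following component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        axes.append(v)
    return [[sum(a * b for a, b in zip(r, ax)) for ax in axes] for r in z]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(z, k, iters=50):
    """Unsupervised clustering: Lloyd's k-means with farthest-point seeding."""
    centers = [z[0]]
    while len(centers) < k:
        centers.append(max(z, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in z]
        for c in range(k):
            members = [p for p, lab in zip(z, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels

def compare_subgroups(rows, labels, names):
    """Subgroup comparison: mean of each original variable per cluster."""
    summary = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        summary[lab] = {n: statistics.fmean(c) for n, c in zip(names, zip(*members))}
    return summary

# Synthetic, made-up values: lvef_fraction duplicates lvef_pct, in_registry is constant.
NAMES = ["lvef_pct", "age_years", "biomarker", "lvef_fraction", "in_registry"]
PATIENTS = [
    [30, 50, 210, 0.30, 1], [32, 52, 190, 0.32, 1], [28, 51, 200, 0.28, 1],
    [60, 49, 900, 0.60, 1], [62, 51, 950, 0.62, 1], [58, 53, 850, 0.58, 1],
    [61, 79, 205, 0.61, 1], [59, 81, 195, 0.59, 1], [60, 80, 200, 0.60, 1],
]

kept = select_features(PATIENTS)                                 # Figure 3B
z = standardize([[row[j] for j in kept] for row in PATIENTS])
coords = project_2d(z)                                           # Figure 3C
labels = kmeans(z, k=3)                                          # Figure 3D,E
summary = compare_subgroups(PATIENTS, labels, NAMES)             # Figure 3F
print("selected features:", [NAMES[j] for j in kept])
print("cluster labels:", labels)
```

In this sketch the clusters are computed on the standardized selected features, and the 2-dimensional coordinates are kept only for visualization. In a real study, the number of clusters, the selection criterion, and the clustering algorithm would each need to be justified and validated, ideally in an independent cohort.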
To help clinicians better understand AI and conduct related studies, we have described the basic concepts of AI, ML, and their algorithms, summarized reported studies of AI-based diagnosis, prediction, and classification in CVDs (Tables 1–3), and illustrated the general workflow of each of these 3 applications (Figures 1–3).
There are still some obstacles to using ML-based AI in cardiovascular practice. First, data availability limits the generalizability of ML algorithms. The data used for the training of ML models are typically acquired from 1 or several laboratories, health centers, or hospitals, and the algorithms are therefore likely to fail when applied to different populations.6 Second, obtaining large quantities of high-quality labeled data, which are essential for the training of supervised learning algorithms, is labor intensive because labeling is often performed manually.48 Third, the “black box” property of DL, meaning that the inner mechanisms and processes of DL models cannot be explained, is not accepted by many clinicians.49
The authors’ work reported herein was supported by grants from the Chinese Academy of Medical Sciences (No. 2016-I2M-1-015 and 2019-12M-1-002), National Natural Science Foundation of China (No. 81670376), and the Peking Union Medical College (No. 3332018140).
None declared.