Endotyping in Heart Failure: Identifying Mechanistically Meaningful Subtypes of Disease

Lusha W. Liang; Yuichi J. Shimada

doi:10.1253/circj.CJ-21-0349

Abstract

Endotyping is an emerging concept in which diseases are classified into distinct subtypes based on underlying molecular mechanisms. Heart failure (HF) is a complex clinical syndrome that encompasses multiple endotypes with differential risks of adverse events and varying responses to treatment. Identifying these distinct endotypes requires molecular-level investigation involving multi-“omics” approaches, including genomics, transcriptomics, proteomics, and metabolomics. The derivation of HF endotypes has important implications for promoting individualized treatment and enabling more targeted selection of patients for clinical trials, and it may reveal new disease pathways that can serve as therapeutic targets. One challenge in the integrated analysis of high-throughput omics and detailed clinical data is the need to handle “big data”, a task for which machine learning is well suited. In particular, unsupervised machine learning can uncover novel endotypes of disease in an unbiased manner. In this review, we discuss recent efforts to identify HF endotypes, covering approaches involving proteomics, transcriptomics, and genomics, with a focus on machine-learning methods.

Heart failure (HF) affects over 60 million people worldwide and is a leading cause of morbidity and mortality.¹ The diagnosis of HF is based on an assessment of clinical features, and HF has historically been divided by left ventricular (LV) ejection fraction (EF) into 2 categories: HF with reduced EF (HFrEF) and HF with preserved EF (HFpEF).² However, this simplistic approach fails to capture the underlying heterogeneity in the etiology and pathophysiology of HF, and more nuanced approaches to subtyping HF are needed.³ The limitation of this approach is perhaps best exemplified by the lack of effective therapies for patients with HFpEF.³ Conversely, multiple therapies are now available for HFrEF, and successful strategies for individualizing treatment in HFrEF will also need to rely on a deeper understanding of the pathophysiology of HF subtypes beyond classification by EF alone.⁴ Finally, there has been growing interest in HF with mid-range EF (HFmrEF), which also encompasses a wide array of patients, from those with HF with recovered EF to those who progress to HFpEF or HFrEF.²,⁵

The most common forms of HF result from a complex interplay between multiple genetic and environmental factors that lead to myocardial dysfunction, and thus prognostication based on clinical data alone can be challenging.⁶ Numerous prognostic markers of death and/or hospitalization have been identified in patients with HF; however, their clinical applicability has remained limited.² This highlights the need for new methodologies in the classification and characterization of HF.

Such strategies include the use of large-scale studies of molecular-level abnormalities in HF encompassing a multi-“omics” approach (e.g., genomics, proteomics, transcriptomics, and metabolomics). One challenge in the analysis of high-throughput “omics” data is that it requires the ability to handle big data, a task for which machine learning (ML) is well suited.⁷ In this review, we will discuss recent efforts to identify subtypes of disease in HF and cover approaches involving proteomics, transcriptomics, and genomics, with a focus on ML methods.

Supervised ML and Unsupervised Clustering

ML is a branch of artificial intelligence (AI) consisting of pattern-recognition algorithms used to define relationships between objects.⁸ ML is typically subdivided into supervised and unsupervised learning (Figure 1A).⁸ Supervised ML learns to make predictions after training on a labeled dataset.⁸ Unsupervised ML is designed to determine the intrinsic structure of a dataset without predefined labels.⁸ Although supervised ML is useful, both in its own right and as a method for validating the results of unsupervised ML, supervised models rely on pre-existing classifications of HF and are therefore not suited to the discovery of novel subtypes of disease.

Figure 1.

Commonly used machine learning (ML) methods in medicine. (A) Supervised vs. unsupervised ML. Supervised ML is used for classification and regression problems whereas unsupervised ML is used to determine the intrinsic structure of a dataset and uncover distinct clusters of data. Commonly used supervised ML algorithms include decision tree-based methods such as random forest algorithms, neural networks, and support vector machines. Clustering techniques include k-means clustering and partitioning around medoids, hierarchical clustering, and model-based clustering. (B) Datasets that are well suited vs. not well suited for analysis using k-means clustering analysis. (Left) A dataset that forms circular clusters is well suited for analysis using k-means clustering. (Right) A dataset that forms noncircular clusters may not be well suited for analysis using k-means clustering. (C) Dendrogram formed by use of a hierarchical clustering algorithm. (D) Use of gap statistics to determine optimal number of clusters in unsupervised ML. Dividing this example dataset into 4 clusters maximizes the gap statistic.

Cluster analysis is an important component of unsupervised ML whose primary goal is to find related groups in a dataset without the use of a response variable.⁹ Multiple clustering algorithms exist, each with its own benefits and drawbacks. One of the simplest is k-means clustering, in which data points are organized around centroids and the objective is to find the configuration that minimizes the distance, or dissimilarity, of each data point to its closest centroid.⁹ The k-means algorithm works best when the data form compact, roughly circular clusters (Figure 1B).⁹ Hierarchical agglomerative clustering is another commonly used algorithm, in which each data point begins as a singleton cluster and pairs of clusters are successively merged until all data points are contained in 1 large cluster.⁹ The result is a tree-based representation of the data points, called a dendrogram (Figure 1C).⁹ However, there is no systematic guidance as to where to cut the dendrogram to form clusters.⁹
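To make the two algorithms concrete, the following is a minimal pure-Python sketch. It is an illustration only, not the implementation used in any cited study. `kmeans` follows Lloyd's algorithm, alternating the assignment of each point to its nearest centroid with recomputation of each centroid as the mean of its points. `agglomerative` uses average linkage (one of several possible linkage rules, chosen here as an assumption) and stops once the requested number of clusters remains, which is equivalent to cutting the dendrogram at that level. Function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k random data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the centroid at the smallest Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering: start with singletons, repeatedly merge the
    closest pair, and stop when n_clusters remain (i.e. cut the dendrogram at that level)."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, linkage distance): the heights a dendrogram would draw

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs of points.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


if __name__ == "__main__":
    # Two well-separated toy groups of "patients" described by 2 features (hypothetical).
    data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    print(kmeans(data, k=2)[0])
    print(agglomerative(data, n_clusters=2)[0])
```

Optimized versions of both algorithms exist in libraries such as scikit-learn and SciPy; the sketch only shows the logic. Because k-means depends on its random starting centroids, it is usually run several times and the solution with the lowest within-cluster sum of squares is kept.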

Indeed, a major challenge in cluster analysis is estimating the optimal number of clusters.¹⁰ The gap statistic is a commonly used method for estimating the number of clusters in a dataset; it compares the within-cluster dispersion obtained for each candidate number of clusters with the dispersion expected under a null reference distribution that has no cluster structure (Figure 1D).¹⁰ In addition, a model-based clustering algorithm can be used, in which each cluster is mathematically represented by a parametric distribution (e.g., Gaussian) so that the entire dataset is modeled as a mixture of these distributions; clusters are then assigned by maximizing a penalized likelihood.¹¹ Finally, topological data analysis (TDA) refers to a collection of statistical methods that provide insight into the geometric structure of data and generate useful visualizations of complex datasets. TDA can be used in an unsupervised ML pipeline to group similar patients into nodes.¹²
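A minimal sketch of the gap statistic, assuming Euclidean distance, k-means as the clustering step, and a null reference distribution drawn uniformly over the bounding box of the data (other null choices exist). Here W_k is the within-cluster sum of squares for k clusters, and Gap(k) is the average of log W_k over the reference datasets minus log W_k for the observed data. For simplicity the sketch picks the k with the largest gap, matching the description of Figure 1D; the original formulation instead chooses the smallest k whose gap is within one standard error of the gap at k + 1. Function names and defaults are illustrative.

```python
import math
import random


def within_ss(points, k, rng, n_init=10, n_iter=30):
    """W_k: the smallest within-cluster sum of squares found by k-means over several restarts."""
    best = math.inf
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]
        for _ in range(n_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
                groups[nearest].append(p)
            # Move each centroid to its group mean (empty groups keep their old centroid).
            centroids = [
                [sum(c) / len(g) for c in zip(*g)] if g else centroids[j]
                for j, g in enumerate(groups)
            ]
        wss = sum(math.dist(p, centroids[j]) ** 2 for j, g in enumerate(groups) for p in g)
        best = min(best, wss)
    return best


def gap_statistic(points, k_max, n_ref=10, seed=0):
    """Gap(k) = mean over reference datasets of log(W*_k) minus log(W_k) for the data.
    Reference datasets are uniform over the data's bounding box (an assumption)."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps = []
    for k in range(1, k_max + 1):
        log_wk = math.log(within_ss(points, k, rng))
        ref_logs = []
        for _ in range(n_ref):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            ref_logs.append(math.log(within_ss(ref, k, rng)))
        gaps.append(sum(ref_logs) / n_ref - log_wk)
    # Choose the k with the largest gap (the simplest rule, as described for Figure 1D).
    best_k = max(range(1, k_max + 1), key=lambda k: gaps[k - 1])
    return best_k, gaps


if __name__ == "__main__":
    # Three tight, well-separated toy blobs (hypothetical data).
    offsets = [(-0.3, -0.2), (0.2, 0.3), (0.1, -0.3), (-0.2, 0.1),
               (0.3, 0.0), (0.0, 0.25), (-0.1, -0.1), (0.25, -0.15)]
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    blobs = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
    best_k, gaps = gap_statistic(blobs, k_max=4)
    print(best_k, [round(g, 2) for g in gaps])
```

On these toy blobs the gap at k = 3 should clearly exceed the gaps at k = 1 and k = 2, because clustering shrinks the observed dispersion far more than it shrinks the dispersion of structureless reference data.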

Approach to Heterogeneity Using Clinical Data: “Phenogrouping”

In the first study to use unsupervised ML to classify “phenogroups” in HFpEF, Shah et al relied on deep clinical phenotyping and used hierarchical clustering techniques to identify 3 clusters of patients: a younger group with lower B-type natriuretic peptide (BNP) levels, an older group with chronic kidney disease (CKD), and a group characterized by multiple comorbidities, including obesity and diabetes.³ In a validation cohort, phenogroup membership was independently associated with adverse outcomes, and patients in the phenogroup characterized by CKD had the highest risk. A number of subsequent studies have further validated this approach in patients with HFpEF; in these studies, the phenogroup characterized by a heavier burden of comorbidities, including CKD, again had the highest risk of adverse outcomes.¹³,¹⁴

Focusing instead on HFrEF, Ahmad et al¹⁵ used unsupervised ML clustering techniques to identify 4 distinct clusters of patients among 1,619 participants. Although LVEF did not differ significantly across the 4 clusters, the clusters differed significantly in clinical characteristics, outcomes, and responses to therapy. In another study of patients with HFrEF undergoing cardiac resynchronization therapy (CRT), unsupervised ML detected 2 phenogroups of patients who responded substantially better to CRT than patients in the other phenogroups.¹⁶ Applying TDA to echocardiographic features of LV structure and function, Tokodi et al generated a network with 4 distinct regions; patients in different regions differed significantly in rates of major adverse cardiac event (MACE)-related rehospitalization and death.¹⁷ Collectively, these studies using clinically available data demonstrate the phenotypic diversity of HF and serve as a proof of concept that unsupervised ML can provide clinically meaningful classifications of HF that may aid in optimizing management and treatment strategies. However, HF subtypes derived from clinical data alone do not directly reflect the underlying molecular pathobiology of the disease and offer limited insight into novel molecular pathways that could further inform our understanding of HF.

The Emerging Concept of “Endotyping”

HF is a complex syndrome, and diagnosis is based on a comprehensive clinical assessment.² Severity of symptoms can be further classified using the New York Heart Association functional classes and the American College of Cardiology/American Heart Association staging system.² However, these classifications reflect downstream consequences of myocardial dysfunction rather than underlying molecular and cellular mechanisms of disease.

“Endotyping” is an emerging concept in which diseases are classified into distinct subtypes based on underlying molecular mechanisms (Figure 2).¹⁸ Endotypes can identify groups of patients who may respond differently to treatment because they share dysfunctional molecular pathways.¹⁸ This is particularly important in HF, which encompasses multiple endotypes that may have differential risks of adverse events and varying responses to treatment. The phenogrouping studies described above applied this concept to clinical data; identifying endotypes takes it one step further, into molecular-level investigation.

Figure 2.

Association between the clinical syndrome of heart failure (HF), clinically defined phenotypes, and molecularly defined endotypes. HF is a complex syndrome diagnosed based on comprehensive clinical assessments. It has historically been divided by ejection fraction (EF) into the following phenotypes: HF with preserved EF (HFpEF), HF with mid-range EF (HFmrEF), and HF with reduced EF (HFrEF). Endotypes are mechanistically distinct subtypes of disease based on molecular-level investigation. HF likely encompasses multiple endotypes that may have differential risks of adverse events and varying responses to treatment.

Studies using omics data and unsupervised clustering approaches can provide unbiased molecular evidence for the derivation of endotypes.⁶,¹⁸ These studies also have the potential to reveal novel insights into disease pathophysiology and to facilitate the development of individualized treatment.⁶,¹⁸ The HF Collaboratory recently released a statement addressing the need for more targeted approaches to therapy for patients with HFrEF, and recommended that “successful strategies for how to guide medical therapy for patients will have to rely on the emergence of health data science and a deeper understanding of fundamental biology, pathophysiology, genomics, and phenotyping.”⁴ In the next few sections, we discuss recent efforts to identify mechanistically meaningful endotypes of HF and to identify molecular markers of different disease states using proteomics, transcriptomics, and genomics.

Endotyping Using Proteomics Profiling

Proteomics profiling involves the measurement of hundreds to thousands of proteins simultaneously. Many proteins that are involved in signaling pathways in the heart can be detected in peripheral plasma through the use of proteomics profiling.¹⁹ Applying unsupervised ML to proteomics profiling data can lead to the identification of clusters of patients with unique proteomics profiling signatures, thus illuminating underlying mechanisms of disease and allowing for the derivation of molecularly distinct endotypes (Table).

Table. Summary of Studies Using Proteomics and Machine Learning-Based Approaches for Endotype Derivation

Author (year)	HF phenotype	Population (sample size)	Machine learning method used	Endotypes identified	Dysregulated pathways/protein biomarkers
Woolley et al (2021)²⁰	HFpEF	BIOSTAT-CHF (n=429)	Hierarchical clustering	Endotype 1: Younger, lower NT-proBNP; Endotype 2: Older, CKD*; Endotype 3: Multiple comorbidities; Endotype 4: ICM	Endotype 2: upregulation of inflammatory pathways; Endotype 4: upregulation of cell proliferation regulation and cell survival pathways
Stienen et al (2020)²¹	HFpEF	MEDIA-DHF (n=392)	k-means clustering	Endotype 1: Younger, fewer comorbidities; Endotype 2: Multiple comorbidities including CKD*	Endotype 2: upregulation of pathways involved in immune system activation, signal transduction cascades, and cell interactions and metabolism
Tromp et al (2018)²²	HFrEF	BIOSTAT-CHF (n=1,802)	Principal component analysis and partitioning around medoids	Endotype 1: Younger, lower NT-proBNP; Endotype 2: Elderly, poor response to β-blocker uptitration, kidney disease; Endotype 3: ICM; Endotype 4: Highest NT-proBNP, AF*; Endotype 5: High rates of anemia; Endotype 6: Hypertensive	Endotype 4: Very high levels of IGFBP1 and NT-proBNP; Endotype 5: Very low levels of CHIT1 (increased levels associated with arteriosclerosis and Gaucher’s disease)
Verdonschot et al (2021)⁵¹	DCM	Maastricht Cardiomyopathy Registry (n=795)	Hierarchical clustering of principal components	Endotype 1: Younger patients with mild systolic dysfunction; Endotype 2: Young females with autoimmune disease; Endotype 3: Males with AF, NSVT, genotype positive*; Endotype 4: Severe systolic dysfunction and diastolic dysfunction	Endotype 2: Pro-inflammatory pathways; Endotype 4: Increased glycolytic substrate usage, increased purine and pyrimidine metabolism reflecting increased DNA replication pathways

*Endotype associated with worse outcomes. AF, atrial fibrillation; BIOSTAT-CHF, A Systems Biology Study to Tailored Treatment in Chronic Heart Failure; CHIT1, chitinase 1 (chitotriosidase); CKD, chronic kidney disease; DCM, dilated cardiomyopathy; HF, heart failure; HFmrEF, heart failure with mid-range ejection fraction; HFpEF, heart failure with preserved ejection fraction; HFrEF, heart failure with reduced ejection fraction; ICM, ischemic cardiomyopathy; IGFBP1, insulin-like growth factor-binding protein 1; IMMACULATE, Improving Remodeling in Acute Myocardial Infarction Using Live and Asynchronous Telemedicine; LVDD, left ventricular diastolic dysfunction; MEDIA-DHF, The Metabolic Road to Diastolic Heart Failure: Diastolic Heart Failure study; MI, myocardial infarction; NSVT, non-sustained ventricular tachycardia; NT-proBNP, N-terminal pro-B-type natriuretic peptide.

Woolley et al examined a panel of 363 proteomic biomarkers in 429 patients with HFpEF and used unsupervised ML to identify 4 distinct endotypes with the following clinical characteristics: a younger group with lower N-terminal pro-B-type natriuretic peptide (NT-proBNP) levels, an older group with CKD, a group with multiple comorbidities, and a group with significant coronary artery disease (CAD).²⁰ Interestingly, the clinical characteristics of these proteomics-derived endotypes closely resembled those of the phenogroups identified by Shah et al using clinical phenotyping.³ In both studies, the group with the highest prevalence of CKD was associated with worse outcomes. Pathway analysis revealed upregulation of inflammatory pathways in the endotype characterized by CKD, as well as upregulation of pathways implicated in the regulation of cell proliferation and cell survival in the endotype characterized by ischemia.

Stienen et al²¹ performed a similar study, applying unsupervised ML techniques to 392 patients with HFpEF using a panel of 415 proteomic biomarkers. Their analysis identified 2 distinct endotypes, with patients in 1 endotype experiencing higher rates of cardiovascular death and hospitalization. Pathway analysis revealed upregulation of pathways involved in immune system activation, signal transduction cascades, cell interactions, and metabolism in the endotype with worse outcomes. Taken together, both studies demonstrate the heterogeneity of HFpEF and highlight potential future targets for investigation and development of mechanistically directed therapies. In addition, the proteomic biomarkers and pathways identified in both studies may help guide selection of patients for clinical trials in HFpEF who are more likely to benefit from a particular therapy based on their underlying pathophysiology.
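Pathway analysis can be performed in several ways, and the cited studies may have used different methods. One common approach, sketched below under that assumption, is over-representation analysis: given a background set of measured proteins, the member proteins of a pathway, and the proteins found to be upregulated in an endotype, a one-sided hypergeometric test asks how likely it is to see at least the observed overlap by chance. The function names and protein identifiers are hypothetical.

```python
from math import comb


def overrepresentation_p(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap): the chance of drawing at least this many
    pathway members when n_selected proteins are picked at random from the background."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    hits = sum(
        comb(n_pathway, x) * comb(n_background - n_pathway, n_selected - x)
        for x in range(n_overlap, upper + 1)
    )
    return hits / total


def enrichment(background, pathway, selected):
    """Convenience wrapper working on sets of protein identifiers."""
    background = set(background)
    pathway = set(pathway) & background  # only proteins actually measured can be hits
    selected = set(selected) & background
    return overrepresentation_p(
        len(background), len(pathway), len(selected), len(pathway & selected)
    )


if __name__ == "__main__":
    # Hypothetical panel of 400 proteins; a 20-protein pathway; 20 upregulated proteins.
    background = [f"P{i}" for i in range(400)]
    pathway = background[:20]
    upregulated = background[:10] + background[100:110]
    print(enrichment(background, pathway, upregulated))
```

Because many pathways are usually tested at once, the resulting p-values should be corrected for multiple testing before any pathway is reported as enriched.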

Endotyping via ML-based analysis of proteomic data has been applied to HFrEF as well. Tromp et al²² applied unsupervised ML to a panel of 92 proteomic biomarkers in patients with HFrEF and identified 6 distinct endotypes with marked differences in clinical characteristics, outcomes, and response to medical therapy. Notably, 1 particular endotype did not derive benefit from β-blocker treatment despite being indistinguishable from the other patients with HFrEF based on clinical characteristics alone, demonstrating the added value of biomarker analysis and endotyping. A limited number of proteomic biomarkers could adequately discriminate patient endotype membership in this study, suggesting that such a panel of biomarkers could be used with relative ease to determine endotypes in a clinical setting.
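The Table lists principal component analysis followed by partitioning around medoids (PAM) for Tromp et al. The sketch below illustrates only the PAM step, assuming the input rows are already principal component scores and that Euclidean distance is used; it is not the authors' code. Unlike k-means, PAM uses actual observations (medoids) as cluster centers, so each endotype is represented by a real patient profile, and the method is less sensitive to outliers. A greedy BUILD phase picks the starting medoids, and a simple first-improvement variant of the SWAP phase exchanges a medoid for a non-medoid whenever that lowers the total distance.

```python
import math


def pam(points, k):
    """Partitioning around medoids with Euclidean distance.

    Returns (medoid indices, cluster label for each point)."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    def total_cost(medoids):
        # Sum over all points of the distance to the nearest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    # BUILD: add medoids one at a time, each time choosing the point that lowers the cost most.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    # SWAP: replace a medoid by a non-medoid whenever that lowers the cost; repeat until stable.
    cost = total_cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:slot] + [h] + medoids[slot + 1:]
                trial_cost = total_cost(trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True

    # Each point is labeled with the index (0..k-1) of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


if __name__ == "__main__":
    # Hypothetical principal component scores for 6 patients.
    scores = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    print(pam(scores, 2))
```

The full pairwise distance matrix makes this sketch quadratic in the number of patients, which is fine for illustration; larger cohorts are usually handled with optimized or sampling-based variants.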

One major challenge in interpreting proteomics profiling is determining whether the pathways differentially regulated between endotypes are causal or secondary to disease progression. Integration with genomic and transcriptomic data could further clarify the significance of pathways identified by proteomics.

Endotyping Using Transcriptomics

Transcriptomics is the study of the ribonucleic acid (RNA) transcripts produced from the genome, using high-throughput methods such as microarray analysis.²³ Comparing transcriptomes allows identification of genes that are differentially expressed between cell populations or disease states, or in response to different treatments.²³ Transcriptomics has the potential to refine diagnostic and prognostic accuracy in a number of diseases and has found considerable success in oncology.²⁴ These techniques have not been developed as robustly in HF, although several studies have shown promise in correlating RNA transcript levels with HF subtypes.
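As an illustration of what “differentially expressed” means operationally, the sketch below computes, for each gene, a log2 fold change and Welch's t statistic between two groups of samples. It assumes the expression values are already normalized and on a log2 scale. Converting t to a p-value (via the Welch–Satterthwaite degrees of freedom) and correcting for multiple testing are omitted, and dedicated tools such as the limma package use moderated statistics that are better suited to small sample sizes. Gene names and values are hypothetical.

```python
import math
import statistics


def differential_expression(group_a, group_b):
    """Per-gene log2 fold change and Welch's t statistic for two groups of samples.

    group_a / group_b map gene name -> list of log2-scale expression values.
    Each gene needs at least 2 samples per group."""
    results = {}
    for gene in sorted(group_a.keys() & group_b.keys()):
        a, b = group_a[gene], group_b[gene]
        mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
        # On the log2 scale, a difference in means is a log2 fold change.
        log2_fc = mean_a - mean_b
        # Welch's t uses each group's own variance (no equal-variance assumption).
        se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
        if se > 0:
            t = log2_fc / se
        elif log2_fc == 0:
            t = 0.0
        else:
            t = math.copysign(math.inf, log2_fc)
        results[gene] = (log2_fc, t)
    return results


if __name__ == "__main__":
    failing = {"GENE_X": [8.0, 8.2, 7.8], "GENE_Y": [5.0, 5.1, 4.9]}      # hypothetical
    non_failing = {"GENE_X": [6.0, 6.1, 5.9], "GENE_Y": [5.0, 5.1, 4.9]}  # hypothetical
    for gene, (fc, t) in differential_expression(failing, non_failing).items():
        print(gene, round(fc, 2), round(t, 2))
```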

Kittleson et al²⁵ used gene expression microarrays of myocardial samples obtained from patients with end-stage HF at the time of transplantation or LV assist device implantation to develop a 90-gene prediction panel on a training dataset; the panel was then applied to a separate set of patient samples (test set). Using supervised principal components clustering techniques, the panel distinguished patients with ischemic cardiomyopathy (ICM) from those with non-ischemic cardiomyopathy (NICM) with 100% sensitivity and specificity. Interestingly, when applied to patients with newly diagnosed cardiomyopathy, the panel performed perfectly in NICM but correctly identified only 1 of 3 ICM samples. This suggests that gene expression in ICM may change more as the disease progresses than it does in NICM, and it emphasizes the need for stage-specific prediction profiles.

RNA sequencing (RNA-seq) is a newer approach to transcriptome profiling that allows an unbiased survey of the entire transcriptome.²³ RNA-seq has a greater dynamic range than microarrays, which can be susceptible to nonspecific hybridization and saturation biases.²³ Liu et al evaluated RNA-seq data from myocardial samples of 6 patients: 1 with ICM, 2 with dilated cardiomyopathy (DCM), and 3 with non-failing (NF) hearts.²⁶ Genes that were globally differentially expressed were then used as feature vectors to classify 313 individuals with microarray data using a k-means clustering algorithm. Remarkably, using feature vectors derived from only 6 patients, the authors classified patients as ICM vs. NF, and as DCM vs. NF, with high accuracy. This study identified genes with distinct expression patterns in failing vs. NF hearts and showed that these detailed expression profiles could distinguish different disease states.

Beyond gene expression profiles of messenger RNA (mRNA), many noncoding RNA molecules (e.g., microRNAs [miRNAs], long noncoding RNAs [lncRNAs], and circular RNAs) have been found to play important roles in the regulation of gene transcription, epigenetics, and post-transcriptional mRNA processing, and they can be detected noninvasively in peripheral plasma (Figure 3).²⁷ Circulating concentrations of noncoding RNA molecules vary in response to an array of acute and chronic disease states.²⁷,²⁸ miRNAs in particular are attractive candidate biomarkers, given their stability in stored samples.²⁷ Several studies have found that miRNA signatures can differentiate patients with HFpEF from those with HFrEF, as well as patients with dyspnea from HF from those with dyspnea from chronic obstructive pulmonary disease.²⁹,³⁰ Combined with NT-proBNP levels, miRNA panels further improve the accuracy of classifying patients with and without HF.³¹ In a study of 2,203 patients with chronic HF, higher levels of miR-1254 and miR-1306-5p were associated with a higher risk of all-cause death and HF hospitalization, although the hazard ratios were modest.³² Importantly, most of these studies used small panels of 4–12 miRNAs to achieve high discriminative accuracy, suggesting that such panels could be practical biomarkers in the clinical setting.
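Improvements in discrimination such as those described above are commonly summarized by the area under the receiver operating characteristic curve (AUC). The sketch below computes AUC through its Mann–Whitney interpretation: the probability that a randomly chosen patient with HF scores higher than a randomly chosen patient without HF. All values, and the equal-weight combination of log10 NT-proBNP with a miRNA panel score, are hypothetical and chosen only so the effect is visible; real studies estimate the weights from data, for example with logistic regression.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Returns the probability that a randomly chosen case (label 1) has a higher score than a
    randomly chosen control (label 0); ties count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else 0.5 if c == k else 0.0
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical toy values, not data from any cited study.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = HF, 0 = no HF
    nt_probnp = [900, 400, 1500, 300, 350, 200, 500, 100]  # pg/mL
    mirna_score = [0.7, 0.8, 0.6, 0.9, 0.2, 0.4, 0.3, 0.1]  # higher = more HF-like
    # Hypothetical equal-weight combination; real studies estimate the weights from data.
    combined = [math.log10(x) + m for x, m in zip(nt_probnp, mirna_score)]
    print(auc(nt_probnp, labels), auc(combined, labels))  # prints 0.8125 1.0
```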

Figure 3.

Subtypes of RNA. Gene expression profiles often measure mRNA transcripts, but analysis of noncoding RNA molecules such as microRNA and long noncoding RNAs can also elucidate novel mechanisms of heart failure and serve as biomarkers of disease.

lncRNAs, often defined as non-protein-coding transcripts longer than 200 nucleotides, are also potential markers of cardiac dysfunction. They have been identified as independent predictors of diastolic function and remodeling in patients with diabetic cardiomyopathy,²⁷ and they have also shown value in predicting response to therapy in a trial of pioglitazone in patients with HF and diabetes.³³ This has important implications for the use of transcriptomics to guide treatment. Future efforts in transcriptomics will need to focus on improving the standardization of microarray platforms, RNA-seq methods, and statistical data handling to ensure that results are valid and generalizable.

Endotyping Using Genomics

Familial monogenic cardiomyopathies are generally organized into several major phenotypic categories: hypertrophic, dilated, arrhythmogenic, restrictive, and LV noncompaction cardiomyopathy.⁶,³⁴ However, even within these categories there is significant heterogeneity in the underlying etiology and clinical manifestations of disease.⁶,³⁴,³⁵ Moreover, many patients with cardiomyopathies have negative genetic testing, and even in those with known pathogenic variants, penetrance is often incomplete.⁶,³⁴,³⁵

In general, genetic testing for patients with cardiomyopathies is recommended when there is a family history of cardiomyopathy and cascade screening of at-risk family members is feasible and desired.⁶,³⁶–³⁸ In most cases, identification of a pathogenic variant does not alter treatment or risk stratification, because robust genotype–phenotype associations are lacking.³⁶–³⁸ A notable exception is patients with arrhythmogenic cardiomyopathy and a mutation in LMNA, FLNC, or PLN.³⁷ Mutations in these genes are associated with a higher risk of life-threatening arrhythmias, so these patients have a Class IIa recommendation for primary prevention with an implantable cardioverter-defibrillator (ICD).³⁷ In addition, patients with DCM and LMNA or SCN5A mutations may similarly be considered for a primary prevention ICD.³⁸

Apart from the familial monogenic cardiomyopathies, epidemiologic studies have shown that genetic predisposition still plays a role in all-cause HF. A study comparing the HF status of adoptees with that of their adoptive and biologic parents estimated the heritability of HF to be 26%.³⁹ Genome-wide association studies (GWAS) aim to find associations between common single-nucleotide polymorphisms (SNPs) and HF using SNP arrays. In a GWAS of 47,309 patients with all-cause HF and 930,014 controls, Shah et al identified 12 independent variants at 11 genomic loci associated with HF.⁴⁰ Despite the size of the study, only a modest number of genetic associations were identified, and the cumulative heritability was estimated at 9%, suggesting that an important component of HF heritability may be attributable to specific disease subtypes rather than to a final common pathway.
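GWAS test each SNP separately for association with the phenotype. A minimal sketch of the simplest version, an allelic 2×2 chi-square test on allele counts in cases and controls, is shown below. The function name and counts are hypothetical, and real GWAS typically use regression models adjusted for covariates such as ancestry principal components. Because on the order of a million SNPs are tested, significance is conventionally declared only at p < 5×10⁻⁸, the genome-wide significance threshold.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic 2x2 chi-square test for one SNP.

    Arguments are allele counts (2 per person): alternative/reference alleles in cases
    and in controls. Returns (odds ratio, chi-square statistic, p-value)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP in 100 cases and 100 controls.
    or_, chi2, p = allelic_test(case_alt=30, case_ref=170, ctrl_alt=50, ctrl_ref=150)
    print(round(or_, 3), round(chi2, 2), p, p < GENOME_WIDE_ALPHA)
```

A nominally significant SNP, such as the one in this toy example, would still fall far short of the genome-wide threshold, which is one reason very large samples are needed.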

Indeed, in a prior study, a GWAS initially performed in 7,382 patients with all-cause HF identified only genetic loci linked to upstream HF risk factors such as CAD and atrial fibrillation.⁴¹ Once the HF phenotype was refined to include only the 2,138 patients with NICM, multiple genetic loci were found to be significantly associated with NICM, independent of upstream factors. Other GWAS have been successfully performed on HF-related traits, including biomarker levels.⁴²,⁴³ These studies suggest that focusing on novel endotypes of HF could increase the power of GWAS to detect significant genetic associations that are also mechanistically meaningful.

Common variants may account for only a fraction of genomic variation, and HF may involve the combined actions of multiple rare variants that require whole-exome or whole-genome sequencing (WES or WGS) to detect.⁶ A recent WES study of 5,942 patients with HF found that rare variants may also play a role in all-cause HF.⁴⁴ In this cohort of older patients with predominantly ICM, the diagnostic yield of pathogenic variants in 41 known cardiomyopathy genes was 3.6%. Notably, the study used highly stringent criteria for qualifying diagnostic variants, so the true diagnostic yield may be higher. Interestingly, the diagnostic yield was similar in patients with HFpEF, HFmrEF, and HFrEF, with substantial overlap in the implicated genes, suggesting the presence of common genetic pathways irrespective of EF.

Interpreting novel variants identified by WES and WGS can pose a major challenge and may require multi-omics approaches. The TTN gene, which encodes the giant sarcomere protein titin, presents a unique challenge because it undergoes extensive alternative splicing to produce multiple isoforms.⁴⁵ TTN mutations can cause DCM, and heterozygous variants that truncate full-length titin (titin-truncating variants [TTNtvs]) are the most common genetic cause of familial DCM. In addition, TTNtvs also occur in about 2% of individuals without overt cardiomyopathy. In a study combining genetic, transcriptomic, proteomic, and clinical data, Roberts et al found that the clinical significance of a TTNtv is largely determined by exon usage and variant location. In particular, TTNtvs in exons with a proportion spliced in (PSI; the fraction of TTN transcripts that include the exon) greater than 0.9 are much more likely to be pathogenic.⁴⁵ Patients with DCM and high-PSI TTNtvs had earlier onset of HF, arrhythmias, and death than other patients with DCM. Thus, patients with TTNtv-associated DCM may represent a higher-risk subtype of DCM who may benefit from a lower threshold for ICD therapy.
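PSI is typically estimated from RNA-seq reads that span exon junctions. The sketch below uses a simplified junction-count estimate, which is an assumption and not the method of Roberts et al: reads supporting inclusion span the exon's two flanking junctions and are therefore halved before comparison with reads from the single exon-skipping junction. It then flags truncating variants located in exons with PSI above 0.9. Exon and variant identifiers and read counts are hypothetical.

```python
def psi(inclusion_reads, skipping_reads):
    """Proportion spliced in (0-1) for one exon from junction-read counts (simplified).

    inclusion_reads: reads on the two junctions joining the exon to its neighbours.
    skipping_reads: reads on the junction that joins the neighbours directly."""
    included = inclusion_reads / 2  # two inclusion junctions vs one skipping junction
    total = included + skipping_reads
    return included / total if total > 0 else float("nan")


def high_psi_variants(variants, exon_psi, threshold=0.9):
    """Return IDs of truncating variants whose exon has PSI above the threshold.

    variants: list of (variant_id, exon_id) pairs; exon_psi: exon_id -> PSI.
    Exons without a PSI estimate (or with NaN) are not flagged."""
    return [vid for vid, exon in variants if exon_psi.get(exon, 0.0) > threshold]


if __name__ == "__main__":
    exon_psi = {"exon_A": psi(190, 5), "exon_B": psi(40, 80)}  # hypothetical counts
    variants = [("var1", "exon_A"), ("var2", "exon_B")]         # hypothetical TTNtvs
    print(exon_psi, high_psi_variants(variants, exon_psi))
```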

With the expansion of biobanks linking large-scale genomic sequence data to electronic health records (EHR), genome-first studies have emerged. These studies first identify individuals carrying a variant of interest and then use EHR data to associate clinical features and outcomes with the variant. In a genome-first study of TTNtvs, individuals of European ancestry with DCM and TTNtvs had greater LV size, lower LV function, and a higher arrhythmia burden than patients with DCM without TTNtvs, consistent with the findings of the previous study.⁴⁵,⁴⁶ TTNtvs were also associated with reduced cardiac function even in the absence of a DCM diagnosis, suggesting that TTNtv carriers without DCM may harbor unrecognized cardiac dysfunction. By contrast, these associations were not significant in individuals of African ancestry. Ancestral differences in genotype–phenotype relationships add another level of complexity to genomics studies, and participants of non-European ancestry are often underrepresented in large genetic studies.

Conclusions and Future Perspectives

With advances in high-throughput omics technologies, our ability to acquire a multitude of molecular-level data in HF continues to expand.⁷ AI techniques are well suited to the integrated analysis of omics and clinical data, and unsupervised ML in particular can uncover novel endotypes of HF.⁸ These endotypes may be associated with differential outcomes and responses to treatment, with important implications for risk stratification, choice of pharmacotherapy, and selection of patients for clinical trials. Through endotyping, we may identify novel pathways associated with these differential clinical outcomes and treatment responses, thus elucidating new mediators of disease. Pathway analysis may also allow the specification of molecular targets for the development of pharmacologic interventions, which will then need to be mechanistically evaluated in experimental model systems (e.g., animal models, induced pluripotent stem cells).⁴⁷,⁴⁸ To date, ML has been applied most extensively to proteomics for these purposes, but similar algorithms could be applied in other omics fields as well (Figure 4).

Figure 4.

Proposed workflow for omics-based endotype derivation. (A) Main steps in endotype derivation. The workflow begins by performing high-throughput omics profiling on large cohorts of patients with heart failure (HF). Unsupervised machine learning (ML) methods can then be applied to derive endotypes based on integrated analysis of omics and clinical data. These derived endotypes may be associated with differential outcomes and responses to treatment. Through the process of endotyping we may also identify novel pathways of disease. (B) Endotypes with differential risks of event-free survival. (C) Pathway analysis identifying dysregulated pathways associated with the highest-risk endotypes.

One major limitation of these types of studies is that they largely rely on retrospective data. In addition, AI-based studies carry a risk of false discovery, which is particularly high in studies with high-dimensional data and relatively small sample sizes.¹² Successful translation of omics studies to clinical practice will require ongoing studies in large cohorts and validation across diverse populations through global, interdisciplinary collaboration. Pathway and network analysis of multi-omics data can also lower the risk of false-positive discovery and confer biological plausibility and interpretability on study findings.⁴⁹ Moreover, ML algorithms must be integrated with causal reasoning and clinical knowledge.⁵⁰ These integrative approaches will ultimately help to unravel the heterogeneity of this complex syndrome and enable clinicians to individualize care for their patients with HF.
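One standard safeguard against false discovery when many hypotheses (genes, proteins, or pathways) are tested is to control the false discovery rate. The sketch below implements the Benjamini–Hochberg step-up procedure, which controls the expected proportion of false positives among rejected hypotheses at level q when the tests are independent or positively dependent. It is offered as a generic illustration, not as a method used in the studies reviewed here; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p
    # Find the largest rank r (1-based) with p_(r) <= r * q / m ...
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank
    # ... and reject every hypothesis ranked 1..r, even those above their own threshold.
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Hypothetical p-values, e.g. from testing 3 pathways.
    print(benjamini_hochberg([0.03, 0.031, 0.9]))  # prints [0, 1]
```

In the usage example the smallest p-value (0.03) fails its own threshold (0.05 × 1/3) but is still rejected because a larger-ranked p-value passes its threshold; this step-up behavior is what distinguishes the procedure from testing each hypothesis in isolation.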

Disclosures / Acknowledgements

None.

Funding

Y.J.S. is supported in part by unrestricted grants from the American Heart Association National Clinical and Population Research Awards, the American Heart Association Career Development Award, Korea Institute of Oriental Medicine, Honjo International Scholarship Foundation, and NIH R01 HL157216.

References

1. Groenewegen A, Rutten FH, Mosterd A, Hoes AW. Epidemiology of heart failure. Eur J Heart Fail 2020; 22: 1342–1356.
2. Ponikowski P, Voors AA, Anker SD, Bueno H, Cleland JGF, Coats AJS, et al. 2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J 2016; 37: 2129–2200.
3. Shah SJ, Katz DH, Selvaraj S, Burke MA, Yancy CW, Gheorghiade M, et al. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation 2015; 131: 269–279.
4. Bhatt AS, Abraham WT, Lindenfeld JA, Bristow M, Carson PE, Felker GM, et al. Treatment of HF in an era of multiple therapies: Statement from the HF Collaboratory. JACC Heart Fail 2021; 9: 1–12.
5. Tsutsui H, Isobe M, Ito H, Okumura K, Ono M, Kitakaze M, et al; on behalf of the Japanese Circulation Society and the Japanese Heart Failure Society Joint Working Group. JCS 2017/JHFS 2017 guideline on diagnosis and treatment of acute and chronic heart failure: Digest version. Circ J 2019; 83: 2084–2184.
6. Cresci S, Pereira NL, Ahmad F, Byku M, De Las Fuentes L, Lanfear DE, et al. Heart failure in the era of precision medicine: A Scientific Statement from the American Heart Association. Circ Genom Precis Med 2019; 12: 458–485.
7. Lanzer JD, Leuschner F, Kramann R, Levinson RT, Saez-Rodriguez J. Big data approaches in heart failure research. Curr Heart Fail Rep 2020; 17: 213–224.
8. Johnson KW, Torres Soto J, Glicksberg BS, Shameer K, Miotto R, Ali M, et al. Artificial intelligence in cardiology. J Am Coll Cardiol 2018; 71: 2668–2679.
9. Alloghani M, Al-Jumeily D, Mustafina J, Hussain A, Aljaaf AJ. A systematic review on supervised and unsupervised machine learning algorithms for data science. In: Berry MW, Mohamed A, Yap BW (editors). Supervised and unsupervised learning for data science. Springer Nature Switzerland, Cham, 2020; 3–21.
10. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc B 2001; 63: 411–423.
11. Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 2002; 97: 611–631.
12. Seetharam K, Raina S, Sengupta PP. The role of artificial intelligence in echocardiography. Curr Cardiol Rep 2020; 22: 1–8.
13. Segar MW, Patel KV, Ayers C, Basit M, Tang WHW, Willett D, et al. Phenomapping of patients with heart failure with preserved ejection fraction using machine learning-based unsupervised cluster analysis. Eur J Heart Fail 2020; 22: 148–158.
14. Nouraei H, Rabkin SW. A new approach to the clinical subclassification of heart failure with preserved ejection fraction. Int J Cardiol 2021; 331: 138–143.
15. Ahmad T, Pencina MJ, Schulte PJ, O’Brien E, Whellan DJ, Piña IL, et al. Clinical implications of chronic heart failure phenotypes defined by cluster analysis. J Am Coll Cardiol 2014; 64: 1765–1774.
16. Cikes M, Sanchez-Martinez S, Claggett B, Duchateau N, Piella G, Butakoff C, et al. Machine learning-based phenogrouping in heart failure to identify responders to cardiac resynchronization therapy. Eur J Heart Fail 2019; 21: 74–85.
17. Tokodi M, Shrestha S, Bianco C, Kagiyama N, Casaclang-Verzosa G, Narula J, et al. Interpatient similarities in cardiac function: A platform for personalized cardiovascular medicine. JACC Cardiovasc Imaging 2020; 13: 1119–1132.
18. Hasegawa K, Dumas O, Hartert TV, Camargo CA. Advancing our understanding of infant bronchiolitis through phenotyping and endotyping: Clinical and molecular approaches. Expert Rev Respir Med 2016; 10: 891–899.
19. Shimada YJ, Hasegawa K, Kochav SM, Mohajer P, Jung J, Maurer MS, et al. Application of proteomics profiling for biomarker discovery in hypertrophic cardiomyopathy. J Cardiovasc Transl Res 2019; 12: 569–579.
20. Woolley RJ, Ceelen D, Ouwerkerk W, Tromp J, Figarska SM, Anker SD, et al. Machine learning based on biomarker profiles identifies distinct subgroups of heart failure with preserved ejection fraction. Eur J Heart Fail, doi:10.1002/ejhf.2144.
21. Stienen S, Ferreira JP, Kobayashi M, Preud’homme G, Dobre D, Machu JL, et al. Enhanced clinical phenotyping by mechanistic bioprofiling in heart failure with preserved ejection fraction: Insights from the MEDIA-DHF study (The Metabolic Road to Diastolic Heart Failure). Biomarkers 2020; 25: 201–211.
22. Tromp J, Ouwerkerk W, Demissei BG, Anker SD, Cleland JG, Dickstein K, et al. Novel endotypes in heart failure: Effects on guideline-directed medical therapy. Eur Heart J 2018; 39: 4269–4276.
23. Lowe R, Shirley N, Bleackley M, Dolan S, Shafee T. Transcriptomics technologies. PLoS Comput Biol 2017; 13: e1005457.
24. Bullinger L, Döhner K, Bair E, Fröhling S, Schlenk RF, Tibshirani R, et al. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N Engl J Med 2004; 350: 1605–1616.
25. Kittleson MM, Ye SQ, Irizarry RA, Minhas KM, Edness G, Conte JV, et al. Identification of a gene expression profile that differentiates between ischemic and nonischemic cardiomyopathy. Circulation 2004; 110: 3444–3451.
26. Liu Y, Morley M, Brandimarto J, Hannenhalli S, Hu Y, Ashley EA, et al. RNA-Seq identifies novel myocardial gene expression signatures of heart failure. Genomics 2015; 105: 83–89.
27. Viereck J, Thum T. Circulating noncoding RNAs as biomarkers of cardiovascular disease and injury. Circ Res 2017; 120: 381–399.
28. Kumarswamy R, Bauters C, Volkmann I, Maury F, Fetisch J, Holzmann A, et al. Circulating long noncoding RNA, LIPCAR, predicts survival in patients with heart failure. Circ Res 2014; 114: 1569–1575.
29. Watson CJ, Gupta SK, O’Connell E, Thum S, Glezeva N, Fendrich J, et al. MicroRNA signatures differentiate preserved from reduced ejection fraction heart failure. Eur J Heart Fail 2015; 17: 405–415.
30. Wong LL, Armugam A, Sepramaniam S, Karolina DS, Lim KY, Lim JY, et al. Circulating microRNAs in heart failure with reduced and preserved left ventricular ejection fraction. Eur J Heart Fail 2015; 17: 393–404.
31. Wong LL, Zou R, Zhou L, Lim JY, Phua DCY, Liu C, et al. Combining circulating microRNA and NT-proBNP to detect and categorize heart failure subtypes. J Am Coll Cardiol 2019; 73: 1300–1313.
32. Bayés-Genis A, Lanfear DE, de Ronde MWJ, Lupón J, Leenders JJ, Liu Z, et al. Prognostic value of circulating microRNAs on heart failure-related morbidity and mortality in two large diverse cohorts of general heart failure patients. Eur J Heart Fail 2018; 20: 67–75.
33. de Gonzalo-Calvo D, Kenneweg F, Bang C, Toro R, van der Meer RW, Rijzewijk LJ, et al. Circulating long noncoding RNAs in personalized medicine: Response to pioglitazone therapy in type 2 diabetes. J Am Coll Cardiol 2016; 68: 2914–2916.
34. Jacoby D, McKenna WJ. Genetics of inherited cardiomyopathy. Eur Heart J 2012; 33: 296–304.
35. Ho CY, Day SM, Ashley EA, Michels M, Pereira AC, Jacoby D, et al. Genotype and lifetime burden of disease in hypertrophic cardiomyopathy: Insights from the Sarcomeric Human Cardiomyopathy Registry (SHaRe). Circulation 2018; 138: 1387–1398.
36. Ommen SR, Mital S, Burke MA, Day SM, Deswal A, Elliott P, et al. 2020 AHA/ACC guideline for the diagnosis and treatment of patients with hypertrophic cardiomyopathy. Circulation 2020; 142: e558–e631.
37. Towbin JA, McKenna WJ, Abrams DJ, Ackerman MJ, Calkins H, Darrieux FCC, et al. 2019 HRS expert consensus statement on evaluation, risk stratification, and management of arrhythmogenic cardiomyopathy. Heart Rhythm 2019; 16: e301–e372.
38. Bozkurt B, Colvin M, Cook J, Cooper LT, Deswal A, Fonarow GC, et al. Current diagnostic and treatment strategies for specific dilated cardiomyopathies: A Scientific Statement from the American Heart Association. Circulation 2016; 134: e579–e646.
39. Lindgren MP, Pirouzi Fard MN, Gustav Smith J, Sundquist J, Sundquist K, Zöller B. A Swedish nationwide adoption study of the heritability of heart failure. JAMA Cardiol 2018; 3: 703–710.
40. Shah S, Henry A, Roselli C, Lin H, Sveinbjörnsson G, Fatemifar G, et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat Commun 2020; 11: 1–12.
41. Aragam KG, Chaffin M, Levinson RT, McDermott G, Choi SH, Shoemaker MB, et al. Phenotypic refinement of heart failure in a national biobank facilitates genetic discovery. Circulation 2019; 139: 489–501.
42. del Greco MF, Pattaro C, Luchner A, Pichler I, Winkler T, Hicks AA, et al. Genome-wide association analysis and fine mapping of NT-proBNP level provide novel insight into the role of the MTHFR-CLCN6-NPPA-NPPB gene cluster. Hum Mol Genet 2011; 20: 1660–1671.
43. Yu B, Barbalic M, Brautbar A, Nambi V, Hoogeveen RC, Tang W, et al. Association of genome-wide variation with highly sensitive cardiac troponin-T levels in European Americans and blacks: A meta-analysis from atherosclerosis risk in communities and cardiovascular health studies. Circ Cardiovasc Genet 2013; 6: 82–88.
44. Povysil G, Chazara O, Carss KJ, Deevi SVV, Wang Q, Armisen J, et al. Assessing the role of rare genetic variation in patients with heart failure. JAMA Cardiol 2020; 6: 379–386.
45. Roberts AM, Ware JS, Herman DS, Schafer S, Baksi J, Bick AG, et al. Integrated allelic, transcriptional, and phenomic dissection of the cardiac effects of titin truncations in health and disease. Sci Transl Med 2015; 7: 270ra6.
46. Haggerty CM, Damrauer SM, Levin MG, Birtwell D, Carey DJ, Golden AM, et al. Genomics-first evaluation of heart disease associated with titin-truncating variants. Circulation 2019; 140: 42–54.
47. Chen IY, Matsa E, Wu JC. Induced pluripotent stem cells: At the heart of cardiovascular precision medicine. Nat Rev Cardiol 2016; 13: 333–349.
48. Kitsios GD, Tangri N, Castaldi PJ, Ioannidis JPA. Laboratory mouse models for the human genome-wide associations. PLoS One 2010; 5: e13782.
49. Benincasa G, Marfella R, Della Mura N, Schiano C, Napoli C. Strengths and opportunities of network medicine in cardiovascular diseases. Circ J 2020; 84: 144–152.
50. Raita Y, Camargo CA Jr, Liang L, Hasegawa K. Leveraging “big data” in respiratory medicine: Data science, causal inference, and precision medicine. Expert Rev Respir Med, doi:10.1080/17476348.2021.1913061.
51. Verdonschot JAJ, Merlo M, Dominguez F, Wang P, Henkens MTHM, Adriaens ME, et al. Phenotypic clustering of dilated cardiomyopathy patients highlights important pathophysiological differences. Eur Heart J 2021; 42: 162–174.
