Endocrine Journal
Online ISSN : 1348-4540
Print ISSN : 0918-8959
ISSN-L : 0918-8959
REVIEW
Causal inference and machine learning in endocrine epidemiology
Kosuke Inoue

2024 Volume 71 Issue 10 Pages 945-953

Abstract

With the rapid development of computer science, there is an increasing demand for the use of causal inference methods and machine learning in the research of endocrine disorders and their long-term health outcomes. However, studies on the effective and appropriate applications of these approaches in real-world data and clinical settings are still limited. This review will illustrate the use of causal inference and machine learning in epidemiological research within the field of endocrinology and metabolism. It will examine each concept of causal inference and machine learning through application examples of endocrine disorders. Subsequently, the paper will discuss the integration of machine learning within the causal inference framework, including (i) the estimation of treatment effects or the causal relationship between exposure and outcomes, and (ii) the evaluation of heterogeneity in such treatment effects (or exposure-outcome causal relationship) based on individuals’ characteristics. Accurately assessing causal relationships and their heterogeneity across different individuals is crucial not only for determining effective interventions, but also for the appropriate allocation of medical resources and reducing healthcare disparities. By illustrating some application examples in endocrinology, this review aims to enhance readers’ understanding and application of causal inference and machine learning in future epidemiological studies focusing on endocrine disorders.

1.  Introduction

With the rapid development in computer science and data availability, there is increasing demand for applications of causal inference and machine learning algorithms in the epidemiological research of endocrine disorders. Although constructing a prediction model using risk factors related to endocrine disorders and their long-term health outcomes is one of the common practices in epidemiology, prediction and causal inference serve fundamentally different purposes and interpretations. For example, low levels of high-density lipoprotein cholesterol (HDL-C) have been known to be a strong predictor of cardiovascular disease (CVD) onset [1]. However, a randomized controlled trial (RCT) of a drug increasing HDL-C did not find a reduction in the onset of CVD [2], supported by a recent Mendelian randomization study (one of the causal inference methods), showing no evidence of the causal relationship between HDL-C and CVD [3]. Therefore, while HDL-C can be effective in predicting CVD onset, the benefits of interventions increasing HDL-C may be limited. Even if these factors are important when building a prediction model for endocrine disorders, they do not necessarily have a causal relationship with the disease and must be carefully interpreted.

One of the important questions in epidemiology is whether causal inference and prediction models are mutually exclusive or not. In fact, there are many methodologies that include the predictive process as an important step in causal inference and draw attention to the importance of applying highly accurate prediction models within the causal inference framework. In particular, the recent development of machine learning (ML) algorithms allows us to conduct flexible modeling which increases the expectations for the effectiveness of ML in establishing causality. Compared to conventional statistical modeling, which generally assumes the form of relationships between input and output variables (e.g., linear regression, logistic regression), ML can handle more complex interactions among these variables without such assumptions. Given the complex interplay of multi-dimensional characteristics, such as lifestyle, comorbidities, medications, and genetics, in endocrine disorders, ML has the potential to help build better prediction models, establish causal relationships between exposures and outcomes, and improve clinical practices in this field.

In this review, we will illustrate how causal inference and ML can be applied in epidemiological research of endocrinology and metabolism. We will first explain each concept of causal inference and ML, providing application examples. Then, we will analyze how ML can be used in the causal inference framework, such as in estimation of treatment effects or causal relationship between exposure and outcomes, as well as the evaluation of their heterogeneity. This review attempts to help readers understand and apply causal inference and ML in future epidemiological research of endocrine disorders.

2.  Causal inference

2.1.  What is causal inference?

In general, causal inference examines the relationship between cause and effect [4, 5]. Consider the example of statin use (statin use, X = 1; no use, X = 0) and its effect on lowering the risk of coronary artery disease onset within 5 years (event, Y = 1; no event, Y = 0). A causal diagram (called a directed acyclic graph), as shown in Fig. 1, represents the hypothesized causal relationship between the variables by an arrow drawn from the cause to the outcome variable [6, 7]. We then consider the 5-year risk of coronary artery disease under two distinct scenarios: “what if all individuals of the group were exposed to the treatment” and “what if all individuals of the group were not exposed to the treatment.” The outcomes under these scenarios are called potential outcomes, and their expected values are written $E(Y^{X=1})$ and $E(Y^{X=0})$, respectively. Lastly, we estimate the causal effect of interest (i.e., the estimand), such as the causal risk difference ($E(Y^{X=1}) - E(Y^{X=0})$) and the causal risk ratio ($E(Y^{X=1}) / E(Y^{X=0})$), using these two values. When conducting causal inference in epidemiological research, the following important assumptions must be carefully considered: i) conditional exchangeability ($X \perp\!\!\!\perp Y^{x} \mid Z$) (i.e., no uncontrolled confounders), ii) positivity ($P(X = x \mid Z) > 0$), iii) consistency ($Y^{x} = Y$ if $X = x$), iv) no measurement error, v) no selection bias, vi) correct model specification, and vii) no interference (i.e., no interactions between individuals). For details on causal assumptions, refer to the cited research [8].
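The estimands above can be made concrete with a toy calculation. The sketch below assumes a hypothetical population of ten people for whom both potential outcomes are known, which is never the case in real data (only one potential outcome is observed per person). All values and variable names are illustrative.

```python
# Hypothetical toy population: each pair is (Y under X=1, Y under X=0),
# where Y = 1 means a coronary event within 5 years. In real data only one
# of the two potential outcomes is ever observed for each person.
potential_outcomes = [
    (0, 1), (0, 0), (1, 1), (0, 1), (0, 0),
    (0, 0), (1, 1), (0, 0), (0, 1), (0, 0),
]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# E(Y^{X=1}): risk if everyone were treated.
risk_if_all_treated = mean([y1 for y1, _ in potential_outcomes])
# E(Y^{X=0}): risk if no one were treated.
risk_if_none_treated = mean([y0 for _, y0 in potential_outcomes])

causal_risk_difference = risk_if_all_treated - risk_if_none_treated
causal_risk_ratio = risk_if_all_treated / risk_if_none_treated

print(f"risk difference: {causal_risk_difference:.2f}")
print(f"risk ratio: {causal_risk_ratio:.2f}")
```

Because both potential outcomes are listed for everyone, no assumptions are needed here; the assumptions listed above are what allow the same quantities to be estimated when only one outcome per person is observed.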

Fig. 1  An example diagram of the causal relationship between statin use and cardiovascular disease events

CVD, cardiovascular disease

2.2.  Application examples of causal inference methods in endocrine epidemiology

By applying causal inference methods, we can disentangle the complex mechanisms across symptoms and diseases from epidemiological perspectives. For instance, using causal mediation analysis, Inoue et al. quantified that approximately 45% of the detrimental effect of elevated aldosterone levels on subclinical atherosclerosis is mediated by elevated blood pressure [9]. In previous biological studies, aldosterone has been shown to increase the risk of cardiovascular events both directly and indirectly (predominantly through hypertension). However, human-based evidence delineating these pathways was lacking. Thus, by adopting the causal inference method, their study offers novel insights into the cardiovascular burden of increased aldosterone levels, emphasizing the significance of controlling aldosterone levels or reducing mineralocorticoid receptor activity while managing blood pressure itself to mitigate overall risks of cardiovascular diseases. Another study investigated how subclinical hypothyroidism and high-normal serum thyrotropin (TSH) levels contribute to mortality mediated through cardiovascular disease [10]. Using a representative U.S. cohort, this study also applied causal mediation analysis to reveal that cardiovascular disease accounted for 14.3% and 5.9% of the mortality risk associated with subclinical hypothyroidism and elevated TSH levels, respectively. These results suggest the potential clinical advantage of thyroid hormone therapy targeting mid-normal TSH levels or proactive cardiovascular disease screening in individuals with elevated TSH, which should be the subject of further investigation.
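A "proportion mediated" such as those quoted above can be illustrated with a short sketch. It assumes effects expressed on an additive (difference) scale, where the total effect decomposes into a direct and an indirect effect; the cited studies may have used a different scale or estimator, and the numbers below are hypothetical.

```python
def proportion_mediated(total_effect, direct_effect):
    """Share of the total effect that runs through the mediator.

    Assumes effects on an additive (difference) scale, so that
    total effect = direct effect + indirect effect.
    """
    if total_effect == 0:
        raise ValueError("proportion mediated is undefined when the total effect is zero")
    indirect_effect = total_effect - direct_effect
    return indirect_effect / total_effect


# Hypothetical numbers chosen only for illustration (not taken from any study).
share = proportion_mediated(total_effect=0.20, direct_effect=0.15)
print(f"proportion mediated: {share:.0%}")
```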

Causal inference methods also allow us to assess the robustness of the observed association between exposure and outcomes. For example, a previous meta-analysis utilized quantitative bias analysis to evaluate the association between severe hypoglycemia and the risk of cardiovascular disease in patients with type 2 diabetes, particularly focusing on the influence of uncontrolled confounding factors, such as undiagnosed comorbid severe illnesses [11]. The bias analysis found that a markedly strong correlation between comorbid severe illnesses and both severe hypoglycemia and cardiovascular disease is necessary to fully explain the observed association. Such results provide robust evidence supporting the importance of preventing severe hypoglycemia in the management of cardiovascular disease risk among type 2 diabetes patients. Given that a large sample size study can introduce “precisely wrong” results (i.e., biased estimates with statistical significance due to small variance) [12], assessing systematic biases, such as confounding bias, information bias, and selection bias, is crucial in future epidemiological studies on endocrine disorders, particularly those using electronic health records and the national database.
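One widely used summary in quantitative bias analysis is the E-value: the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both exposure and outcome to fully explain away an observed association. The sketch below implements the standard formula for a point estimate; it is offered as an illustration of the idea and is not necessarily the method used in the cited meta-analysis.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk-ratio scale."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates are inverted so the formula applies to RR >= 1.
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


# Hypothetical observed risk ratios, for illustration only.
for observed in (1.0, 1.5, 2.0):
    print(f"RR={observed:.1f}  E-value={e_value(observed):.2f}")
```

A larger E-value means that only a strong unmeasured confounder could account for the observed association.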

3.  Machine Learning

3.1.  What is machine learning?

ML is a key component of artificial intelligence that trains computers to make informed predictions or decisions from data. ML algorithms include, but are not limited to, penalized regression models (e.g., LASSO, ridge, elastic net), tree-based models (e.g., random forest, gradient boosting machines), and clustering algorithms. For example, penalized regression models add a penalty to a regression model to prevent overfitting and perform feature selection, which is particularly useful in handling high-dimensional data. The performance of these models heavily depends on the choice of the penalty parameter, and if the penalty is too strong, this approach could potentially lead to underfitting. Tree-based models can handle non-linear relationships and are robust to outliers, providing valuable insights into feature importance. However, they can become complex, creating interpretability challenges, and may overfit if not properly tuned, requiring significant computational resources for large datasets. Unlike these supervised learning approaches, clustering is an unsupervised learning approach for identifying patterns or groupings in data without prior labeling. Choosing the optimal number of clusters is important to obtain valid and interpretable results. Although these algorithms have been utilized in a variety of healthcare fields, including as diagnostic tools for medical imaging, augmenting telehealth services, and contributing to drug discovery, this review focuses on the application of ML in prediction models, which is one of the most common ways of utilizing ML in endocrine epidemiology.
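The point about penalty strength can be seen in the simplest possible case. The sketch below uses the closed-form ridge solution for a single centered predictor with no intercept, on hypothetical data; as the penalty grows, the fitted slope shrinks toward zero, which is the underfitting risk described above.

```python
def ridge_slope(x, y, penalty):
    """Closed-form ridge slope for one centered predictor and no intercept.

    Minimizes sum((y_i - b * x_i)**2) + penalty * b**2.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)


# Hypothetical centered data with a slope of roughly 2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 1.9, 4.1]

for penalty in (0.0, 10.0, 90.0):
    print(f"penalty={penalty:5.1f}  slope={ridge_slope(x, y, penalty):.3f}")
```

In practice the penalty is usually chosen by cross-validation rather than fixed in advance.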

3.2.  ML application examples in endocrine epidemiology

To date, there have been several applications of ML in predicting endocrine disorders. Kato et al. applied six ML models (logistic regression, lasso regression, random forest, gradient-boosting machines, and SuperLearner) to predict elevated parathyroid hormone (PTH) levels from demographic, lifestyle, and biochemical data of healthy adults [13]. In a cohort of 8,208 U.S. adults, their model exhibited an area under the receiver operating characteristic curve (AUC) of approximately 0.80, with eGFR being the most influential predictor in the random forest and gradient-boosting models. Hu and Asami et al. used electronic health records and annual check-up data to build prediction models for hyperthyroidism and hypothyroidism, achieving AUCs of 0.94 and 0.91, respectively, particularly with gradient-boosting models [14]. They also found that serum creatinine and total cholesterol levels contributed significantly to both prediction models. In a different single-center study in Japan, Yoshihara et al. built a prediction model for Graves’ disease using an ensemble learning ML tool called “Prediction One,” which integrates neural networks and gradient-boosting machines [15]. They were able to build a model with high predictive performance (AUC 0.99), with age, serum creatinine levels, and total cholesterol levels being the three major predicting factors for Graves’ disease. Their model maintained high predictive performance (AUC 0.97) even when using only five important variables selected in their original model, increasing its applicability in clinical practice for this common thyroid disorder.
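The AUC values reported in these studies have a simple interpretation: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical, and the function name is illustrative.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Equals the probability that a randomly chosen case receives a higher
    score than a randomly chosen non-case; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = disease) and model scores.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"AUC = {example_auc:.2f}")
```

The pairwise loop is quadratic in sample size, so it suits illustration rather than large cohorts, where a rank-based implementation is preferable.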

4.  Estimation of causal effects using machine learning

This section analyzes how ML can be used to estimate the treatment effect of interest in epidemiological studies on endocrine disorders.

4.1.  Applying machine learning to models for exposure variables

One approach involves using prediction models for exposure variables (Fig. 2a) [16, 17]. Specifically, the probability of exposure conditional on Z (i.e., P(X = 1|Z)) is calculated, where X is the binary exposure variable and Z is a set of confounding factors (e.g., sex, age, blood pressure, smoking). The P(X = 1|Z) value is called a propensity score and is then used, via adjustment, matching, stratification, or weighting, to control for confounding when estimating the effect of exposure on the outcome. It is important to note that the purpose of the propensity score is to balance the confounding variables Z between exposed and non-exposed groups, not to find the Z that predicts exposure (X) most accurately. To achieve this, we need to select Z according to expert background knowledge and clinical hypotheses, and use it to correctly formulate a prediction model for the exposure variable X. In some situations, for example a small sample size with a large number of covariates, a simple regression model may not be applicable, so adopting ML helps to construct a flexible prediction model that also considers combinations of complex covariates [18]. In recent years, ML algorithms that select and model variables to optimize the balance of confounding variables between exposed and unexposed groups have also been developed [19].
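Propensity score weighting can be sketched with a single binary confounder. The example below assumes a hypothetical cohort built from cell counts, estimates P(X = 1|Z) as the treated fraction within each stratum of Z (a stand-in for a regression or ML model), and compares the crude and weighted risk differences. All names and numbers are illustrative.

```python
def build_cohort():
    """Hypothetical cohort as (z, x, y) records expanded from cell counts."""
    cells = [  # (z, x, number of people, number of events)
        (0, 1, 10, 2), (0, 0, 30, 12),
        (1, 1, 30, 12), (1, 0, 10, 6),
    ]
    rows = []
    for z, x, n, events in cells:
        rows += [(z, x, 1)] * events + [(z, x, 0)] * (n - events)
    return rows


def propensity_by_stratum(rows):
    """Estimate P(X = 1 | Z = z) as the treated fraction within each stratum."""
    scores = {}
    for z in {r[0] for r in rows}:
        stratum = [r for r in rows if r[0] == z]
        scores[z] = sum(x for _, x, _ in stratum) / len(stratum)
    return scores


def ipw_risk_difference(rows):
    """Inverse-probability-weighted estimate of E(Y^{X=1}) - E(Y^{X=0})."""
    ps = propensity_by_stratum(rows)
    n = len(rows)
    risk1 = sum(x * y / ps[z] for z, x, y in rows) / n
    risk0 = sum((1 - x) * y / (1 - ps[z]) for z, x, y in rows) / n
    return risk1 - risk0


def crude_risk_difference(rows):
    """Unadjusted difference in observed risk, treated minus untreated."""
    treated = [y for _, x, y in rows if x == 1]
    untreated = [y for _, x, y in rows if x == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


cohort = build_cohort()
print(f"crude RD: {crude_risk_difference(cohort):.2f}")
print(f"IPW RD:   {ipw_risk_difference(cohort):.2f}")
```

In this constructed cohort the risk difference is -0.2 within each stratum of Z; the crude comparison is pulled toward the null because treated people are more often in the higher-risk stratum, while weighting restores the stratum-specific value.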

Fig. 2  Steps of causal inference methods using prediction models of exposure or outcome

4.2.  Applying machine learning to models for outcome variables

Another approach involves using a prediction model for outcome variables (e.g., standardization, G-computation [20, 21]) (Fig. 2b). In this method, a prediction model for the outcome is built first. We then create two copies of the data, assigning all individuals to exposed status in one copy and to unexposed status in the other, and predict the potential outcomes under both statuses. By comparing these potential outcomes, we can estimate the effect of exposure on the outcome; the prediction model itself can be built using either simple regression or more flexible ML algorithms. Previous studies have demonstrated the superior accuracy of ML to classical regression models when the relationship between the outcome variable and exposure or confounding variables is nonlinear, or when there are many interactions between exposure and confounding variables [22]. However, it is noteworthy that the extent to which ML reduces bias and dispersion in the final effect estimate varies by situation, and the use of ML alone does not guarantee unbiased estimates even when it improves the performance of outcome prediction.
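The "copy the data and predict" procedure can be sketched as follows. The example assumes a hypothetical cohort with one binary confounder and uses the observed risk in each (Z, X) cell as the outcome model, standing in for a regression or ML model. All names and numbers are illustrative.

```python
def build_cohort():
    """Hypothetical cohort as (z, x, y) records expanded from cell counts."""
    cells = [  # (z, x, number of people, number of events)
        (0, 1, 10, 2), (0, 0, 30, 12),
        (1, 1, 30, 12), (1, 0, 10, 6),
    ]
    rows = []
    for z, x, n, events in cells:
        rows += [(z, x, 1)] * events + [(z, x, 0)] * (n - events)
    return rows


def fit_outcome_model(rows):
    """Saturated outcome model: observed risk in each (z, x) cell."""
    model = {}
    for z, x in {(r[0], r[1]) for r in rows}:
        cell = [y for zz, xx, y in rows if (zz, xx) == (z, x)]
        model[(z, x)] = sum(cell) / len(cell)
    return model


def g_computation(rows):
    """Return the predicted mean outcome with everyone exposed and unexposed."""
    model = fit_outcome_model(rows)
    n = len(rows)
    # "Copy" the data twice: everyone set to exposed, then everyone unexposed,
    # keeping each person's own confounder value z.
    risk_exposed = sum(model[(z, 1)] for z, _, _ in rows) / n
    risk_unexposed = sum(model[(z, 0)] for z, _, _ in rows) / n
    return risk_exposed, risk_unexposed


risk1, risk0 = g_computation(build_cohort())
print(f"E(Y^1)={risk1:.2f}  E(Y^0)={risk0:.2f}  RD={risk1 - risk0:.2f}")
```

Averaging predictions over the confounder distribution of the whole cohort is what distinguishes this from simply reading a coefficient off the outcome model.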

4.3.  Doubly Robust Estimator/Targeted Maximum Likelihood Estimator

The propensity score and standardization formula/G-computation approaches described above each require a correctly specified model, for the exposure or for the outcome, respectively. Combining the two yields a doubly robust estimator, which gives an unbiased estimate of the causal effect if at least one of the two models is correctly specified [23]. A commonly used doubly robust estimator can be calculated by the following equation, combining propensity score weighting and the predicted outcome values from G-computation:

  
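$$
\widehat{DRE}_{1} - \widehat{DRE}_{0} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i\,(Y_i - \hat{Y}_{1i})}{\widehat{PS}_i} + \hat{Y}_{1i}\right) - \frac{1}{n}\sum_{i=1}^{n}\left(\frac{(1 - X_i)\,(Y_i - \hat{Y}_{0i})}{1 - \widehat{PS}_i} + \hat{Y}_{0i}\right)
$$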

where $\widehat{DRE}_{a}$ is the doubly robust estimator of $E[Y^{X=a}]$, n is the sample size, $X_i$ is the observed value of X for individual i, $Y_i$ is the observed value of Y for individual i, $\widehat{PS}_i$ is the predicted propensity score for individual i, and $\hat{Y}_{ai}$ is the predicted outcome value for individual i when X is set to a. More recently, the targeted maximum likelihood estimator, an approach that applies ML to the doubly robust estimator, has become another widely applied method in epidemiological research [24].
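The double robustness property can be checked numerically. The sketch below implements the estimator defined above on a hypothetical cohort with one binary confounder Z, then deliberately misspecifies either the outcome model or the propensity model by ignoring Z. The cohort, the model values, and all names are illustrative assumptions.

```python
def build_cohort():
    """Hypothetical cohort as (z, x, y) records expanded from cell counts."""
    cells = [  # (z, x, number of people, number of events)
        (0, 1, 10, 2), (0, 0, 30, 12),
        (1, 1, 30, 12), (1, 0, 10, 6),
    ]
    rows = []
    for z, x, n, events in cells:
        rows += [(z, x, 1)] * events + [(z, x, 0)] * (n - events)
    return rows


def doubly_robust_rd(rows, ps, m1, m0):
    """Doubly robust estimate of E(Y^{X=1}) - E(Y^{X=0}).

    ps, m1, m0 map each value of z to the predicted propensity score and
    the predicted outcome under X = 1 and X = 0, respectively.
    """
    n = len(rows)
    dre1 = sum(x * (y - m1[z]) / ps[z] + m1[z] for z, x, y in rows) / n
    dre0 = sum((1 - x) * (y - m0[z]) / (1 - ps[z]) + m0[z] for z, x, y in rows) / n
    return dre1 - dre0


rows = build_cohort()

# Correctly specified models (values match the cell counts above).
true_ps = {0: 0.25, 1: 0.75}
true_m1 = {0: 0.2, 1: 0.4}
true_m0 = {0: 0.4, 1: 0.6}

# Deliberately misspecified models that ignore the confounder Z.
flat_ps = {0: 0.5, 1: 0.5}
flat_m1 = {0: 0.35, 1: 0.35}
flat_m0 = {0: 0.45, 1: 0.45}

rd_outcome_wrong = doubly_robust_rd(rows, true_ps, flat_m1, flat_m0)
rd_ps_wrong = doubly_robust_rd(rows, flat_ps, true_m1, true_m0)
rd_both_wrong = doubly_robust_rd(rows, flat_ps, flat_m1, flat_m0)

print(f"outcome model wrong: {rd_outcome_wrong:.2f}")
print(f"propensity wrong:    {rd_ps_wrong:.2f}")
print(f"both wrong:          {rd_both_wrong:.2f}")
```

In this constructed example the stratum-specific risk difference is -0.2. The estimator recovers it when either model is correct and falls back to the crude value of -0.1 when both ignore Z.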

4.4.  Application examples in endocrine epidemiology

A recent study used a high-dimensional propensity score to model the exposure, aiming to estimate the influence of levothyroxine initiation on pregnancy loss in women with subclinical hypothyroidism [25]. Their model incorporated 200 empirically selected and 33 predetermined variables within a cohort of 821 women, 181 of whom initiated levothyroxine therapy and 640 of whom did not. The results showed that levothyroxine initiation for subclinical hypothyroidism was associated with a lower risk of pregnancy loss, although the wide confidence interval hinders definitive conclusions (adjusted hazard ratio [95% CI] 0.87 [0.22 to 1.56]). In another example, a population-based cohort study investigated the association between low HbA1c levels and mortality among U.S. adults, employing ensemble ML algorithms to predict the outcome in the G-computation algorithm and adjusting for 72 confounders, including demographics, lifestyle choices, biomarkers, comorbidities, and medications [26]. The analysis revealed that, in comparison to mid-level HbA1c, low HbA1c levels were associated with increased all-cause mortality risks of 30% and 12% at 5 and 10 years, respectively. Their findings underscore the critical need for vigilant monitoring of both low and elevated HbA1c levels.

5.  Evaluation of effect heterogeneity using machine learning

This section explains how ML can be used to assess the heterogeneity in the treatment effect of interest in epidemiological studies for endocrine disorders.

5.1.  Machine learning algorithm to assess heterogeneity

ML can also be effectively used within the causal inference framework when evaluating the heterogeneity in the effect of interest, also known as the heterogeneous treatment effect. Recently, the heterogeneous treatment effect has received substantial attention because the response to the treatment and vulnerability to the exposure can vary by individuals. To date, a range of ML algorithms have been developed to assess heterogeneity considering multi-dimensional interactions across individuals’ characteristics. One such widely used approach in the healthcare literature is the causal forest [27–29], which employs tree-based algorithms to assess heterogeneous treatment effects via estimating the conditional average treatment effect (CATE) [29–31]. The CATE is represented as $E[Y^{X=1} - Y^{X=0} \mid C]$, where C is a set of variables contributing to heterogeneity. By conditioning on C, we can consider CATE as the treatment effect at the individual level (at least in the smallest group unit defined by C). Fig. 3 shows an example of a causal tree in which the variance of the estimator within the category (“leaf”) is minimized as much as possible while sequentially searching for leaves that maximize the “difference” in risk difference by dividing the cases according to individuals’ characteristics. In the causal forest, we create many causal trees to obtain CATE by aggregating the outputs from each tree.
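The splitting idea behind a causal tree can be illustrated with a deliberately simplified sketch. It assumes a hypothetical randomized trial with two binary characteristics (named "a" and "b" here) and performs one greedy split, choosing the characteristic whose two leaves differ most in risk difference. This is not the causal forest algorithm itself, which additionally uses many trees, subsampling, separate samples for splitting and estimation, and variance-aware criteria.

```python
def build_trial():
    """Hypothetical randomized trial as (features, x, y) records."""
    cells = [  # (a, b, x, number of people, number of events)
        (0, 0, 1, 10, 4), (0, 1, 1, 10, 4), (0, 0, 0, 10, 4), (0, 1, 0, 10, 4),
        (1, 0, 1, 10, 1), (1, 1, 1, 10, 1), (1, 0, 0, 10, 5), (1, 1, 0, 10, 5),
    ]
    rows = []
    for a, b, x, n, events in cells:
        features = {"a": a, "b": b}
        rows += [(features, x, 1)] * events + [(features, x, 0)] * (n - events)
    return rows


def risk_difference(rows):
    """Risk in the treated minus risk in the controls within a set of rows."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def best_split(rows, feature_names):
    """Pick the binary feature whose two leaves differ most in risk difference."""
    best = None
    for name in feature_names:
        left = [r for r in rows if r[0][name] == 0]
        right = [r for r in rows if r[0][name] == 1]
        rd_left, rd_right = risk_difference(left), risk_difference(right)
        gap = abs(rd_left - rd_right)
        if best is None or gap > best[1]:
            best = (name, gap, rd_left, rd_right)
    return best


feature, gap, rd_left, rd_right = best_split(build_trial(), ["a", "b"])
print(f"split on {feature}: RD={rd_left:.2f} when 0, RD={rd_right:.2f} when 1")
```

Here the leaf-specific risk differences play the role of CATE estimates for the groups defined by the chosen characteristic.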

Fig. 3  Example of causal tree

CVD, cardiovascular disease; RD, risk difference

5.2.  Application example in endocrine epidemiology

Researchers applying the causal forest model to the Look AHEAD trial data sought to identify a subgroup of individuals who benefit from weight loss interventions to prevent CVD morbidity and mortality among people with type 2 diabetes [32]. The trial had found that, on average, there was no significant reduction in cardiovascular events. However, participants with either moderately or poorly controlled diabetes (HbA1c ≥6.8%) and those with well-controlled diabetes (HbA1c <6.8%) in good self-reported health (together constituting 85% of participants) benefited from the intervention, with fewer cardiovascular events. In contrast, the 15% of participants with well-controlled diabetes and poor self-reported health saw adverse outcomes, offsetting the overall positive effects. These outcomes suggest that patient selection based on HbA1c levels and general health assessments may be helpful for the success of lifestyle interventions in diabetes management. Another population-based study employing the causal forest model examined whether the 10-year CVD risk due to coronary artery calcification (CAC), one of the major markers of subclinical atherosclerosis, varies across individuals [33]. In their analysis, they found heterogeneity in the association between CAC and CVD events, and nearly 70% of individuals categorized as low-risk in the current guideline exhibited a notable increase in CVD risk when they had CAC, advocating for CAC screening in such low-risk populations. These studies underscore the importance of nuanced assessment of both risk and benefit (i.e., the reduction in risk by treatment or screening) in preventing long-term adverse health outcomes.

5.3.  The world beyond effect heterogeneity evaluation: High-benefit approach for future precision medicine

Lastly, we introduce a new framework of treatment strategies by evaluating heterogeneous treatment effects. Historically, medical interventions have prioritized patients at high risk of diseases, in what is known as the high-risk approach. For example, the current guideline by the Japan Atherosclerosis Society recommends different LDL-C management goals according to the CVD risk of each individual (i.e., high-risk approach). However, it has been unclear whether high-risk patients benefit the most from the treatment. In this context, estimating the benefit would help clinicians assess whether the treatment of interest improves outcomes for each individual based on their characteristics, leading to the prioritization of the treatment according to estimated benefits (i.e., high-benefit approach; Table 1).

Table 1 Concept of high-benefit approach using an example of LDL-C management target and statin use

| CVD risk | LDL-C management target (mg/dL) | Benefit of statin to prevent CVD | Strategies |
|---|---|---|---|
| Low Risk | <160 | High Benefit | Initiate statin |
| | | Low Benefit | Consider alternative approaches, such as lifestyle modification and other drugs |
| Moderate Risk | <140 | High Benefit | Initiate statin |
| | | Low Benefit | Consider alternative approaches, such as lifestyle modification and other drugs |
| High Risk | <120 | High Benefit | Initiate statin |
| | | Low Benefit | Consider alternative approaches, such as lifestyle modification and other drugs |

CVD, cardiovascular disease; LDL-C, low density lipoprotein cholesterol

In a recent study, Inoue, Athey, and Tsugawa applied the causal forest method to the existing RCTs, Systolic Blood Pressure Intervention Trial (SPRINT) [34], and Action to Control Cardiovascular Risk in Diabetes (ACCORD) Blood Pressure Trial [35], and examined the individual characteristics of participants who are expected to benefit most from intensive blood pressure control in reducing the risk of cardiovascular events [36]. Then, the researchers compared the performance of the treatment strategies, focusing on patients who are expected to benefit from the treatment based on the causal forest model (high-benefit approach) and those focusing on high-risk patients based on the traditional cardiovascular risk factors and risk score (e.g., systolic blood pressure and risk score calculated using the 2013 pooled cohort equation by the American College of Cardiology and American Heart Association; high-risk approach) (Fig. 4). In this analysis, they found that the high-benefit approach outperformed the high-risk approach with around five times higher treatment effect to prevent cardiovascular events in the targeted population.
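The contrast between the two targeting strategies can be sketched with made-up numbers. The example below assumes six hypothetical patients, each with a baseline risk and an estimated absolute risk reduction from treatment (in practice the latter would come from a model such as a causal forest), and compares the average estimated benefit among the three patients selected by each strategy. It illustrates the selection logic only and does not reproduce the cited analysis.

```python
# Hypothetical patients: (id, baseline 10-year risk, estimated absolute risk
# reduction from treatment). All values are made up for illustration.
patients = [
    ("p1", 0.30, 0.01), ("p2", 0.25, 0.08), ("p3", 0.20, 0.02),
    ("p4", 0.15, 0.09), ("p5", 0.10, 0.07), ("p6", 0.05, 0.01),
]

RISK, BENEFIT = 1, 2  # column positions within each patient tuple


def mean_benefit_of_top(patients, rank_by, k):
    """Average estimated benefit among the k patients ranked highest on one column."""
    ranked = sorted(patients, key=lambda p: p[rank_by], reverse=True)
    chosen = ranked[:k]
    return sum(p[BENEFIT] for p in chosen) / k


high_risk_approach = mean_benefit_of_top(patients, rank_by=RISK, k=3)
high_benefit_approach = mean_benefit_of_top(patients, rank_by=BENEFIT, k=3)

print(f"high-risk approach:    {high_risk_approach:.3f}")
print(f"high-benefit approach: {high_benefit_approach:.3f}")
```

Ranking and evaluating on the same estimated benefit is optimistic by construction; a fair comparison would evaluate the selected groups on data not used to fit the benefit model.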

Fig. 4  High-Benefit Approach vs. High-Risk Approach

The high-benefit approach targets individuals with high benefit (blue box) while the high-risk approach targets individuals with high risk (red box).

Identifying the group that benefits most from a treatment is an important step in determining how to allocate limited resources. Furthermore, even if the treatment is shown to be effective on average, patients at high risk but with low treatment benefit may gain little from that specific treatment. This raises a concern that health disparities may worsen even under an “evidence-based” treatment if the heterogeneity of its effect is not thoroughly understood. Therefore, it is important to identify population groups with high risk but low benefit, and to explore the underlying reasons for the low benefit as well as potential alternative approaches to reduce health disparities.

6.  Conclusions

This paper provided an overview of the concepts and applications of causal inference and ML in epidemiological research in the field of endocrinology. We analyzed ML application within the causal inference framework, focusing on estimating the average treatment effect and evaluating its heterogeneity. Although these novel approaches have several statistical advantages over conventional approaches and have been widely used in clinical research, it is important to note that ML is not a magical tool and often requires large datasets to improve its performance. Additionally, when investigating causality, several key assumptions are necessary, such as no unmeasured confounders (conditional exchangeability), regardless of whether using ML or conventional statistical modeling. Further research is necessary to apply ML results to real-world and clinical settings for better clinical practice and patient care in future precision medicine.

 Statements & Declarations

 Funding

This study was supported by the Japan Endocrine Society (JES) Grant for Promising Investigator. KI was also supported by grants from the Japan Society for the Promotion of Science (22K17392, 23KK0240), the Japan Agency for Medical Research and Development (AMED; JP22rea522107), the Japan Science and Technology (JST PRESTO; JPMJPR23R2), the Japan Health Insurance Association, and the Program for the Development of Next-generation Leading Scientists with Global Insight (L-INSIGHT), sponsored by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. Study sponsors were not involved in the study design, data interpretation, writing, or the decision to submit the article for publication.

 Competing Interests

The author has no competing interests to disclose.

 Author Contributions

The author contributed to the design, writing, and approval of the manuscript. KI is the guarantor and supervised the study.

References
 
© The Japan Endocrine Society

This article is licensed under a Creative Commons [Attribution-NonCommercial-NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/