In clinical epidemiological studies, many exposures and confounders are time dependent. In the presence of time-dependent confounders affected by previous exposures, the usual analytic methods may introduce biases. Marginal structural models are used to deal with time-dependent confounders and exposures. A marginal structural model is a regression model for a pseudo-population using the concept of a potential outcome. The inverse probability of treatment and censoring weighting method is used to create a pseudo-population in which the effects of baseline confounders and time-dependent confounders can be removed when estimating the causal effect of the exposure on the outcome event. If the accuracy of the weights is high, the inverse probability of treatment and censoring weighting method is reliable and the bias of the marginal structural model is small. After the weights are created, a weighted regression model is applied to calculate the treatment effect. This seminar series paper introduces time-dependent confounders, time-dependent treatments, and marginal structural models.
In observational studies with exposures or confounders that change over time, standard confounding adjustment methods may produce biased estimates. Marginal structural models (MSMs) are causal models for estimating the causal effects of time-dependent exposures on outcomes in the presence of time-dependent confounders. This seminar series paper introduces MSMs, which allow for improved adjustment of time-dependent confounders and time-dependent treatments.
The first clinical research paper using MSMs for time-dependent confounders was published in Epidemiology in 2000 [1]. The paper, which has been cited in many articles, reported the results of the Multicenter Acquired Immunodeficiency Syndrome Cohort Study. The authors estimated the causal effect of zidovudine on the survival of human immunodeficiency virus-positive men. The study included human immunodeficiency virus-positive men who did not develop acquired immunodeficiency syndrome and were not receiving any medication. Cluster of differentiation 4 (CD4) counts were measured in all patients to gauge immunocompetence. Patients with a low CD4 count were given zidovudine, an anti-human immunodeficiency virus drug that increases the CD4 count. If the CD4 count exceeded a threshold value, zidovudine was discontinued. The examined outcome was death during the observation period.
1. Time-Dependent Confounding and Time-Dependent Treatment
In cohort studies, variables that change over time and are measured at many time points during the study period are called time-dependent variables. For example, if a patient has pneumonia and the inflammatory response is high, antibiotic treatment will be started. Physicians then measure the white blood cell count and C-reactive protein many times during the treatment period. The white blood cell count and C-reactive protein values shift up and down over the clinical course. In this setting, white blood cell count and C-reactive protein are regarded as time-dependent variables. Antibiotic treatment will be discontinued if the inflammatory response improves. However, if the inflammatory response worsens, the dose of antibiotics will be increased, or the type of antibiotic will be changed. In this sense, the treatment is also time dependent and is referred to as a time-dependent treatment. It is common in clinical practice for the value of a time-dependent variable to change because of a treatment’s effect, which in turn leads to a change in the next treatment: the treatment may be stopped, its dose may be increased, it may be switched, or it may simply be continued. When such a variable is a confounder and is affected by the previous treatment, it is called a time-dependent confounder.
Fig. 1 shows the relationships among time-dependent confounders, time-dependent treatments, and death (the event of interest). Fig. 1 presents a conceptual diagram for each time point (T − 1, T, and T + 1). The time-dependent confounder at time T is a confounder because its arrows extend to both the treatment and death. The confounder at time T is affected by the treatment at time T − 1, and thus an arrow (red arrow) also extends from the treatment at time T − 1 to the confounder at time T. This time-dependent confounder changes its value depending on the treatment at time T − 1 and affects the treatment at time T. In other words, the time-dependent treatment and the time-dependent confounder influence each other; this is called treatment–confounder feedback.
Fig. 1. The time-dependent treatment at time T − 1 affects the time-dependent confounders at time T. The time-dependent confounders at time T influence the time-dependent treatment at time T. These relationships also hold at time T + 1.
In the study published in Epidemiology in 2000 [1], CD4 count was a time-dependent confounder and zidovudine was a time-dependent treatment. Time-dependent treatments and time-dependent confounders exist in many observational studies, although many researchers may not be aware of this fact.
Example 1: Immunosuppressive drug therapy for inflammatory bowel disease and severe infection
Using the French national health insurance database, researchers investigated whether the incidence of severe infection varied with the type of immunosuppressive treatment (thiopurine alone, anti-tumor necrosis factor alone, or a combination of the two treatments) given to patients with inflammatory bowel disease [2]. The use of immunosuppressive therapy for inflammatory bowel disease is determined by inflammatory bowel disease activity. Therefore, inflammatory bowel disease activity is a time-dependent confounder, and immunosuppressant therapy is a time-dependent treatment.
Example 2: Chronic maintenance dialysis and death in older adult patients with renal failure
Using population-based laboratory and administrative data from Alberta, Canada, a study examined whether life prognosis was better with or without chronic maintenance dialysis in older adult patients with renal failure [3]. The decision to initiate chronic maintenance dialysis is based on the value of the estimated glomerular filtration rate. Estimated glomerular filtration rate is associated with mortality, and its value changes over time. Therefore, estimated glomerular filtration rate is considered a time-dependent confounder, and the initiation of maintenance dialysis is considered a time-dependent treatment. The study assumed that, after maintenance dialysis was started, it was continued permanently.
The concept of time-dependent treatment can be applied to treatments that are repeatedly interrupted and resumed after the initial start of treatment, as is the case with zidovudine for human immunodeficiency virus infection and immunosuppressive drugs for inflammatory bowel disease. The concept can also be applied to treatments that are permanently continued after they are started, such as maintenance dialysis or tracheostomy (see section 5).
2. Problems Caused by the Presence of Time-Dependent Confounders
We cannot adjust for time-dependent confounders with conventional statistical methods (e.g., stratified analysis, regression analysis, or propensity score matching). Applying conventional methods to data that include time-dependent confounders can lead to distorted effect estimates because of over-adjustment bias and selection bias (also known as collider-stratification bias).
(1) Over-adjustment bias
Over-adjustment bias arises because time-dependent confounders are both intermediate variables and confounders. In Fig. 2, the treatment (past exposure) at time T − 1 affects the time-dependent confounder at time T, which, in turn, affects the outcome at time T. In other words, the time-dependent variable is sandwiched between the treatment (T − 1) and death (T) and is thus in the position of an intermediate variable. In general, adjusting for an intermediate variable reduces the estimate of the original effect of the treatment [4, 5].
Fig. 2. The time-dependent confounder has two characteristics: it is both an intermediate variable and a confounder. The time-dependent treatment at time T − 1 affects the time-dependent confounder at time T, which, in turn, affects the outcome at time T. The time-dependent variable is sandwiched between the time-dependent treatment (T − 1) and death (T) and is thus in the position of an intermediate variable.
(2) Selection bias (collider-stratification bias)
Collider-stratification bias occurs when a time-dependent confounder and the outcome share a common cause. Adjusting for such a confounder in the usual way results in an underestimate of the true effect of the treatment.
For more details, please see previous reports [4, 6].
3. How to Deal with Time-Dependent Confounders
The following three methods have been proposed to deal with time-dependent confounding: (i) inverse probability weighting (IPW) of MSMs; (ii) the parametric G-formula; and (iii) G-estimation of structural nested models [4, 7].
As a whole, these methods are called Robins’ G methods, where the “G” stands for a generalized treatment. G methods are analytical methods that require advanced statistical knowledge and skills, and we recommend that clinicians use them in their research under the guidance of statisticians. Among these methods, IPW of MSMs is relatively easy for clinicians to understand and can be performed with commercially available statistical software. The number of studies using IPW of MSMs appears to be increasing every year, and we recommend reading the related literature on dealing with time-dependent confounders with IPW of MSMs [7].
4. Marginal Structural Models and Causal Inference
MSMs are models for causal inference based on potential outcomes. These models are also applied to adjust for time-dependent confounders [8]. “Marginal” refers to the marginal distribution of the potential outcome, and “structural” refers to a regression model for the potential outcome. In other words, MSMs are regression models for an entire pseudo-population using the concept of a potential outcome. Causal inference, potential outcome, marginal distribution, and pseudo-population are essential concepts for understanding MSMs.
Causal inference is the process of inferring causal relationships. One of the most famous approaches to causal inference is the Rubin causal model. This approach, which was developed by Donald Rubin in the mid-1970s, includes the concept of a “potential outcome.” The Rubin causal model is based on the following idea: if we could observe each individual’s outcome both with and without treatment, we could estimate the treatment effect for the population as a whole by calculating the difference between the treated and untreated outcomes. However, we cannot actually observe both, because the outcomes of receiving the treatment and not receiving the treatment for the same person do not exist at the same time; each person either receives the treatment or does not. Thus, Rubin introduced the concept of counterfactual outcomes. The potential outcome refers to both the observed and the unobserved outcomes. The counterfactual outcome refers to the unobserved outcome only.
The difference between causation and association is illustrated in Fig. 3 [9]. The diamond shape represents the original population of interest, with the treated group shown in white and the untreated group shown in gray. To establish causation, we would need to compare the case in which the whole population is treated (the white diamond) with the case in which the whole population is untreated (the gray diamond) (the left arrow in Fig. 3). Conversely, to establish association, we would compare the white part of the diamond with the gray part of the diamond (the right arrow in Fig. 3). In other words, causation is found by comparing the results of a treatment being given to the whole population with the results of no treatment being given, and association is the relationship found when comparing the treated and untreated groups in a population. Inferring causality from the relationships suggested by the data is called causal inference.
Because there are two outcomes (real and counterfactual) for each person, IPW of MSMs creates a pseudo-population that is twice as large as the original population. By creating a pseudo-population, the effects of time-independent confounders (patient background factors) and time-dependent confounders can be removed. It is important to note that the causal effect is the average effect at the population level, not at the individual level. Thus, it is called the “average causal effect.”
5. The Process of Inverse Probability Weighting of Marginal Structural Models and an Example
The actual process of IPW of MSMs can be described with the following steps [8–12]:
(1) Creation of treatment weights
(2) Stabilization
(3) Creation of censoring weights
(4) Checking the accuracy of the created weights
(5) Creating the outcome model
(6) Determination of the type of analysis
This section introduces a previous study that used IPW of MSMs as an example [13]. The study aimed to determine whether tracheostomy can reduce mortality in patients with severe burns.
Because the treatment of severe burns requires a massive infusion of fluids (5–10 liters per day), respiratory edema can occur from the first to the third day of treatment, resulting in tracheal intubation and ventilator management. The study targeted patients with severe burns who had deteriorated respiratory status. The outcome of the study was 28-day mortality, and the exposure was tracheostomy. Some patients remained orally intubated for 28 days, whereas others underwent tracheostomy before the 28th day. The researchers tried to determine which treatment strategy was better for patients with severe burns.
(1) Creation of treatment weights
The accuracy of IPW depends on how accurately the weights are created. The researchers in this study used propensity scores to create the inverse probability weights. The propensity score is the probability that each patient actually receives the treatment given their covariates (i.e., the treatment assignment probability), and its inverse is called the inverse probability. For patients who received the treatment, the weight is the inverse of the propensity score; for patients who did not receive the treatment, the weight is the inverse of (1 − the propensity score). For this reason, the method is called IPW [9].
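To make this rule concrete, the following sketch (hypothetical data and variable names, not the code or data from the original study) estimates a propensity score with logistic regression and assigns each patient the inverse probability of the treatment actually received:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical point-treatment data: one row per patient with baseline covariates
# and a binary treatment indicator (variable names are illustrative only)
df = pd.DataFrame({
    "age":        [45, 63, 37, 70, 52, 58, 41, 66],
    "burn_index": [20, 35, 15, 40, 28, 33, 18, 38],
    "treated":    [0, 1, 0, 1, 1, 1, 0, 0],
})

# Propensity score: each patient's probability of receiving the treatment given covariates
ps_model = LogisticRegression(max_iter=1000).fit(df[["age", "burn_index"]], df["treated"])
ps = ps_model.predict_proba(df[["age", "burn_index"]])[:, 1]

# Weight = 1/PS for treated patients, 1/(1 - PS) for untreated patients
df["iptw"] = np.where(df["treated"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))
print(df[["treated", "iptw"]])
```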
In the previous study [13], the probability of undergoing tracheostomy was calculated for days 5–28 after hospitalization using the baseline confounders and time-dependent confounders as covariates. The baseline (time-independent) confounders included age, gender, Charlson Comorbidity Index, Japan Coma Scale, burn index, facial and neck burns, and inhalation injuries. Time-dependent confounders included sedatives, opioids, vasopressors, blood product transfusions, skin surgery, and intravenous fluids. Higher amounts of intravenous fluids, sedatives, and analgesics were associated with higher likelihoods of the patient continuing on ventilator management and of the patient receiving a tracheostomy. The amounts of sedatives and analgesics can be reduced after a tracheostomy; these amounts are therefore affected by whether the patient had a tracheostomy on the previous day. Additionally, these variables are directly associated with death. Therefore, the variables are considered time-dependent confounders (Fig. 4).
Fig. 4. Higher amounts of intravenous fluids, sedatives, and analgesics correspond to a higher likelihood of receiving a tracheostomy. The amounts of sedatives and analgesics can be reduced after a tracheostomy; these amounts are therefore affected by whether the patient had a tracheostomy on the previous day. Additionally, these variables are directly associated with death. Therefore, the variables are considered time-dependent confounders.
To create the weights, the probability of receiving the treatment at each time point is calculated. Let Pt denote the probability of receiving the treatment at time T, Pt−1 the probability at time T − 1 (the day before T), and Pt+1 the probability at time T + 1 (the day after T). Pt, Pt−1, and Pt+1 are calculated by logistic regression using the baseline confounders, the history of treatment up to that time point, and the history of time-dependent confounders. In this situation, Pt, Pt−1, and Pt+1 are propensity scores. Because the treatment choice is either receiving or not receiving the treatment, the probability of not receiving the treatment at time T is (1 − Pt). Similarly, the probability of not receiving the treatment at time T − 1 is (1 − Pt−1), and at time T + 1 it is (1 − Pt+1). The next step is to multiply these probabilities from the first day of hospitalization onward. If the patient receives the treatment for the first time at T + 1, the patient has not received the treatment up to that time point. Therefore, the probability of this observed treatment history is (1 − P0) × (1 − P1) × (1 − P2) × (1 − P3) × … × (1 − Pt−1) × (1 − Pt) × Pt+1. The inverse of this product is the “weight” at T + 1 (Fig. 5). This calculation is continued daily for each patient through the 28th day. Note that, after a patient receives a tracheostomy, it is usually continued, and the probability of treatment after tracheostomy is assumed to be 1.
Fig. 5. To create the weights, the probability of receiving and the probability of not receiving the treatment are calculated at each time point. If the patient receives the treatment for the first time at T + 1, the patient has not received the treatment up to that time point, so the probability of this treatment history is (1 − P0) × (1 − P1) × (1 − P2) × (1 − P3) × … × (1 − Pt−1) × (1 − Pt) × Pt+1. The inverse of this product is the weight at T + 1. Pt, Pt−1, and Pt+1 denote the probabilities of receiving the treatment at times T, T − 1, and T + 1, respectively; 1 − Pt, 1 − Pt−1, and 1 − Pt+1 denote the corresponding probabilities of not receiving the treatment; and P0 and P1 (and 1 − P0 and 1 − P1) denote the probabilities of receiving (and not receiving) the treatment on the admission day and on the next day, respectively.
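The following sketch illustrates how such time-varying weights might be computed in practice for a setup like the one described above; the long-format data, covariates, and column names are hypothetical, and a single pooled logistic regression stands in for the daily treatment models:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical long-format data: one row per patient per day.
# Columns are illustrative stand-ins for the baseline and time-dependent covariates in the text.
long_df = pd.DataFrame({
    "patient_id":    [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "day":           [0, 1, 2, 0, 1, 2, 3, 0, 1],
    "age":           [45, 45, 45, 63, 63, 63, 63, 70, 70],
    "sedative_dose": [10, 12, 8, 20, 22, 25, 18, 5, 4],
    "tracheostomy":  [0, 0, 1, 0, 0, 0, 1, 0, 0],  # first 1 = day of tracheostomy
})
covariates = ["age", "sedative_dose"]

# Person-days still at risk of tracheostomy (no tracheostomy on any previous day)
long_df["prior_trach"] = (
    long_df.groupby("patient_id")["tracheostomy"].cumsum() - long_df["tracheostomy"]
)
at_risk = long_df["prior_trach"] == 0

# Pooled logistic regression for the daily probability of tracheostomy,
# fitted only on at-risk person-days
model = LogisticRegression(max_iter=1000).fit(
    long_df.loc[at_risk, covariates], long_df.loc[at_risk, "tracheostomy"]
)

# Daily probability of the treatment status actually observed;
# after tracheostomy the treatment is assumed to continue, so the probability is set to 1
p_treat = model.predict_proba(long_df[covariates])[:, 1]
p_received = np.where(long_df["tracheostomy"] == 1, p_treat, 1.0 - p_treat)
p_received = np.where(at_risk, p_received, 1.0)

# Unstabilized weight at each day: inverse of the cumulative product of daily probabilities
long_df["cum_prob"] = (
    pd.Series(p_received).groupby(long_df["patient_id"]).cumprod().to_numpy()
)
long_df["unstabilized_weight"] = 1.0 / long_df["cum_prob"]
print(long_df[["patient_id", "day", "unstabilized_weight"]])
```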
The general formula for calculating weights is shown in Fig. 6 [11]. The probability of receiving the actual treatment at time k can be calculated by a regression model conditioned on the history of treatment up to time k − 1 and the history of time-dependent confounders up to time k. This formula means that the weights can be calculated by taking the inverse of the probabilities and multiplying them from time 0 to time t. The time-dependent confounder at time 0, L(0), includes the baseline confounders.
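As a compact rendering of this general formula (written here in standard MSM notation as our restatement of Fig. 6, not a reproduction of it), the unstabilized weight for patient i at time t is

$$
W_i(t) \;=\; \prod_{k=0}^{t} \frac{1}{\Pr\big(A(k)=a_i(k)\mid \bar{A}(k-1)=\bar{a}_i(k-1),\ \bar{L}(k)=\bar{l}_i(k)\big)},
$$

where A(k) is the treatment at time k, \(\bar{A}(k-1)\) and \(\bar{L}(k)\) are the treatment and confounder histories, and \(a_i\) and \(l_i\) denote the values actually observed for patient i.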
(2) Stabilization
There are two types of weights: unstabilized weights and stabilized weights. Unstabilized weights are calculated from the inverse probability of the actual treatment received conditional on the history of previous treatment and time-dependent confounders (Fig. 5). Conversely, stabilized weights are calculated by replacing the numerator of the unstabilized weights (which is 1) with the probability of the actual treatment conditional on the baseline confounders (time-independent confounders) and the previous treatment history (Fig. 7).
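In the same notation (again as a restatement of the idea behind Fig. 7, not a reproduction of it), the stabilized weight has the form

$$
SW_i(t) \;=\; \prod_{k=0}^{t} \frac{\Pr\big(A(k)=a_i(k)\mid \bar{A}(k-1)=\bar{a}_i(k-1),\ V\big)}{\Pr\big(A(k)=a_i(k)\mid \bar{A}(k-1)=\bar{a}_i(k-1),\ \bar{L}(k)=\bar{l}_i(k)\big)},
$$

where V denotes the baseline (time-independent) confounders; only the numerator differs from the unstabilized weight.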
The reasons for stabilization are that it can reduce the variance, narrow the 95% confidence interval, and improve the accuracy of the weights [9]. In general, stabilized weights are recommended. Without stabilization the pseudo-population is roughly twice the size of the original population, whereas with stabilization its size returns to (or close to) that of the original population. For example, a propensity score of 0.0001 means that a patient will almost never receive the treatment. The inverse probability is 1/0.0001 = 10,000, which means that one person carries the influence of 10,000 people, and the variance becomes very large. With stabilization, we can keep the weights within a small range [1, 8].
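A toy numerical illustration of this point (the probabilities below are arbitrary and purely illustrative):

```python
# Unstabilized: a patient with a very small treatment probability gets an enormous weight
p_conditional = 0.0001          # probability of treatment given full confounder history
unstabilized = 1 / p_conditional
print(unstabilized)             # 10000.0 -> one patient counts as 10,000 people

# Stabilized: the numerator is the probability of treatment given only baseline
# covariates and treatment history, so the ratio stays much closer to 1
p_numerator = 0.0005            # hypothetical numerator probability
stabilized = p_numerator / p_conditional
print(stabilized)               # 5.0 -> far less extreme
```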
In the previous study [13], the unstabilized weight distribution (Table 1) had a mean of 9.7, a median of 1.1, a standard deviation of 39.7, a minimum of 1.0 (1.0 for the bottom 1% and 175.2 for the top 1%) and a maximum of 607.4. The patient with the largest weight had an influence equivalent to 607.4 people, and even an individual in the top 1% would have the weight of 175.2 people. Therefore, these particular patients would have a very significant impact on the entire cohort. Conversely, when using stabilized weights, the range narrowed to a minimum of 0.14 (0.22 for the bottom 1% and 1.67 for the top 1%) and a maximum of 3.81. Thus, the maximum weight corresponded to having the weight of 3.8 people, and the variance was small. This example illustrates the primary advantage of stabilization.
Table 1. Distribution of the weights in the previous study [13]

| | Number of observations | Mean/median | SD | Minimum | 1% | 99% | Maximum | Truncated observations |
|---|---|---|---|---|---|---|---|---|
| Unstabilized treatment weight | 13,486 | 9.7/1.1 | 39.7 | 1.000 | 1.000 | 175.243 | 607.413 | — |
| Stabilized treatment weight | 13,486 | 1.0/1.0 | 0.20 | 0.142 | 0.221 | 1.667 | 3.806 | — |
| Stabilized censoring weight | 13,486 | 1.0/1.0 | 0.13 | 0.334 | 0.712 | 1.498 | 3.332 | — |
| Final weight | 13,486 | 1.0/1.0 | 0.25 | 0.141 | 0.226 | 1.792 | 4.678 | — |
| Truncation (1%–99%) | 13,486 | 1.0/1.0 | 0.17 | 0.226 | 0.226 | 1.793 | 1.793 | 269 |
| Truncation (5%–95%) | 13,486 | 1.0/1.0 | 0.09 | 0.781 | 0.781 | 1.199 | 1.199 | 1,349 |

Final weight = (stabilized treatment weight) × (stabilized censoring weight). SD, standard deviation.
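As an illustration of how such a summary might be produced, the following sketch (with randomly generated dummy weights in place of real study data) computes the same kinds of statistics reported in Table 1:

```python
import numpy as np
import pandas as pd

def summarize_weights(w: pd.Series) -> pd.Series:
    """Summary statistics of a weight distribution, similar in spirit to Table 1."""
    return pd.Series({
        "n": w.size,
        "mean": w.mean(),
        "median": w.median(),
        "sd": w.std(),
        "min": w.min(),
        "p1": w.quantile(0.01),
        "p99": w.quantile(0.99),
        "max": w.max(),
    })

# Dummy weights for illustration only
rng = np.random.default_rng(0)
weights = pd.Series(rng.lognormal(mean=0.0, sigma=0.2, size=1000))
print(summarize_weights(weights).round(3))
```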
(3) Creation of censoring weights
In the previous study [13], the outcome was 28-day mortality, and it is clear that patients who were discharged alive after 28 days were alive on day 28. However, for patients who were transferred to rehabilitation or to another hospital before day 28, it is unclear whether they were alive on day 28. In the latter cases, follow-up ended at the time of discharge, and the patient was treated as censored. Censoring does not pose a problem if it is completely random. However, censoring generally depends on the prognosis; for example, censored patients may be more likely to have an outcome event. When loss to follow-up is associated with subsequent prognosis, it is called “informative censoring.” Excluding informatively censored cases from the analysis introduces selection bias. Although informative censoring is generally difficult to deal with, we can control for the selection bias it causes by creating censoring weights with IPW of MSMs [1, 14].
The procedure for creating censoring weights is almost the same as the procedure described above for creating treatment weights. The probability of remaining uncensored at time k is obtained from the history of treatment and time-dependent confounders up to time k − 1 and the history of being uncensored up to time k − 1. We then take the inverse of this probability of remaining uncensored as the censoring weight. Censoring weights, like treatment weights, are usually stabilized. The final weight is obtained by multiplying the treatment weight and the censoring weight: final weight = treatment weight × censoring weight.
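In the same notation as above (a sketch of the standard form, with C(k) = 0 indicating that the patient remains uncensored through time k), the stabilized censoring weight and the final weight can be written as

$$
SW^{C}_i(t) \;=\; \prod_{k=0}^{t} \frac{\Pr\big(C(k)=0 \mid \bar{C}(k-1)=0,\ \bar{A}(k-1),\ V\big)}{\Pr\big(C(k)=0 \mid \bar{C}(k-1)=0,\ \bar{A}(k-1),\ \bar{L}(k-1)\big)},
\qquad
W^{\mathrm{final}}_i(t) \;=\; SW_i(t) \times SW^{C}_i(t).
$$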
(4) Checking the accuracy of the created weights
In the previous study used as an example here [13], the final weights (Table 1) had a mean of 1.0, a median of 1.0, a standard deviation of 0.25, a minimum of 0.14 (0.23 for the bottom 1% and 1.79 for the top 1%), and a maximum of 4.68. Because the mean was close to 1, the standard deviation was small, and the range was narrow, we can conclude that the weights are highly accurate.
It is also necessary to check the distribution of the created weights. Even if the overall distribution of weights looks fair (i.e., the mean is close to 1 and the range is reasonably narrow), the tails of the distribution may still contain extremely large or extremely small values. As a sensitivity analysis, trimming (removing observations at the ends of the distribution) or truncation (replacing values beyond a particular reference value at either end of the distribution with that reference value) can be used to narrow the distribution. For example, the 99th, 95th, or 90th percentile can be used as the reference value for truncating the ends of the distribution (Table 1) [10]. The choice of percentile is a trade-off between bias and variance: more truncation of the two ends corresponds to a narrower weight range, smaller variance, and narrower confidence intervals, but also a greater possibility of erroneous effect estimation.
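A minimal sketch of percentile-based truncation as a sensitivity analysis (the weights are randomly generated dummies, and the function name is ours):

```python
import numpy as np
import pandas as pd

def truncate_weights(w: pd.Series, lower_pct: float = 1, upper_pct: float = 99) -> pd.Series:
    """Replace weights below/above the given percentiles with the percentile values."""
    lo, hi = np.percentile(w, [lower_pct, upper_pct])
    return w.clip(lower=lo, upper=hi)

# Sensitivity analysis at several truncation levels (dummy weights for illustration)
rng = np.random.default_rng(1)
final_weight = pd.Series(rng.lognormal(mean=0.0, sigma=0.3, size=1000))
for lower, upper in [(1, 99), (5, 95), (10, 90)]:
    truncated = truncate_weights(final_weight, lower, upper)
    print(lower, upper, round(truncated.min(), 3), round(truncated.max(), 3))
```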
(5) Creating the outcome model
By creating accurate final weights, a pseudo-population is created that is free from the effects of the baseline confounders and time-dependent confounders. Then, by applying the outcome model to this pseudo-population, the treatment effect (causal effect) can be calculated.
In the previous study [13], a weighted Cox regression model was applied. To account for the fact that the same patient appears more than once (as both treated and untreated copies) in the pseudo-population created by IPW of MSMs, robust variance estimators were used to calculate the 95% confidence intervals.
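As an illustration, a weighted Cox model with robust standard errors can be fitted as follows, assuming the Python lifelines package; the data frame, column names, and values are hypothetical and are not taken from the original study:

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical analysis data set: follow-up time, event indicator,
# exposure, and the final (treatment x censoring) weight per observation
df = pd.DataFrame({
    "time":         [5, 12, 28, 20, 28, 15, 28, 9],
    "death":        [1, 1, 0, 1, 0, 0, 0, 1],
    "tracheostomy": [0, 1, 1, 0, 1, 0, 0, 1],
    "final_weight": [0.8, 1.2, 0.9, 1.5, 1.0, 1.1, 0.7, 1.3],
})

cph = CoxPHFitter()
# weights_col applies the inverse probability weights; robust=True requests
# a robust (sandwich) variance estimator for the confidence intervals
cph.fit(df, duration_col="time", event_col="death",
        weights_col="final_weight", robust=True)
cph.print_summary()
```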
(6) Determination of the type of analysis
In studies using IPW of MSMs, there are three types of analysis, as in randomized controlled trials: (i) intention-to-treat (ITT) analysis; (ii) as-treated analysis; and (iii) per-protocol analysis.
(i) Intention-to-treat analysis
When calculating weights, ITT analysis assumes that, after a patient starts treatment, he or she will continue to receive that treatment throughout the study period. ITT analysis is generally recommended because of its simplicity in calculating weights and adjusting for confounding factors [15–18]. In ITT analysis, we only model the probability of treatment initiation. ITT analysis is a “conservative estimation” because it underestimates the effect when the treatment truly has a positive effect and estimates the effect correctly when there is no effect. More importantly, ITT analysis can satisfy the assumption of no unmeasured confounding, as long as all confounding factors at treatment initiation are measured accurately and included in the denominator of the treatment weight. It is important to note that the ITT estimate measures only the effect of treatment initiation, not the total effect of the treatment actually received. If there is substantial attrition after the initiation of treatment, meaning that some patients assigned to the treatment group did not take the treatment or stopped it immediately because of poor adherence, the estimated effect from the ITT analysis will be far from the true effect of the treatment [19]. In the previous study [13], the treatment was tracheostomy; after patients underwent tracheostomy, they did not deviate from that treatment, so adherence was excellent.
(ii) As-treated analysis
In as-treated analyses, patients in the treatment period are regarded as the treatment group, and those outside the treatment period are regarded as the non-treatment group [15]. Thus, the analysis also includes patients whose treatment was discontinued or changed. The reliability of the estimated effect depends on whether the model accurately captures the relationship between the confounders and the multi-phase treatment. The effects of time-dependent confounders often differ at the initiation of treatment and at its discontinuation or continuation. In many cases, time-dependent confounders are not recorded when treatment is changed after initiation. Therefore, it is difficult to accurately estimate the effect of multiple treatment stages.
(iii) Per-protocol analysis
A per-protocol analysis uses data on patients who actually follow the treatment. Patients who deviate from the initial treatment are censored at the time of treatment deviation. This artificial censoring may create selection bias, and it is necessary to consider multi-stage treatment and to correct for time-dependent confounders.
In the previous study [13], ITT analysis was used. As a result of the above steps (1)–(6), the adjusted hazard ratio for 28-day mortality following tracheostomy in patients with severe burns was 0.73 (95% confidence interval: 0.39–1.34).
In clinical epidemiological studies, many exposures (treatments) are time dependent, and time-dependent confounders are influenced by past exposures (time-dependent treatments). Conventional analytical methods may cause bias because of the presence of time-dependent treatments and time-dependent confounders. G methods are used to deal with time-dependent treatments and time-dependent confounders. In particular, in MSMs, a regression model is used for the entire pseudo-population, with the concept of a potential outcome. The IPW method is used to create the pseudo-population, through which the effects of baseline confounders (time-independent confounders) and time-dependent confounders can be removed. If the accuracy of the weights is high, the IPW method is reliable, and bias in MSMs is small. After the creation of weights, a weighted regression model is applied. The outcome model applied to the pseudo-population will determine the treatment effect.