2022 Volume 4 Issue 1 Pages 1-5
It is common clinical practice for physicians to refer to specific diagnostic criteria for day-to-day decision-making. In particular, whether or not to provide a particular treatment is often determined by the cutoff value of a relevant diagnostic marker. Regression discontinuity design (RDD) is a method for evaluating scenarios where intervention is determined by a certain cutoff value (i.e., a threshold) of a continuous variable. RDD represents a powerful method for assessing intervention effects and outcomes. RDD is underutilized in clinical research, and there are many opportunities to apply it in this setting. This article introduces the principles of RDD and provides examples of clinical studies that have used this design.
It is common clinical practice for physicians to refer to specific diagnostic criteria for day-to-day decision-making. In particular, whether or not to provide a particular treatment is often determined by the cutoff value of a relevant diagnostic marker. For example, if systolic blood pressure is above 140 mmHg, physicians consider initiating antihypertensive medication. However, blood pressure is a continuous variable, and a systolic blood pressure of 140 mmHg is simply a convenient threshold value; there is little difference in the risk of complications between a patient with a systolic blood pressure of 139 mmHg and a patient with a systolic blood pressure of 140 mmHg. Nevertheless, antihypertensive medication may not be prescribed to patients with a systolic blood pressure of 139 mmHg because they do not meet the arbitrary threshold of 140 mmHg. Regression discontinuity design (RDD) is a method used to evaluate situations where the decision to implement treatment is determined by a certain cutoff value of a continuous variable, as in this hypothetical blood pressure example [1–3]. RDD was first used in educational psychology in 1960 [4], and since the 1990s it has been widely used in the field of economics. In contrast, there are few examples of RDD in clinical research to date [5]. However, decision-making that relies on a threshold value of a specific test to differentiate between normal and abnormal results is common in daily clinical practice. Thus, there may be more opportunities to implement RDD in clinical research. This article introduces the principles of RDD and provides examples of studies that have used this design.
Returning to the hypothetical example of blood pressure described above, let us assume that we can collect data on patients who did not start or did start antihypertensive medication because their systolic blood pressure was 139 mmHg or 140 mmHg, respectively. In both groups of patients, baseline systolic blood pressure is approximately the same; whether the blood pressure is 139 mmHg or 140 mmHg at the time of measurement can be largely attributed to chance (e.g., non-significant fluctuations). Thus, among these patients, starting antihypertensive medication is largely left up to chance and can be considered as randomly assigned. If we follow both groups and compare outcomes such as stroke, we can estimate the effect of antihypertensive drugs.
In this example, a continuous variable (e.g., systolic blood pressure) with a threshold value (e.g., 140 mmHg) that differentiates between normal and abnormal values is called the assignment variable. To perform RDD, data are collected from a large number of patients near the threshold value of the designated assignment variable, and outcomes between the two groups (above threshold vs. below threshold) are compared [1]. In this example, the “above threshold” group would receive antihypertensive treatment, while the “below threshold” group would not. Thus, the two groups can be used to compare the effects of antihypertensive drugs. In this sort of analysis, it is critical to collect data on patients close to both sides of the threshold; if patients with values far from the threshold are included, the analysis will not make a valid comparison. For example, comparing a patient with a systolic blood pressure of 120 mmHg to a patient with a systolic blood pressure of 160 mmHg will not provide a valid estimate of the effect of antihypertensive drugs.
RDD can also be used to examine the effects of medical policies. For example, the human papillomavirus (HPV) vaccine is intended for children, and the age range for vaccination is known. HPV vaccination has been changed from routine vaccination to voluntary vaccination from a certain date. In this scenario, RDD analysis can be implemented to examine the impact of this policy change: children received HPV vaccination as a routine vaccine before the policy change, whereas after the policy change, children only received the vaccination on a voluntary basis. Thus, whether the HPV vaccine is administered on a routine or voluntary basis is dependent on the child’s birth date. Children born before the reference date are candidates for routine vaccination, while children born after the reference date are candidates for voluntary vaccination. In this scenario, whether children were born before or after the reference date is due to chance. Therefore, background information of the two groups, such as body weight, will be similar. By collecting data on children born close to the reference date, dividing them into two groups (born before reference date and born after reference date), and comparing the outcomes (in this case, rate of HPV infection or cervical cancer), we can estimate the effect of the policy change on these outcomes.
The relationship between an assignment variable and an outcome in RDD is shown in Fig. 1. When there is an actual effect of the intervention, a discontinuous change (jump) in the outcome value is observed when the assignment variable exceeds the threshold. This change in the outcome corresponds to the effect of the treatment or policy being analyzed.
Fig. 1. When there is an actual effect of the intervention, a discontinuous change (jump) in the value of the outcome is observed when the assignment variable exceeds the threshold. This change in the outcome corresponds to the effect of the intervention, e.g., treatment or policy implementation.
RDD has an advantage over other methods in terms of estimating causal effects. For example, propensity score analyses make a strong assumption that there is no unmeasured confounding. However, there will always be unmeasured confounding factors when conducting observational research. Compared with such unrealistic assumptions, many of the assumptions in RDD can be verified from data. The following assumptions are made when implementing RDD:
Assignment Rules and Thresholds are Known
RDD assumes that the intervention is introduced when the assignment variable exceeds (or falls below) the threshold value. The “rule” for intervention and the corresponding threshold value must be clearly identified prior to performing RDD analysis. For example, in the clinical treatment of hypertension, the rule is to “initiate therapeutic intervention (e.g., antihypertensive medication) when blood pressure exceeds 140 mmHg;” therefore, the threshold value is 140 mmHg. Both the rule and threshold must be clearly specified and consistently implemented among the study population; if the threshold used to determine intervention varies from individual to individual, RDD cannot be used.
Pre-Intervention Assignment Variables cannot be Manipulated
The pre-intervention assignment variable must not be affected by whether or not the intervention is received. For example, consider a study designed to estimate the effect of specific health guidance based on the results of abdominal circumference measurements during specific health checkups. Specific health guidance is recommended for people whose abdominal circumference is above a certain threshold. Individuals who do not want to receive specific health guidance can retract their stomach during measurement to reduce their measured abdominal circumference; thus, the assignment variable can be easily manipulated. If the assignment variable can be manipulated, RDD analysis will not accurately estimate the treatment effect. A histogram of the assignment variable can be used to identify manipulation of the assignment variable before the intervention. If there is a sharp increase or decrease in the number of people around the threshold, it can be assumed that the pre-intervention assignment variable has been manipulated (Fig. 2).
Fig. 2. The histogram in the left panel shows an assignment variable that is continuous at the threshold, whereas the histogram in the right panel shows a sharp change at the threshold, indicating that the assignment variable is not continuous.
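The histogram check described above can be sketched in a few lines of Python. This is a minimal illustration, not a formal density test: the function name `counts_around_threshold`, the 85 cm threshold, the 1 cm bin width, and the simulated measurements are all assumptions introduced here for the example.

```python
def counts_around_threshold(values, c, width, n_bins):
    """Count observations in n_bins equal-width bins on each side of c.

    Returns (below, above); below[-1] and above[0] are the two bins
    that touch the threshold.
    """
    below = [0] * n_bins
    above = [0] * n_bins
    for v in values:
        k = int((v - c) // width)  # bin index relative to the threshold
        if 0 <= k < n_bins:
            above[k] += 1
        elif -n_bins <= k < 0:
            below[k + n_bins] += 1
    return below, above


# Hypothetical waist measurements (cm), evenly spread between 80 and 90.
honest = [80.005 + 0.01 * i for i in range(1000)]

# Assumed manipulation: half of the people measuring 85-86 cm
# pull in their stomach and record a value 1 cm lower.
manipulated = [
    v - 1.0 if 85.0 <= v < 86.0 and i % 2 == 0 else v
    for i, v in enumerate(honest)
]

for label, data in [("honest", honest), ("manipulated", manipulated)]:
    below, above = counts_around_threshold(data, c=85.0, width=1.0, n_bins=3)
    print(label, "below:", below, "above:", above)
```

In the manipulated data, the bin just below the threshold holds far more observations than the bin just above it, which is the kind of sharp change the histogram is meant to reveal. Real data are noisier, so a visual check of this kind is a starting point rather than proof of manipulation.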
Assume a clinical case where a patient’s blood pressure exceeds 140 mmHg and treatments for hyperlipidemia and diabetes are started simultaneously along with antihypertensive treatment. In this scenario, even if antihypertensive treatment is not given, treatment for hyperlipidemia or diabetes may affect the patient’s outcome. Therefore, it is unknown whether the estimated effect is due to antihypertensive treatment or interventions for other diseases. To consider factors other than intervention, assignment variable continuity can be verified by the assignment variable histogram. If discontinuities are observed in any factors, the estimated effect may not be valid.
RDD is classified into sharp RDD and fuzzy RDD according to the probability that subjects on both sides of the threshold of the assignment variable receive the treatment intervention. Sharp RDD refers to a situation where the assignment variable is deterministic of treatment, that is, all subjects on one side of the assignment variable receive treatment, while all subjects on the other side do not. For example, consider an RDD analysis used to evaluate the effect of the publication of health guidelines. None of the patients admitted prior to the publication of the guidelines are affected by the guideline, while all patients admitted after the publication of the guidelines are affected.
On the other hand, fuzzy RDD refers to a situation where the assignment variable probabilistically determines whether or not an individual receives the treatment intervention. In the blood pressure example described above, a patient with a blood pressure of 139 mmHg would have a low probability of receiving pharmacotherapy. A person with a blood pressure of 140 mmHg would have a high probability of receiving pharmacotherapy.
Estimation of the Treatment Effect in RDD Analysis
The estimation of the treatment effect differs between sharp RDD and fuzzy RDD. In sharp RDD, the treatment effect can be estimated by comparing the two groups on both sides of the threshold. In practice, a regression model is used. A simple model is:
$$Y_i = \beta_0 + \beta_1 T_i + \beta_2 (Z_i - c) + \varepsilon_i$$
where Y_i is the outcome, T_i is the treatment (coded as 1 for patients with treatment and 0 for patients without treatment), Z_i is the assignment variable, c is a known threshold above which treatment is initiated, and ε_i is an error term. When the assignment variable approaches c from below, the expected outcome is β0 because T = 0 and Z − c approaches 0. When the assignment variable approaches c from above, the expected outcome is β0 + β1 because T = 1. Therefore, the discontinuous change in the outcome at threshold c is β1, which is the treatment effect. The estimated effect is interpreted as the effect of the treatment at the threshold.
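As a rough illustration of the sharp RDD regression, the sketch below assumes the simple model has the form Y = b0 + b1·T + b2·(Z − c), with T = 1 when Z is at or above the threshold, and fits it by ordinary least squares on simulated data. The function name `sharp_rdd`, the sample size, and every numeric value in the simulation are assumptions made for this example, not values from any study.

```python
import random


def sharp_rdd(z, y, c):
    """Least-squares fit of Y = b0 + b1*T + b2*(Z - c), with T = 1 if Z >= c.

    With one dummy and one common slope, OLS has a closed form: b2 is the
    pooled within-group slope, and b0, b1 follow from the group means.
    """
    x = [zi - c for zi in z]
    groups = {0: ([], []), 1: ([], [])}
    for xi, yi in zip(x, y):
        t = 1 if xi >= 0 else 0
        groups[t][0].append(xi)
        groups[t][1].append(yi)

    means, sxy, sxx = {}, 0.0, 0.0
    for t, (xs, ys) in groups.items():
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        means[t] = (mx, my)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx += sum((a - mx) ** 2 for a in xs)

    b2 = sxy / sxx
    b0 = means[0][1] - b2 * means[0][0]
    b1 = (means[1][1] - b2 * means[1][0]) - b0
    return b0, b1, b2


# Simulated data (all numbers are illustrative assumptions).
rng = random.Random(0)
c = 140.0
z = [rng.uniform(130.0, 150.0) for _ in range(2000)]
# True jump at the threshold is -2; the outcome also rises 0.05 per mmHg.
y = [10.0 - 2.0 * (zi >= c) + 0.05 * (zi - c) + rng.gauss(0.0, 0.5) for zi in z]

b0, b1, b2 = sharp_rdd(z, y, c)
print(f"estimated jump at the threshold (b1): {b1:.2f}")
```

The coefficient `b1` plays the role of β1 in the text: it is the estimated discontinuity in the outcome at the threshold. In applied work a statistical package would normally be used so that standard errors are also available; the closed form here only keeps the example dependency-free.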
In estimating the treatment effect in fuzzy RDD, the three assumptions of RDD can be rephrased as follows:
(i) The proportion of interventions differs depending on the assignment variable.
(ii) The assignment variable has an effect on the outcome only through the intervention.
(iii) The assignment variable is independent of the confounders.
The relationship between the assignment variable, intervention, outcome, and confounders is shown in Fig. 3. This relationship allows the use of instrumental variable analysis. The treatment effect can be estimated by two-stage least squares. For more details, please refer to “Introduction to Instrumental Variable Analysis” [6].
Fig. 3. (1) The assignment variable is associated with treatment assignment; as a result, the proportion treated differs depending on the assignment variable. (2) The assignment variable does not affect the outcome directly; it affects the outcome only through the treatment. (3) The assignment variable is independent of confounders.
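The instrumental variable logic of fuzzy RDD can be sketched with the Wald (ratio) estimator, which coincides with two-stage least squares when the only instrument is the above/below-threshold indicator and no other covariates are included. The example below is a simplified sketch under that assumption; the function name `fuzzy_rdd_wald`, the window half-width `h`, the treatment probabilities of 0.2 and 0.8, and the true effect of -3 are all hypothetical values chosen for the simulation.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fuzzy_rdd_wald(z, t, y, c, h):
    """Wald (ratio) estimate of the treatment effect within c +/- h.

    Numerator: jump in the mean outcome across the threshold.
    Denominator: jump in the proportion treated across the threshold.
    """
    above = [i for i, zi in enumerate(z) if c <= zi <= c + h]
    below = [i for i, zi in enumerate(z) if c - h <= zi < c]
    jump_y = mean([y[i] for i in above]) - mean([y[i] for i in below])
    jump_t = mean([t[i] for i in above]) - mean([t[i] for i in below])
    return jump_y / jump_t


# Simulated data (all numbers are illustrative assumptions).
rng = random.Random(1)
c = 140.0
z = [rng.uniform(130.0, 150.0) for _ in range(20000)]
# Treatment probability jumps from 0.2 below the threshold to 0.8 above it.
t = [1 if rng.random() < (0.8 if zi >= c else 0.2) else 0 for zi in z]
# True effect of treatment on the outcome is -3; z has no direct effect here.
y = [5.0 - 3.0 * ti + rng.gauss(0.0, 1.0) for ti in t]

effect = fuzzy_rdd_wald(z, t, y, c, h=2.0)
print(f"fuzzy RDD estimate of the treatment effect: {effect:.2f}")
```

Dividing by the jump in the proportion treated rescales the jump in the outcome, which is why fuzzy RDD estimates the effect of the treatment itself rather than the effect of crossing the threshold. The simulation builds in assumptions (ii) and (iii) by construction; with real data they have to be argued from the clinical setting.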
RDD identifies a treatment effect for patients close to the threshold of the assignment variable. Therefore, the estimated treatment effect can be generalized only to patients close to the threshold. For example, in the case of blood pressure, the treatment effect is not generalizable to patients whose blood pressure is 120 mmHg or 160 mmHg. In addition, RDD assumes that the two groups around the threshold of the assignment variable are approximately randomly assigned, and it is not easy to determine an appropriate bandwidth around the threshold within which this holds. For example, patients with a blood pressure of 139 mmHg and 140 mmHg would be two comparable groups. However, if the range is taken to be 120–140 mmHg and 140–160 mmHg, the two groups may not be comparable. Thus, the choice of the bandwidth of the assignment variable is an important consideration in RDD. As the bandwidth increases, statistical power increases, but bias also increases. Likewise, as the bandwidth decreases, bias decreases at the expense of statistical power [2]. Methods for selecting an optimal bandwidth have been proposed [7]. Sensitivity analyses should be conducted using double and half the size of the chosen bandwidth. If the results of the analyses with each bandwidth are similar, it can be assumed that the results are reasonable.
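The half/double bandwidth sensitivity analysis can be illustrated with a deliberately simple estimator: the difference in mean outcome between the two sides of the threshold within a window. This is a sketch under stated assumptions, not an optimal bandwidth procedure; the function names `diff_in_means` and `bandwidth_sensitivity`, the chosen bandwidth of 4 mmHg, and the noise-free data with a true jump of 2 and a slope of 0.1 per mmHg are hypothetical.

```python
def diff_in_means(z, y, c, h):
    """Mean outcome in [c, c + h] minus mean outcome in [c - h, c)."""
    above = [yi for zi, yi in zip(z, y) if c <= zi <= c + h]
    below = [yi for zi, yi in zip(z, y) if c - h <= zi < c]
    return sum(above) / len(above) - sum(below) / len(below)


def bandwidth_sensitivity(z, y, c, h):
    """Re-estimate the jump with half, the chosen, and double the bandwidth."""
    return {width: diff_in_means(z, y, c, width) for width in (h / 2, h, 2 * h)}


# Noise-free illustrative data: true jump of 2 at c = 140, plus an assumed
# underlying slope of 0.1 per mmHg that a wide window mistakes for effect.
c = 140.0
z = [120.05 + 0.1 * i for i in range(400)]
y = [1.0 + 2.0 * (zi >= c) + 0.1 * (zi - c) for zi in z]

for width, estimate in bandwidth_sensitivity(z, y, c, h=4.0).items():
    print(f"bandwidth +/-{width:g} mmHg: estimated jump = {estimate:.2f}")
```

Because the outcome trends upward with the assignment variable, the estimate moves further from the true jump of 2 as the window widens (2.20, 2.40, and 2.80 for half, chosen, and double bandwidth). Estimates that change this much across bandwidths would be a warning sign; similar estimates would support the chosen bandwidth.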
This section presents examples of studies that use RDD.
“Estimating Marginal Returns to Medical Care: Evidence from At-risk Newborns” [8]
In many countries, “very low birth weight” infants, defined as those weighing less than 1,500 g at birth, receive extra medical attention such as admission to the neonatal intensive care unit. However, the cutoff birth weight of 1,500 g is a conventional threshold without a strict biological rationale. Therefore, although newborns weighing just below 1,500 g are more likely to receive extra medical attention, they are otherwise similar to those weighing just above 1,500 g. This study investigated the effect of “very low birth weight” categorization on infant mortality using sharp RDD. The assignment variable was birth weight, and the cutoff was 1,500 g. Outcomes were one-year mortality and hospital costs. The results of the RDD analysis demonstrated that categorization in the “very low birth weight” group decreased one-year mortality by 22% and increased hospital charges by 11%.
“The Early Benefits of Human Papillomavirus Vaccination on Cervical Dysplasia and Anogenital Warts” [9]
HPV infection can lead to anogenital warts and cervical cancers. The HPV vaccine effectively protects against HPV infection. Ontario, Canada, established a free HPV vaccination program for all grade 8 girls in September 2007. Therefore, girls born in 1994 or later were eligible for free HPV vaccination under this program, while girls born in 1993 or earlier were not eligible. This study investigated the effect of the free HPV vaccination program on cervical dysplasia and anogenital warts using sharp RDD. The results demonstrated that the program resulted in a decrease of 2.32 (4.02–0.61) cases of cervical dysplasia per 1,000 girls. Cases of anogenital warts were not significantly different.
The probability of receiving the HPV vaccine was higher for girls born in 1994 or later and lower for girls born in 1993 or earlier. This study also conducted fuzzy RDD to account for the fact that the probability of receiving the HPV vaccine varied with date of birth. While sharp RDD analysis estimated the effect of the program, fuzzy RDD was used to estimate the effect of the vaccine itself. The results demonstrated that HPV vaccination resulted in a decrease of 5.70 (9.91–1.50) cases of cervical dysplasia per 1,000 girls.
RDD is not widely used in clinical research at present. RDD can be utilized to assess situations where the initiation of treatment is determined by a certain cutoff point of a continuous variable. The two groups on each side of the threshold can be regarded as randomly assigned. RDD methods are classified as sharp RDD and fuzzy RDD according to the probability that patients on both sides of the threshold of the assignment variable receive the intervention. Fuzzy RDD is analogous to instrumental variable analysis.