Economic Evaluations of Gestational Diabetes Mellitus Screening: A Systematic Review

Background This study aims to find evidence of the cost-effectiveness of gestational diabetes mellitus (GDM) screening and to assess the quality of current economic evaluations, which have reached different conclusions because of variation in screening methods, data sources, outcome indicators, and implementation in diverse organizational contexts. Methods The Embase, Medline, Web of Science, Health Technology Assessment, and National Health Service Economic Evaluation Database databases were searched through June 2019. English-language economic evaluations reporting both the costs and health outcomes of GDM screening programs were selected, and the quality of the studies was assessed using Drummond's checklist. The general characteristics, main assumptions, and results of the economic evaluations were summarized. Results Our search yielded 10 eligible economic evaluations that compared different screening strategies in different settings and from different perspectives. On average, the selected papers scored 81% (range, 68-97%) on the items in Drummond's checklist. In general, a screening program is cost-effective or even dominant over no screening. The one-step screening, with more cases detected, is more likely to be cost-effective than the two-step screening. Universal screening is more likely to be cost-effective than screening targeting the high-risk population. Parameters affecting cost-effectiveness include diagnostic criteria, epidemiological characteristics of the population, efficacy of screening and treatment, and costs. Conclusions Most studies found GDM screening to be cost-effective, though uncertainties remain due to many factors. The quality assessment identified weaknesses in the economic evaluations in terms of integrating existing data, measuring costs and consequences, analyzing perspectives, and adjusting for uncertainties.


INTRODUCTION
Gestational diabetes mellitus (GDM) is defined as any degree of glucose intolerance with onset or first recognition during pregnancy. Approximately 17.8% (range, 9.3-25.5%) of pregnant women suffer complications due to GDM, depending on the epidemiological characteristics of the population investigated and the diagnostic tests employed. 1 GDM has become an important public health issue and is responsible for increased risks of maternal, prenatal, and neonatal complications, such as type 2 diabetes mellitus (T2DM) and cardiovascular disease in mothers and obesity and long-term metabolic syndrome in their offspring, 2 potentially increasing the economic burden of healthcare. GDM can be managed during pregnancy using nutritional management, insulin treatment, or oral hypoglycemic agents, with the primary goal of maintaining blood glucose within normal levels. Moreover, monitoring and prevention of T2DM in women with prior GDM in the postnatal period is also important in reducing the long-term disease burden. Women with GDM were found to have a higher risk of developing postpartum diabetes. 3 For the offspring, diabetes, cardiovascular alterations, and/or obesity in adulthood are the lifelong consequences of intrauterine exposure to increased glucose. 4,5 There are many studies on the economic evaluations of GDM management during both prenatal and postnatal periods. 6,7 To manage GDM, many countries have implemented a screening program to identify asymptomatic pregnant women. However, the definition of GDM, the target population, and clinical practices vary among studies. 8
GDM screening protocols are of two types, along with their modifications: a two-step method (a first-step glucose challenge test [GCT] and a second-step oral glucose tolerance test [OGTT]) that diagnoses GDM based on two or more abnormal values (5.3 mmol/L while fasting, 10.0 mmol/L 1 hour postprandial, and 8.6 mmol/L 2 hours postprandial) on the OGTT, and a one-step method that recommends a 75 g OGTT without a preceding 50 g GCT and uses a simpler, one-abnormal-value diagnostic criterion. 9 From an economic evaluation perspective, different conclusions have been drawn due to the different screening methods; data sources, outcomes, and interventions vary widely across studies examining disparate systems in diverse organizational contexts. Therefore, this study aims to systematically review the evidence on the cost-effectiveness of GDM screening and perform a quality assessment.
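The two diagnostic rules described above can be sketched as a small threshold check. This is an illustrative sketch only: the function and variable names are hypothetical, the two-step cut-offs are those quoted in the text, the one-step cut-offs (5.1, 10.0, and 8.5 mmol/L) are assumed IADPSG values that the text does not state, and treating a value equal to the cut-off as abnormal is also an assumption.

```python
from typing import Dict

# OGTT cut-offs in mmol/L.
# Two-step values are quoted in the text; one-step values are ASSUMED
# IADPSG cut-offs and are not given in the text.
TWO_STEP_OGTT = {"fasting": 5.3, "1h": 10.0, "2h": 8.6}
ONE_STEP_OGTT = {"fasting": 5.1, "1h": 10.0, "2h": 8.5}


def meets_criteria(values: Dict[str, float],
                   cutoffs: Dict[str, float],
                   min_abnormal: int) -> bool:
    """Return True if at least `min_abnormal` glucose values reach their cut-off.

    Assumption: a value equal to the cut-off counts as abnormal.
    """
    abnormal = sum(1 for key, limit in cutoffs.items() if values[key] >= limit)
    return abnormal >= min_abnormal


# Hypothetical OGTT result with a mildly raised fasting value only.
ogtt = {"fasting": 5.2, "1h": 9.0, "2h": 8.0}

two_step_positive = meets_criteria(ogtt, TWO_STEP_OGTT, min_abnormal=2)
one_step_positive = meets_criteria(ogtt, ONE_STEP_OGTT, min_abnormal=1)
print(two_step_positive, one_step_positive)
```

Under these assumptions, the same hypothetical result is negative on the two-abnormal-value rule and positive on the one-abnormal-value rule, which is consistent with the one-step method detecting more cases.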

Literature search
Two authors (Mo and Gai) independently searched the related literature through June 2019. We searched Embase, Medline, Web of Science, the Health Technology Assessment (HTA) database, and the National Health Service Economic Evaluation Database (NHSEED) for studies related to "economic evaluation of gestational diabetes screening" using the following search string in Embase, Medline, Web of Science, and NHSEED: TS=(((diabet* AND (pregnanc* OR pregnant OR gestation* OR wom?n OR female* OR mother*)) OR gdm) AND (screening* OR diagnos* OR glucose tolerance*) AND ((cost* AND (effectiveness OR benefit* OR utility)) OR (economic AND evaluation*))). In HTA, the search string used was: (((diabet* and (pregnanc* or pregnant or gestation* or wom?n or female* or mother*)) or gdm) and (screening* or diagnos* or glucose tolerance*)). We did not select a time range for the search. All citations were imported into EndNote for further screening.

Screening of studies
The screening was conducted by Mo under the supervision of Gai. The studies were screened in three steps. First, all duplicate papers were identified using EndNote; second, all the apparently relevant studies were selected by reviewing their titles and abstracts; and last, the full texts were read. The inclusion criteria were: 1) cost-effectiveness analysis, reporting both input of health resources and output of health gains; 2) studies of screening programs for detecting GDM during pregnancy among women of reproductive age; and 3) original studies involving decision modelling or other mathematical methodologies to deal with uncertainties in cost-effectiveness. Studies that reported only cost or only effectiveness and did not discuss the trade-off between marginal costs and health gains were excluded (see PRISMA 2009 Checklist in eTable 1).

Quality assessment and critical appraisal
We assessed the quality of the included studies using the Assessing Economic Evaluations Checklist from the Methods for the Economic Evaluation of Health Care Programmes, 10 which contains 10 major questions on the following: answerable question posed; competing alternatives given; effectiveness of the programs or services established; costs and consequences identified; costs and consequences measured accurately, credibly, and adjusted for differential timing; incremental analysis performed; uncertainty characterized; and discussions including all issues of concern to the users. Each question contains several sub-questions. The responses available are: "Yes," "Partially yes," "No," and "Can't tell." A "Yes" is equivalent to a full score, a "No" has a value of 0, and a "Partially yes" or "Can't tell" has a value of half a point each. For each "Not Applicable" (N.A.) response, the corresponding sub-question is disregarded (eTable 2). The quality of one paper 11 was independently assessed thrice and the divergences and cases of "Partially yes" and "Can't tell" were fully discussed by Mo, Agari Takahiro, and Naito Yumi. Then, the rest of the evaluation was completed by Mo.
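The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual scoring tool: the function name is hypothetical, and it assumes that every applicable sub-question carries equal weight and that the reported percentage is the points earned divided by the number of applicable sub-questions.

```python
from typing import Iterable, Optional

# Point values per response, following the rule stated in the text.
POINTS = {"Yes": 1.0, "Partially yes": 0.5, "Can't tell": 0.5, "No": 0.0}


def quality_score(responses: Iterable[str]) -> Optional[float]:
    """Percentage of checklist points earned, disregarding "N.A." items.

    Assumption: all applicable sub-questions are weighted equally.
    Returns None when no sub-question is applicable.
    """
    applicable = [r for r in responses if r != "N.A."]
    if not applicable:
        return None
    earned = sum(POINTS[r] for r in applicable)
    return 100.0 * earned / len(applicable)


# Hypothetical responses for one paper (not taken from eTable 2).
example = ["Yes", "Yes", "Partially yes", "No", "N.A.", "Can't tell"]
print(quality_score(example))
```

In this hypothetical example, three points are earned over five applicable sub-questions, because the "N.A." item is dropped from both the numerator and the denominator.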

General characteristics of the economic evaluations
Four of the included studies 39,41,42,45 used TreeAge and three used Microsoft Excel 37,40,43 to construct a decision tree for their economic model. Their general characteristics are summarized in Table 2. The first study was published in 2002 by Poncet 45 and the next one in 2005. 44 The remaining eight studies were published between 2011 and 2017. 11,36-43 Four evaluations were from the United States, 36,41,42,44 three were from Europe (United Kingdom, 38 Ireland, 11 and France 45), one was from New Zealand, 40 and the remaining two were from Asia (Singapore 39 and India and Israel 36,37). Most of the studies used cost-utility analysis (CUA), where utility is measured in quality-adjusted life years (QALYs) or disability-adjusted life years (DALYs). Two studies used cost-effectiveness analysis (CEA): one considered cases detected as the outcome, 40 while the other used prevented pregnancy complications, such as macrosomia, prematurity, perinatal mortality, and hypertensive disorders, as the outcome. 45 In terms of analytical perspective, five of the publications adopted a healthcare perspective (third-party payer), 11,38,40,41,43 one adopted the payers' perspective, 39 and two adopted a societal perspective. 42,44 The remaining three studies did not clarify their perspective. 36,37,45 One study was supported by a pharmaceutical company (Novo Nordisk), 36,37 one failed to mention any funding, 42 and the others were supported by public funding.
The majority (8/10) of the selected studies included "no screening" for comparison. 11,36-39,41,44,45 Large variations were found in the screening options, with three studies evaluating screening at different coverage rates (universal or high-risk targets), 38,39,45 while one compared screening in different settings (GP practice or hospital-based). 11 Two studies projected the long-term impact of screening on diabetes prevention. 36,37,41 Most studies used the diagnostic criteria of the International Association of Diabetes and Pregnancy Study Groups (IADPSG) released in 2010 11,36,37,39-42 or Carpenter and Coustan (CC) 41,45 ; one used the 2008 guidelines of the National Institute for Health and Clinical Excellence (NICE) 43 ; and one compared different diagnostic thresholds (NICE guidelines of 2015 and IADPSG). 38

Main assumptions and results of economic evaluations
The major findings and sensitivity analysis results are summarized in Table 3, and the detailed input parameters of each study are presented in eTable 3 and eTable 4. As an important parameter, the GDM prevalence assumed in each study varied by area and criteria (0.016-0.162). Most studies assumed universal screening uptake (100%) for comparison. Two studies considered the real uptake and acceptance rates, 11,40 while three also considered the option of screening the high-risk population. 38,39,45 Only one study (two articles) simulated long-term health effects on mothers and offspring. 36,37 Most studies used the incremental cost-effectiveness ratio (ICER) to determine cost-effectiveness, with willingness to pay (WTP) thresholds that differed across study settings: £20,000 (suggested by NICE), 38,43 €20,000/45,000 (Health Information and Quality Authority (HIQA) guidelines for the Republic of Ireland), 11 $50,000 39,44 or $100,000 41,42 (commonly referenced in American studies), 46 or per-capita GDP 36,37 (a threshold to which low-resource countries usually refer). 47 Compared with no screening, a screening strategy was considered dominant, 11,44 cost-effective (C-E), 37,39 or not C-E when the women were without risk factors (recommended by NICE; eg, polycystic ovary syndrome, previous stillbirth, or recurrent glycosuria) or when the GDM risk was less than 1%. 38,43 The two-step approach was compared with the one-step approach (2 hr OGTT at 24-28 weeks), with the execution details of the two-step approach differing slightly among the studies ([HbA1c test at first booking +] 1 hr GCT ± 2/3 hr OGTT). Compared with the two-step approach, the IADPSG (2010) diagnostic approach (one-step) cost more, detected more cases, and proved to be C-E (under baseline consumption) 40,42 or C-E only when post-delivery care reduced diabetes incidence. 41 Regarding the comparison of the NICE (2015) and IADPSG (2010) diagnostic thresholds, the lower FPG threshold of IADPSG detected more cases and was considered C-E only under a higher WTP (£30,000 per QALY). 38 The coverage of the screening program tended to influence cost-effectiveness: universal screening or options with a higher screening uptake would be more C-E or even dominant compared with the alternative of screening only the high-risk population 41 or a population with a low uptake. 11
The GDM risk also tended to affect cost-effectiveness: as mentioned earlier, among women at lower risk or without risk factors (as recommended by NICE), no screening (or a strict diagnostic threshold) was likely to be C-E. 38,43 Regarding uncertainties, seven studies included a one-way sensitivity analysis (SA) and three reported a two-way SA. In all, five studies presented a probabilistic SA, among which three presented results using cost-effectiveness acceptability curves/frontiers. However, no study performed an expected value of perfect information analysis. Five studies conducted a scenario SA. Of all the existing SA parameters, the most influential ones include: the uptake of screening 11,43 ; GDM prevalence 38,39,43 ; effectiveness, sensitivity, and specificity of screening 39,42 ; efficacy of treatment 39,42 ; incidence of T2DM in GDM mothers 37 ; cost and effectiveness of post-partum intervention 37,41 ; cost of screening 42 ; cost of GDM treatment 42 ; and WTP 38,39 in the respective studies.
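The way an ICER is compared with a WTP threshold, and the meaning of "dominant" and "dominated," can be sketched as follows. This is a generic illustration of standard incremental analysis, not a model taken from any of the included studies; the function name and all cost and QALY figures are hypothetical.

```python
from typing import Optional, Tuple


def classify_strategy(cost_new: float, effect_new: float,
                      cost_ref: float, effect_ref: float,
                      wtp: float) -> Tuple[str, Optional[float]]:
    """Compare a new strategy with a reference one.

    Returns (verdict, ICER). The ICER is None when one strategy dominates
    the other, because the ratio is not meaningful in that case.
    """
    d_cost = cost_new - cost_ref
    d_effect = effect_new - effect_ref

    if d_cost == 0 and d_effect == 0:
        return "equivalent", None
    if d_cost <= 0 and d_effect >= 0:
        # Not more costly and not less effective: dominant.
        return "dominant", None
    if d_cost >= 0 and d_effect <= 0:
        # Not cheaper and not more effective: dominated.
        return "dominated", None

    icer = d_cost / d_effect
    if d_effect > 0:
        # More costly and more effective: pay at most `wtp` per unit gained.
        verdict = "cost-effective" if icer <= wtp else "not cost-effective"
    else:
        # Cheaper and less effective: savings per unit lost must reach `wtp`.
        verdict = "cost-effective" if icer >= wtp else "not cost-effective"
    return verdict, icer


# Hypothetical per-woman costs and QALYs: screening versus no screening.
print(classify_strategy(6000.0, 10.5, 1000.0, 10.25, wtp=50_000.0))
```

With these hypothetical inputs the incremental cost is 5,000 and the incremental effect is 0.25 QALYs, so the same strategy can be C-E at one threshold and not C-E at a lower one, which is why the choice of WTP matters when comparing results across settings.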

Quality assessment and critical appraisal
The quality scores for the 10 studies shown in Table 4 demonstrate that, on average, 81% (68-97%) of the items on Drummond's checklist were addressed. Specific sub-question scores are shown in eTable 2. Most studies had problems with Questions 3, 4, and 7. For Question 3, how effectiveness was established from previous randomized controlled trials (RCTs) and/or systematic overviews required clarification; however, only two studies provided details of their search strategy and the rules for inclusion or exclusion. 41,45 For Question 4, the relevant costs and consequences varied because of differences in the analytical perspective. Only one paper mentioned both capital and operating costs. 42 Regarding Question 7, two papers did not consider long-term effectiveness, 40,45 and five did not include discounting. Seven of the eleven papers scored over 80%.

DISCUSSION
We reviewed the published economic evaluations of GDM screening and assessed the quality of each paper in terms of options design, modelling, results, and parameters for sensitivity analysis, all of which differed across papers. Overall, screening is C-E or even dominant over no screening. Although the dominance of specific screening methods or targets could not be determined, recent studies have focused on screening using the 2 hr 75 g OGTT (IADPSG criteria) and compared it with no screening 11,36,37,39 or with the status quo (the two-step strategy). 40-42 In the end, the method that results in more cases detected is likely to be C-E compared to the alternative, on the conditions that postnatal care reduces diabetes incidence and that WTP increases.
The results show that the one-step screening is comparatively more C-E than the two-step 41,42 and the two-step is more C-E than the three-step. 40 With a higher WTP, the option with a low diagnostic threshold (eg, the IADPSG criteria) is more C-E than its counterpart (eg, NICE 2015). 38 Universal screening is C-E or dominant over no screening or screening targeting the high-risk population (NICE), 11,41 where a relatively large proportion of cases were detected. Conversely, the results of economic evaluation are different when targeting a low-risk population. 38,43 The dominance largely depends on the risks of the target individuals and the acceptability of the screening options. 38,43 Apart from the screening protocols and diagnostic criteria under different healthcare systems and the epidemiological characteristics of GDM (GDM prevalence and mortality) in the target population, other key factors that affect the cost-effectiveness of screening include: detection efficacy, 42 long-term benefits attributable to early detection, 41 treatment efficacy, 42 and the cost of screening. 42 In particular, the consideration of long-term outcomes has a significant influence on the results, 41 yet these outcomes were not considered in almost all the studies examined, which implies the importance of implementing effective postnatal interventions.
None of the studies compared different screening timings. Screening is usually performed at 24-28 weeks. Recent studies have suggested that GDM screening occur in the first trimester, accompanying other regular tests assessing a combination of maternal characteristics and biomarkers, 48-50 since a previous study suggested that first-trimester HbA1c alone does not have sufficient sensitivity or specificity for diagnosis. 51 Moreover, most studies were conducted in developed countries and evidence from low-income and middle-income countries is lacking.
Our review identified some methodological inconsistencies. For example, the difference between "ICER" and "CER," the definition of the C-E threshold, and the discount rate were not clarified. 40 Utilities and treatment effects were not clearly described either. 45 While the type of SA is not considered in the quality assessment (Q9 in the uncertainty analysis), most studies conducted a deterministic and not a probabilistic SA, even though the latter can assess the cost-effectiveness of a target option at a certain threshold 52 and characterize the combined effects of all parameter uncertainties simultaneously. 10 Our review also identified a lack of clarity in the analytical perspective, types of study design, health gains, consideration of uncertainties, and discounting in some existing studies, which, if included, would have made the results more reliable. 10 Regarding reporting standards, the newly launched guidelines for economic evaluation, such as the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement, 53 methodological guidelines proposed by NICE from the United Kingdom, 54 and the International Society for Pharmacoeconomics and Outcomes Research, 55 facilitate the creation of high-quality evidence.
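To make the contrast with a deterministic SA concrete, a probabilistic SA can be sketched as a simple Monte Carlo loop that varies all uncertain inputs at once and reports the share of draws in which the option has a positive net monetary benefit at a given WTP (one point on a cost-effectiveness acceptability curve). This is a toy sketch under stated assumptions: the function name, the normal distributions, and every parameter value are hypothetical and are not drawn from the included studies.

```python
import random


def prob_cost_effective(wtp: float, n_draws: int = 10_000, seed: int = 0) -> float:
    """Share of simulated draws in which the option is cost-effective at `wtp`.

    Assumption: incremental cost and incremental QALYs are independent and
    normally distributed with the hypothetical parameters below.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    favourable = 0
    for _ in range(n_draws):
        d_cost = rng.gauss(200.0, 80.0)    # hypothetical incremental cost
        d_qaly = rng.gauss(0.010, 0.006)   # hypothetical incremental QALYs
        net_monetary_benefit = wtp * d_qaly - d_cost
        if net_monetary_benefit > 0:
            favourable += 1
    return favourable / n_draws


# Evaluating several thresholds traces out an acceptability curve.
curve = {wtp: prob_cost_effective(wtp) for wtp in (0, 20_000, 50_000, 100_000)}
for wtp, p in curve.items():
    print(wtp, p)
```

Reusing the same seed for each threshold means the curve reflects the change in WTP rather than sampling noise between runs; a real analysis would instead sample the model's own input distributions.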

Conclusions
Our review shows that screening for GDM during pregnancy is C-E in general. The one-step screening, with more cases detected, is more likely to be C-E than the two-step screening. Universal screening is more likely to be C-E than screening targeting a high-risk population. A higher screening uptake, more effective treatment, and postnatal interventions contribute toward improving cost-effectiveness. The quality assessment identified several weaknesses in performing and reporting economic evaluations and leaves us with lessons and research tasks for the future.