2020 Volume 67 Issue 2 Pages 113-123
Thyroglobulin measurement in the needle washout after fine-needle aspiration (FNA-Tg) serves as an important test for suspicious recurrent or metastatic lesions. We conducted a pooled analysis to evaluate the diagnostic accuracy of FNA-Tg, searching electronic databases for original articles in English published from 1993 through 2017. A total of 22 studies comprising 2,670 lymph nodes (LNs), which enrolled participants with suspicious neck LNs during thyroid nodule workup or papillary thyroid cancer (PTC) follow-up, were included. The overall pooled sensitivity of FNA-Tg was 0.91 (95% CI: 0.87–0.93) and the pooled specificity was 0.94 (95% CI: 0.91–0.96). Meta-regression revealed that the cutoff value and the status of serum Tg were sources of heterogeneity for sensitivity, and that the cutoff value was a source of heterogeneity for specificity. Additionally, the cutoff value and the status of serum Tg were sources of heterogeneity in the joint model. Subgroup analysis by cutoff value showed that 1 ng/mL had the highest sensitivity and 40 ng/mL had the highest specificity. We conclude that FNA-Tg measurement has high specificity and sensitivity in the early detection of LN metastases from PTC. The technique is simple and can be recommended for use in any FNA facility, especially when LNs are small. Better standardization of the criteria for FNA-Tg detection and of the cutoff value is required to provide useful data and to improve the management of PTC patients in the future.
PAPILLARY THYROID CANCER (PTC) is the most common histological type of thyroid cancer and has an excellent long-term survival [1]. PTC metastasizes to regional lymph nodes (LNs) in 30%–80% of patients at initial diagnosis [2]. Consequently, it is important to distinguish benign reactive lymphadenitis from lymph node metastasis in order to avoid overtreatment and to make diagnostic procedures expedient.
Nowadays, neck ultrasonography (US) and ultrasound-guided fine-needle aspiration biopsy (US-FNAB) are standard, guideline-recommended diagnostic modalities for detecting early thyroid carcinoma and suspicious LN metastasis. US-FNAB should also be used to evaluate cervical LNs during follow-up in patients with thyroid carcinoma after total thyroidectomy [3]. Although ultrasonographic diagnosis of node metastasis is relatively accurate, it is still hard for doctors to evaluate enlarged nodes with complicated cytological features because of the presence of granulocytes, lymphocytes, a variable amount of necrosis, multinucleated giant cells and poor epithelial cellularity, particularly in the case of cystic changes or small lymph nodes [4]. This results in a 5–10% inadequacy rate and a 6–8% false-negative rate [5].
To improve the diagnostic accuracy of US-FNAB, measuring the concentration of thyroglobulin (Tg) in the needle washout after fine-needle aspiration (FNA-Tg) was proposed at the end of the 20th century [6]. Thyroglobulin, a glycoprotein with a molecular weight of 660 kDa, is produced only by PTC cells or normal thyroid follicular cells. The mean half-life of Tg is 65.2 hours, so approximately 4 weeks after surgery serum Tg is detectable if normal thyroid residues and/or metastases of differentiated thyroid carcinoma are present [7]. Therefore, measuring the serum Tg concentration is important during the follow-up of PTC patients. FNA-Tg has also been performed to confirm cervical lymph nodes suspected to be metastases from PTC, and numerous studies have reported that FNA-Tg improves the accuracy of FNA in detecting LN metastases, particularly in some cases of very small cervical LNs [8]. More recently, FNA-Tg has served as an important test for suspicious recurrent or metastatic lesions [9], but the quality of the evidence is reported to be low. There is no consensus on the cutoff value of FNA-Tg, the sampling method, or the indications. This meta-analysis aims to evaluate the diagnostic accuracy of FNA-Tg.
We performed a systematic search of electronic databases (PubMed, ISI Web of Science, and Scopus) from 1993 through 2017 using the keywords “washout”, “aspiration”, “thyroglobulin” and “fine-needle”. The complete PubMed query was: (Aspiration, Fine-Needle [Mesh] AND Thyroglobulin [Mesh]) OR (thyroglobulin [Mesh] AND fine-needle aspiration) OR (Thyroglobulin [Mesh] AND aspiration) OR (Thyroglobulin [Mesh] AND washout).
Studies that enrolled participants with suspicious neck LNs during thyroid nodule workup or post-surgery PTC follow-up were included. Only original articles in English were included. Studies were considered for the meta-analysis if they met the following inclusion criterion: the absolute numbers of true-positive (TP), false-negative (FN), false-positive (FP), and true-negative (TN) test results were available or could be obtained from the available data or from the authors. The exclusion criteria were as follows: (1) meeting abstracts, reviews, or letters to the editor; (2) case reports, editorial materials or cohort studies; (3) insufficient data available.
Two authors (Zhu Xuhang and Sun Caixing) performed the searching and screening independently. When screening the studies, we first read the titles and abstracts. After excluding the apparently unrelated records, we read the full text to make the final decision. We then extracted the following data: authors, publication year, nationality, sample size, TP, TN, FP, FN, cutoff value and the assay used to detect Tg in the washout.
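The four extracted counts (TP, FN, FP, TN) are enough to derive the per-study accuracy measures that are later pooled. The following Python sketch illustrates the standard definitions; the class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study, and the sketch assumes that no cell is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a diagnostic 2x2 table (hypothetical helper, not from the paper)."""
    tp: int  # metastatic LN, FNA-Tg positive
    fn: int  # metastatic LN, FNA-Tg negative
    fp: int  # benign LN, FNA-Tg positive
    tn: int  # benign LN, FNA-Tg negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def plr(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity)
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def nlr(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Diagnostic odds ratio: (TP * TN) / (FP * FN); assumes FP and FN are nonzero
        return (self.tp * self.tn) / (self.fp * self.fn)


# Made-up counts, for illustration only
study = TwoByTwo(tp=45, fn=5, fp=3, tn=47)
print(round(study.sensitivity, 2), round(study.specificity, 2), round(study.dor, 1))
```

In practice a study with a zero cell would need a continuity correction before the DOR can be computed; that step is omitted here.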
Quality assessment
The methodological quality of the included studies was assessed using the QUADAS-2 tool [10]. Following the Cochrane guidelines, we rated the risk of bias as low, high, or unclear in four domains: patient selection, index test, reference standard, and flow and timing. Applicability concerns were evaluated in the first three domains. The signaling question for “Index test” was “Were the index test results interpreted without knowledge of the results of the reference standard?” The signaling questions for “Reference standard” were “Was the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index test?” The signaling questions for “Flow and timing” were “Was there an appropriate interval between index test(s) and reference standard? Did all patients receive a reference standard? Did patients receive the same reference standard?”
Statistical analysis
A bivariate regression model [11] was used to calculate the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), area under the curve (AUC) and associated 95% confidence intervals (CIs). A bivariate boxplot, the chi-square test, and the inconsistency index (I²) were used to assess heterogeneity; an I² greater than 50% indicated significant heterogeneity [12]. Meta-regression and subgroup analysis were also used to investigate potential sources of heterogeneity. In addition, a likelihood ratio scattergram was used to evaluate the exclusion and confirmation capacities of the index test. Finally, clinical utility and publication bias were assessed with a Fagan diagram and Deeks' funnel plot, respectively. STATA version 12.0 (Stata Corp, College Station, TX) was used for the statistical analyses.
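For readers unfamiliar with the inconsistency index, the sketch below shows the usual univariate definition of I² from Cochran's Q, I² = max(0, (Q − df)/Q) × 100%, with df = k − 1 for k studies. This is an illustration only: the function name and the Q value are hypothetical, and the Stata routine used for the bivariate model may compute its heterogeneity statistics differently.

```python
def i_squared(q: float, k: int) -> float:
    """Return I² (in percent) from Cochran's Q across k studies."""
    if k < 2:
        raise ValueError("at least two studies are needed")
    df = k - 1
    if q <= 0:
        return 0.0
    # Share of total variation attributable to heterogeneity rather than chance
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical Q for 22 studies, for illustration only
example_i2 = i_squared(q=84.0, k=22)
print(f"I² = {example_i2:.1f}%  (above 50% is read as significant heterogeneity)")
```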
After the comprehensive computerized search, 2,778 references were retrieved. Upon review, 1,002 duplicate references were excluded from the initial analysis. A further 1,564 studies were excluded after reading the title and/or abstract, leaving 212 papers that were evaluated in detail in full text. Of these, 139 references had no relevant outcomes and 39 were letters, comments or correspondence. After excluding these references, 2 additional papers were identified through a manual reference search and 14 studies were excluded for insufficient data. In total, our analysis included 22 studies comprising 2,670 LNs, which enrolled participants with suspicious neck LNs during thyroid nodule workup or PTC follow-up. The study selection procedure is shown in Fig. 1. The 22 studies in the current meta-analysis were published between 1993 and 2017. Data came from Spain, Korea, America, Australia, Italy, France, Switzerland, Portugal and Turkey. The main methods of detecting Tg were immunoradiometric assay (IRMA), radioimmunoassay (RIA) and chemiluminescence immunoassay (CLIA). The detailed characteristics of the studies are shown in Supplementary Data 1.
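The selection counts reported above can be checked with simple arithmetic. The short Python sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph
retrieved = 2778
duplicates = 1002
excluded_on_title_or_abstract = 1564

# Papers evaluated in full text
full_text_reviewed = retrieved - duplicates - excluded_on_title_or_abstract

no_relevant_outcomes = 139
letters_comments_correspondence = 39
added_by_manual_search = 2
insufficient_data = 14

included = (
    full_text_reviewed
    - no_relevant_outcomes
    - letters_comments_correspondence
    + added_by_manual_search
    - insufficient_data
)

print(full_text_reviewed, included)
```

Both intermediate and final totals agree with the 212 full-text papers and 22 included studies stated in the text.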
Fig. 1 Flow chart of the systematic review process.
The methodological quality of the 22 studies [13-34] was assessed using the QUADAS-2 tool. Risk of bias analysis revealed that 4 studies had a high risk of bias in patient selection, 1 study in the index test, 2 studies in the reference standard and 4 studies in flow and timing. Regarding applicability concerns, 18 studies had low concern for the reference standard, 18 studies for the index test, and 13 studies for patient selection. Overall, the quality of the included studies was good. The details of the methodological quality analysis are summarized in Table 1.
A total of 2,670 LNs were involved in our meta-analysis. The overall pooled sensitivity was 0.91 (95% CI: 0.87–0.93) (Fig. 2) and the pooled specificity was 0.94 (95% CI: 0.91–0.96) (Fig. 3). Fig. 4 shows that 6 studies fell outside the circle of the bivariate boxplot, so significant heterogeneity existed in this meta-analysis. As shown in Fig. 5, the summary likelihood ratio positive (LRP) and likelihood ratio negative (LRN) for FNA-Tg lay in the left upper quadrant (LUQ), indicating that FNA-Tg is useful for both exclusion and confirmation of recurrent or metastatic lesions. The summary receiver operating characteristic (SROC) curve indicated that measuring the concentration of Tg in the washout fluid of the aspiration needle had high diagnostic performance for detecting suspicious metastatic or recurrent lesions (Fig. 6); the corresponding area under the SROC curve (AUC) was 0.97 (95% CI: 0.95–0.98). As shown in Fig. 7, the clinical utility of FNA-Tg was excellent: with a pretest probability of 20%, the post-test probability was 80% after a positive result and 2% after a negative result.
Fig. 2 Forest plot estimating sensitivity of FNA-Tg detection in papillary thyroid cancer patients in the selected studies.
Point estimates for sensitivity and 95% CIs are shown with pooled estimates. CI, confidence interval; Q, Cochran Q statistic; FNA-Tg, measurement of the concentration of thyroglobulin (Tg) in the washout fluid of the needle used in FNA.
Fig. 3 Forest plot estimating specificity of FNA-Tg detection in papillary thyroid cancer patients in the selected studies.
Point estimates for specificity and 95% CIs are shown with pooled estimates. CI, confidence interval; Q, Cochran Q statistic; FNA-Tg, measurement of the concentration of thyroglobulin (Tg) in the washout fluid of the needle used in FNA.
Fig. 4 Bivariate boxplot of sensitivity and specificity in the 22 included studies.
Fig. 5 Likelihood ratio scattergram evaluating positive likelihood ratios of FNA-Tg detection in papillary thyroid cancer patients. Point estimates for positive likelihood ratio and 95% CIs are shown along with pooled estimates. FNA-Tg, measurement of the concentration of thyroglobulin (Tg) in the washout fluid of the needle used in FNA.
Fig. 6 SROC curve for FNA-Tg detection in papillary thyroid cancer patients. AUC, area under the curve; SROC, summary receiver operating characteristic; FNA-Tg, measurement of the concentration of thyroglobulin (Tg) in the washout fluid of the needle used in FNA.
Fig. 7 Fagan diagram evaluating the overall diagnostic value of FNA-Tg detection in papillary thyroid cancer patients. CI, confidence interval; FNA-Tg, measurement of the concentration of thyroglobulin (Tg) in the washout fluid of the needle used in FNA.
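The Fagan diagram is a graphical form of Bayes' theorem on the odds scale: post-test odds = pretest odds × likelihood ratio. The Python sketch below reproduces that calculation for the 20% pretest probability used here. As a simplifying assumption, the likelihood ratios are approximated directly from the pooled sensitivity and specificity; the bivariate model may yield slightly different pooled ratios, so the output is expected to be close to, not identical with, the reported 80% and 2%.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled estimates reported in this meta-analysis
sensitivity, specificity = 0.91, 0.94
pretest = 0.20

# Assumption: likelihood ratios approximated from the pooled point estimates
plr = sensitivity / (1.0 - specificity)
nlr = (1.0 - sensitivity) / specificity

after_positive = post_test_probability(pretest, plr)
after_negative = post_test_probability(pretest, nlr)

print(f"after positive FNA-Tg: {after_positive:.2f}")
print(f"after negative FNA-Tg: {after_negative:.2f}")
```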
The I² values for sensitivity and specificity were 93.69% and 94.85%, respectively, and the boxplot (Fig. 4) also showed that heterogeneity existed among the studies. Therefore, meta-regression was used to examine potential sources of heterogeneity. The cutoff value of Tg, country, time of publication, number of LNs and patients, status of serum Tg (presence or absence), assay used to detect Tg in the washout fluid, and dosage of normal saline were included in the meta-regression analysis of sensitivity, specificity, and the joint model.
The meta-regression results are shown in Table 2. They indicated that the cutoff value and the status of serum Tg were sources of heterogeneity for sensitivity (Table 2.1), and that the cutoff value was a source of heterogeneity for specificity (Table 2.2). Additionally, the cutoff value and the status of serum Tg were sources of heterogeneity in the joint model (Table 2.3).
Table 2.1 Sensitivity

Parameter | Estimate (95% CI) | Coef | Z | p > \|z\|
---|---|---|---|---
Cutoff value | 0.84 (0.77–0.89) | 1.65 | –3.61 | 0.00
Number of LNs | 0.90 (0.85–0.92) | 2.19 | –0.51 | 0.61
Status of serum Tg | 0.81 (0.68–0.90) | 1.45 | –2.54 | 0.01
Number of patients | 0.88 (0.79–0.93) | 2.01 | –0.93 | 0.35
Country | 0.90 (0.85–0.93) | 2.16 | –1.05 | 0.30
Dosage of normal saline | 0.90 (0.86–0.93) | 2.23 | –0.41 | 0.68
Assay of detecting Tg in washout fluid | 0.90 (0.87–0.92) | 2.18 | –1.04 | 0.30

Table 2.2 Specificity

Parameter | Estimate (95% CI) | Coef | Z | p > \|z\|
---|---|---|---|---
Cutoff value | 0.97 (0.94–0.98) | 3.48 | 2.97 | 0.00
Number of LNs | 0.94 (0.90–0.97) | 2.77 | –0.16 | 0.87
Status of serum Tg | 0.93 (0.85–0.97) | 2.66 | –0.36 | 0.72
Number of patients | 0.97 (0.92–0.99) | 2.55 | –0.55 | 0.59
Country | 0.92 (0.89–0.95) | 3.38 | 1.63 | 0.10
Dosage of normal saline | 0.92 (0.89–0.96) | 2.60 | –1.46 | 0.14
Assay of detecting Tg in washout fluid | 0.95 (0.92–0.96) | 2.90 | –0.55 | 0.58

Table 2.3 Joint model

Parameter | I² (95% CI) | LRT χ² | p value
---|---|---|---
Cutoff value | 89.5 (79.06–99.95) | 19.05 | 0.00
Number of LNs | 0.00 (0.00–100.00) | 0.34 | 0.84
Status of serum Tg | 72.46 (38.97–100.00) | 7.26 | 0.03
Number of patients | 36.77 (0.00–100.00) | 3.16 | 0.21
Country | 66.31 (24.39–100.00) | 5.94 | 0.05
Dosage of normal saline | 27.97 (0.00–100.00) | 2.74 | 0.25
Assay of detecting Tg in washout fluid | 4.31 (0.00–100.00) | 2.09 | 0.35
Number of LNs: ≥100 vs. <100; Status of serum Tg: Presence vs. Absence; Number of patients: ≥100 vs. <100; Country: Korea vs. other countries; assay of detecting TG in washout liquid: CLIA vs. IRMA vs. RIA; Dosage of normal saline: ≥1 mL vs. <1 mL.
The cutoff value influenced the sensitivity and specificity of FNA-Tg. The results of the subgroup analysis are shown in Table 3. The best sensitivity was observed at a cutoff of 1 ng/mL (0.94; CI 0.91–0.96), while the best specificity was observed at a cutoff of 40 ng/mL (0.97; CI 0.94–0.99).
Parameter | Subgroup | Sensitivity | Specificity
---|---|---|---
Cutoff value | 1 ng/mL | 0.94 (0.91–0.96) | 0.85 (0.78–0.90)
Cutoff value | 10 ng/mL | 0.84 (0.76–0.89) | 0.97 (0.94–0.98)
Cutoff value | 20 ng/mL | 0.85 (0.77–0.90) | 0.96 (0.93–0.98)
Cutoff value | 30 ng/mL | 0.82 (0.73–0.88) | 0.96 (0.93–0.98)
Cutoff value | 40 ng/mL | 0.70 (0.56–0.81) | 0.97 (0.94–0.99)
We used Deeks' funnel plot of lnDOR against $1/\sqrt{\mathrm{ESS}}$ or, equivalently, against $\sqrt{1/n_1 + 1/n_2}$, which is proportional to $1/\sqrt{\mathrm{ESS}}$, to evaluate publication bias [35]. The diagnostic odds ratio was plotted against $1/\sqrt{\mathrm{ESS}}$ as a measure of sample size. ESS stands for effective sample size, and $1/\sqrt{\mathrm{ESS}}$ decreases with larger sample size. The p value obtained from the funnel plot was <0.05, indicating that publication bias was present in this meta-analysis (Fig. 8).
Fig. 8 Deeks' funnel plot evaluating publication bias.
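To make the horizontal axis of the funnel plot concrete, the sketch below computes the effective sample size using the standard definition from Deeks' method, ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of diseased and non-diseased units, and checks numerically that 1/√ESS equals one half of √(1/n1 + 1/n2). The function names and the example group sizes are hypothetical.

```python
import math


def effective_sample_size(n1: int, n2: int) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2) for diseased (n1) and non-diseased (n2) groups."""
    return 4.0 * n1 * n2 / (n1 + n2)


def funnel_x(n1: int, n2: int) -> float:
    """Horizontal coordinate of the funnel plot: 1 / sqrt(ESS)."""
    return 1.0 / math.sqrt(effective_sample_size(n1, n2))


# Hypothetical study with 40 metastatic and 60 benign lymph nodes
n1, n2 = 40, 60
ess = effective_sample_size(n1, n2)
x = funnel_x(n1, n2)

# 1/sqrt(ESS) is exactly half of sqrt(1/n1 + 1/n2), hence "proportional"
alt = 0.5 * math.sqrt(1.0 / n1 + 1.0 / n2)

print(ess, round(x, 4), math.isclose(x, alt))
```

In the full test, lnDOR is regressed on this coordinate with weights equal to ESS, and a slope significantly different from zero is read as evidence of funnel-plot asymmetry.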
As a major method of detecting metastatic LNs in PTC patients, fine-needle aspiration cytology (FNA-C) has been widely used to confirm suspicious findings on ultrasonography [36]. However, the sensitivity of FNA-C has not lived up to expectations: it varies from 75% to 85%, with a false-negative rate of 6–8% [6] and up to 20% of samples being non-representative or of inadequate cellularity, depending on the cytopathologist's experience and skill [37]. This urged researchers to seek better detection techniques. In 1992, Pacini [6] first proposed that FNA-Tg could be used to detect cervical LN metastasis of PTC and reported a sensitivity of 100%, whereas the sensitivity of cytology in the diagnosis of metastatic LNs was only 85% at that time. In recent years, many researchers have concluded that FNA-Tg has great diagnostic value in PTC patients with cervical LN metastasis, and FNA-Tg measurement has been recommended by European [3] and American guidelines [38].
Nevertheless, controversy remains regarding this technique. Schuff showed that only up to 30% of PTC patients demonstrated metastasis to the cervical LNs or recurrence with FNA-Tg [39]. The clinical assessment of enlarged local LNs is difficult at first treatment or during follow-up because inflammatory lymphadenopathies are frequently present [14]. In addition, FNA-Tg might be higher when a thyroid remnant is present, because serum Tg is mainly produced by the thyroid. Moon JH [30] suggested that the presence of serum Tg and the suppression of serum TSH should be considered when diagnosing LN metastasis with FNA-Tg in PTC patients, because both independently influenced the accuracy of FNA-Tg. Therefore, measuring FNA-Tg after TSH stimulation was recommended [30]. However, this conclusion lacked sufficient data and needs further investigation. Furthermore, Torres [40] suggested that the diagnostic performance of FNA-Tg was not influenced by remnant thyroid tissue. Apart from that, several interfering factors might affect FNA-Tg, such as the shortage of methodological criteria, the lack of a uniform cutoff value, the lack of functional sensitivity, and the different assays used to detect Tg in the washout fluid. All of these factors influence the diagnostic performance of FNA-Tg and are associated with large inter-assay variation, which makes comparison between studies difficult.
We searched more databases and enrolled more samples than previous studies, which makes the result of our pooled analysis more reliable. Moreover, the QUADAS-2 tool suggested that the overall quality of the included studies was acceptable.
In our meta-analysis, the high pooled sensitivity and specificity indicated that FNA-Tg had high diagnostic accuracy in detecting neck LN metastases from PTC. The pooled positive likelihood ratio (PLR) and negative likelihood ratio (NLR) further indicated high diagnostic accuracy for FNA-Tg in clinical practice. Finally, the pooled DOR and AUC indicated that FNA-Tg had excellent discriminating ability. Although significant heterogeneity existed in our analysis, meta-regression indicated that the cutoff value and the status of serum Tg were sources of heterogeneity for sensitivity, and that the cutoff value was a source of heterogeneity for specificity. Additionally, the cutoff value and the status of serum Tg were sources of heterogeneity in the joint model. Subgroup analysis revealed that a cutoff value of 1 ng/mL had the highest sensitivity and 40 ng/mL had the highest specificity. Unfortunately, it was difficult to determine a best cutoff value for FNA-Tg based on the data we collected and the statistical methods used.
Our meta-analysis had several potential limitations. Firstly, the enrolled studies used different cutoff values for FNA-Tg and had many subgroups, which possibly reduced the diagnostic accuracy and made it difficult to find a best cutoff value for all situations. Secondly, this was a retrospective study, and therefore the selection bias of the included studies should be considered in the interpretation of our results. Moreover, our analysis included only English-language articles; potentially relevant studies in other languages were excluded. Finally, although the authors independently reviewed the primary studies, complete accuracy of the data could not be ensured by this strategy.
In conclusion, our meta-analysis showed that FNA-Tg measurement has high specificity and sensitivity in the early detection of LN metastases from PTC. FNA-Tg could be used to screen for LN metastases in patients who have undergone thyroidectomy, as well as to perform LN staging in patients with PTC who have not yet undergone initial surgery. The technique is simple and can be recommended for use in any FNA facility, especially when LNs are small. Better standardization of the criteria for detection methods and of the cutoff value is required to provide useful data and to improve the management of PTC patients in the future.
We thank all authors that confirmed and completed missing data from their original reports (Bournaud C, Cignarelli M, Cristina, Cunha, Giovanella L, Holmes BJ, Jeon MJ, Jeon SJ, Jung JY, Kim MJ, Lee JH, Lee MJ, Lee YH, Li QK, Moon JH, Salmashoglu A, Snozek CL, Yap NS, Zanella AB).
This work was supported by the National Natural Science Foundation of China (Grant Number: 81672642) and Major Projects of the Science and Technology Department of Zhejiang Province (Grant Number: 2015C03G1360022).
There is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.