Revised System to Evaluate Measurement of Blood Chemistry Data From the Japanese National Health and Nutrition Survey and Prefectural Health and Nutrition Surveys

Background: We developed a monitoring system that uses total errors (TEs) to evaluate measurement of blood chemistry data from the National Health and Nutrition Survey (NHNS) and Prefectural Health and Nutrition Surveys (PHNS).

Methods: Blood chemistry data from the NHNS and PHNS were analyzed by SRL, Inc., a commercial laboratory in Tokyo, Japan. Using accuracy and precision from external and internal quality controls, TEs were calculated for 14 blood chemistry items during the period 1999–2010. The acceptable range was defined as less than the upper 80% confidence limit for the median, the unacceptable range as more than twice the cut-off value of the acceptable range, and the borderline range as the interval between the acceptable and unacceptable ranges.

Results: The TE upper limit for the acceptable and borderline ranges was 5.7% for total cholesterol (mg/dL), 9.9% for high-density lipoprotein cholesterol (mg/dL), 10.0% for low-density lipoprotein cholesterol (mg/dL), 10.4% for triglycerides (mg/dL), 6.6% for total protein (g/dL), 7.6% for albumin (g/dL), 10.8% for creatinine (mg/dL), 6.5% for glucose (mg/dL), 9.7% for γ-glutamyl transpeptidase (U/L), 7.7% for uric acid (mg/dL), 8.7% for urea nitrogen (mg/dL), 9.2% for aspartate aminotransferase (U/L), 9.5% for alanine aminotransferase (U/L), and 6.5% for hemoglobin A1c (%).

Conclusions: This monitoring system was established to assist health professionals in evaluating the continuity and comparability of NHNS and PHNS blood chemistry data among survey years and areas and to prevent biased or incorrect conclusions.


INTRODUCTION
In November every year, the Japanese Ministry of Health, Labour, and Welfare conducts the National Health and Nutrition Survey (NHNS) in 300 unit areas. In addition, some local governments conduct an independent Prefectural Health and Nutrition Survey (PHNS) of extended samples, according to the procedures used for the NHNS. All blood samples collected in the NHNS, and some blood samples obtained in the PHNS, are analyzed by SRL Inc., a commercial laboratory in Tokyo, Japan, and measurements are performed using the same analytic system. All measurement is subject to error. Errors are not always constant and can differ by survey year depending on variations in many factors, including the principles underlying the method, analytic instruments, reagents, calibrator, medical technologist, and other laboratory conditions. 1,2 Even if the external and internal quality controls used at SRL are sound, measurement errors are inevitable.
The monitoring system described in this study outlines principles that can be used by physicians and other health professionals who are interested in the continuity and comparability among survey years, or in the statistical results for components of physical examinations, in the annual NHNS and PHNS reports. Using these principles, they can determine by themselves if the results after 2011 can be used, should be used with care, or cannot be recommended for use according to the newly established TE criteria, which are based on external and internal quality controls at SRL during the 12-year period 1999-2010. The criteria for TEs were developed for use in monitoring during 2011-2015 but not for evaluating past data. Because the results of the analysis of collected data are open to the public but information on analytic errors is not, we hoped to prevent researchers from reaching biased or incorrect conclusions in their evaluations.
In 2008, we reported tentative monitoring principles that could be used to compare blood chemistry data obtained by the NHNS. 3 However, after 2008, more PHNS data became available, to allow for evaluation of local plans in Health Japan 21. In addition, the number of blood chemistry items in the NHNS varies and has tended to increase. Finally, the Metabolic Syndrome-Focused Health Checkups Program 4 in Japan began throughout the country in 2008. Due to these developments, we decided to revise the 2008 monitoring system.
METHODS

External and internal quality control
SRL participates in the External Quality Assessment of Clinical Laboratories (EQACL) program of the Japan Medical Association (JMA) 5 and the Lipid Standardization Program of the US Centers for Disease Control and Prevention/Cholesterol Reference Method Laboratory Network (CDC/CRMLN). SRL also has an internal quality control system that uses 2 concentrations of quality-control materials.

Accuracy
Regarding accuracy (%bias) in Table 2, the evaluation method described in the 2010 annual report on EQACL by the JMA 5 was as follows: (1) values that deviate by 3 SDs or more from the center are removed, the mean and SD are obtained according to the measurement method used by the laboratories that participated in the survey, and the coefficient of variation (CV) is calculated according to the measurement method; (2) measurement methods are arranged in order of increasing CV; (3) measurement methods with a high rank in at least 80% of laboratories are selected; (4) the mean of data from laboratories using the measurement methods selected in the previous step is calculated, 1-way analysis of variance is used to calculate intra-method variation (expressed as SD), and a common CV is obtained; and (5) the common CV is corrected for the report unit width and a corrected common CV is obtained. Using both the adjusted mean obtained from this iterative truncation method and the measurement values obtained by SRL, %bias was calculated for each sample, and the mean of these %bias values (accuracy) was used as an index of systematic error. 6

Precision
Regarding precision (CV%) in Table 2, the SD described in the EQACL represents dispersion among all participants, not the precision of measurement by SRL. Therefore, SRL provided us with assayed values for 2 concentrations of internal quality control sera that were collected during a 1-month period that included November of each year. From these, we randomly sampled 1 measurement value per day (n = 1) for 20 days and calculated the CV from the mean value and SD as an index of random error. 7

Total error and relevant criteria
Subsequently, TE was calculated from accuracy and precision. Regarding total error (%) in Table 2, the equation used was "accuracy (absolute value of %bias) + precision (1.96 × CV)", which is used by the US National Cholesterol Education Program (NCEP) and the Lipid Standardization Program by CDC/CRMLN. 6 The acceptable range of TE for each blood chemistry item was defined as less than the upper 80% confidence limit for the median of the 12-year period, as calculated by the nonparametric bootstrap method (BCa method). 8-10 Bootstrap method analyses were conducted using SAS, version 13 (SAS Institute, Inc., Cary, NC, USA). The unacceptable range was defined as more than twice the cut-off value of the acceptable range, based on evaluation criteria adopted by the US College of American Pathologists (CAP). 11 The interval between the acceptable and unacceptable ranges was classified as the borderline range. Thus, using these TE criteria, we have created a 3-level assessment of test performance.
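The TE equation above can be illustrated with a minimal Python sketch. The function names and the numbers in the usage example are hypothetical and are not taken from the survey data; the use of the sample SD (n − 1 denominator) for the CV is also an assumption, because the text does not state which SD estimator was applied.

```python
from statistics import mean, stdev


def percent_bias(measured, target):
    """%bias of one measured value against its target (adjusted mean) value."""
    return (measured - target) / target * 100.0


def accuracy(measured_values, target_values):
    """Mean of the per-sample %bias values (index of systematic error)."""
    return mean(percent_bias(m, t) for m, t in zip(measured_values, target_values))


def precision_cv(daily_qc_values):
    """CV% from single daily QC measurements (index of random error).

    Assumption: sample SD (n - 1 denominator).
    """
    return stdev(daily_qc_values) / mean(daily_qc_values) * 100.0


def total_error(bias_pct, cv_pct):
    """TE (%) = |%bias| + 1.96 x CV, as quoted in the text."""
    return abs(bias_pct) + 1.96 * cv_pct


# Hypothetical example values, for illustration only.
bias = accuracy([102.0, 99.0], [100.0, 100.0])   # mean of +2.0% and -1.0%
cv = precision_cv([100.0, 102.0, 98.0])          # SD 2.0 around mean 100.0
te = total_error(bias, cv)
```

Because the absolute value of the bias is taken before the precision term is added, a negative and a positive bias of equal size yield the same TE.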

Use in evaluating performance in 2011
We collected the results of EQACL evaluations and SRL internal quality control data in 2011 and attempted to evaluate SRL test performance in 2011 using the proposed TE criteria.
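As an illustration of how the 3-level assessment could be applied to a given year's TE, the following Python sketch classifies a TE value for one item. It assumes that the per-item percentages reported in this article are the cut-offs between the acceptable and borderline ranges, and that the unacceptable range begins above twice that cut-off; the dictionary and function names are hypothetical.

```python
# Cut-off values (TE, %) reported in this article for 1999-2010.
# Assumption: each value is the boundary between acceptable and borderline.
ACCEPTABLE_CUTOFF = {
    "TC": 5.7, "HDL-C": 9.9, "LDL-C": 10.0, "triglycerides": 10.4,
    "total protein": 6.6, "albumin": 7.6, "creatinine": 10.8,
    "glucose": 6.5, "gamma-GT": 9.7, "uric acid": 7.7,
    "urea nitrogen": 8.7, "AST": 9.2, "ALT": 9.5, "HbA1c": 6.5,
}


def classify_te(item, te_pct):
    """Return 'acceptable', 'borderline', or 'unacceptable' for a TE value."""
    cutoff = ACCEPTABLE_CUTOFF[item]
    if te_pct < cutoff:
        return "acceptable"      # below the acceptable cut-off
    if te_pct > 2 * cutoff:
        return "unacceptable"    # more than twice the acceptable cut-off
    return "borderline"          # the interval in between


# Hypothetical TE values, for illustration only.
print(classify_te("TC", 5.0))    # acceptable
print(classify_te("TC", 8.0))    # borderline
print(classify_te("TC", 12.0))   # unacceptable
```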

Criteria for CDC/CRMLN lipid standardization
To evaluate lipid measurement, the following NCEP criteria were used: for TC, accuracy within 3% of the target value of the CDC/CRMLN reference measurement procedure, precision as CV of 3% or less, and TE of 9% or less; for HDL-C, accuracy within 5% of the target value, precision as CV of 4% or less, and TE of 13% or less; for LDL-C, accuracy within 4% of the target value, precision as CV of 4% or less, and TE of 12% or less. 12

Implementation survey for PHNS
In 2007, our study group surveyed prefectural governments regarding implementation of their PHNS, including dietary intake surveys and blood examination, and collected additional data on the number of blood samples they entrusted to SRL for analysis in 2011. 13

Table 1 shows annual changes in blood chemistry items measured and number of analyzed NHNS samples assayed at SRL during 1999-2010. Items measured every year since 1999 were TC, HDL-C, triglycerides, total protein, and glucose. LDL-C, albumin, creatinine, and HbA1c were recently added to these 5 items. Other items, such as γ-GT (γ-GTP), uric acid, urea nitrogen, AST (GOT), and ALT (GPT), have been measured infrequently. The average number of assayed samples in the NHNS was 4704 during 1999-2010.

Table 2 shows measurement performance at SRL, based on the EQACL of the JMA. On the basis of these calculations, criteria for acceptable, borderline, and unacceptable ranges were established, as shown in the column labeled Proposed TE Criteria. 10 The upper limit of TE in the new acceptable and borderline ranges for each item was 5.7% for TC, 9.9% for HDL-C, 10.0% for LDL-C, 10.4% for triglycerides, 6.6% for total protein, 7.6% for albumin, 10.8% for creatinine, 6.5% for glucose, 9.7% for γ-GT (γ-GTP), 7.7% for uric acid, 8.7% for urea nitrogen, 9.2% for AST (GOT), 9.5% for ALT (GPT), and 6.5% for HbA1c.

Concerning the acceptable TE range, 50% of the evaluation limits (1 side) of the CAP evaluation criteria, which are widely used worldwide, was adopted and is shown as a reference in the column labeled CAP TE in Table 2. 11 TE criteria for HbA1c were not established in the CAP survey. Although the acceptable range for γ-GT (γ-GTP) is expressed as SD in the CAP evaluation criteria, 7.5% was used as the corresponding value.

RESULTS
A 2007 implementation survey showed that 25 (53.2%) of the 47 prefectures in Japan independently performed blood examinations. Blood examinations were entrusted to SRL by 21 of the 25 prefectures and to a local laboratory by the other 4. A total of 15 096 samples from the 21 prefectures were analyzed by SRL. This number was 3.2 times the mean sample number (4704) of the NHNS (Table 1).
Additionally, according to the 2011 survey, 20 (42.6%) of the 47 prefectures performed blood examinations. Blood examinations were entrusted to SRL by 15 of the 20 prefectures and to a local laboratory by the other 5. A total of 7063 samples from the 15 prefectures were analyzed by SRL. This number was 1.5 times the average sample number of the NHNS (Table 1). The survey of the current situation in each prefecture was not conducted systematically, and measurement items are different for each prefecture.
In 2011, urea nitrogen was not assayed in the NHNS or PHNS; thus, there was a total of 13 items. When TE was calculated for each SRL item in 2011 and compared with the proposed TE criteria, the evaluation was acceptable for 7 items (53.8%), namely, TC, LDL-C, triglycerides, glucose, γ-GT (γ-GTP), uric acid, and AST (GOT), and borderline for 6 items (46.2%), namely, HDL-C, total protein, albumin, creatinine, ALT (GPT), and HbA1c. No item was evaluated as unacceptable (Table 2). Table 3 shows the measurement performance of SRL for TC, HDL-C, and LDL-C, based on the criteria of the Lipid Standardization Program by CDC/CRMLN. In each standardization year, performance satisfied the CDC/CRMLN criteria for clinical laboratories.

Table 1 footnote: White circles show blood chemistry items assayed in the corresponding year. Abbreviations: HDL, high-density lipoprotein; LDL, low-density lipoprotein; γ-GT (γ-GTP), γ-glutamyl transpeptidase; AST (GOT), aspartate aminotransferase; ALT (GPT), alanine aminotransferase; HbA1c, hemoglobin A1c.
Table 2 footnote: Accuracy as an index of systematic error is expressed as %bias calculated based on JMA criteria. Precision as an index of random error is expressed as CV calculated from SRL internal quality control data.

DISCUSSION
In standardization, the most advanced system of quality control assessment, target values are obtained by using globally accepted definitive or reference measurement procedures. However, in the EQACL, measurement values are collected from all participants and, after statistical analysis, adjusted mean values are obtained and used as an index of accuracy. A similar data processing method is used in external quality control assurance programs in Western countries. 14,15 This method statistically excludes extreme outliers and misreports, which improves the reliability of adjusted mean values as indices of accuracy. Such adjusted means do not represent physicochemical accuracy, as such, but are often used for practical purposes as consensus values in clinical surveys. Consensus values are often used as a substitute for accuracy when there is no established reference method, or when a reference method exists but is not used due to its complexity or technical difficulty. In this respect, we have no objection to the use of consensus values at many laboratories, such as those derived from approximately 3000 participants in the EQACL of the JMA. 5

The sources of error in measured values include changes in: the underlying principles of the measurement method, analytic devices, sample status (fresh, frozen), reagents or reagent reactivity, calibrators and their value assignments, the skill of analytical technologists, and other laboratory conditions. 1,2,5,6 Measurement error can result in clinical examination-derived discontinuities with previously obtained results in surveys (such as retrospective case-control studies), which could markedly affect annual follow-up. In this study, we conducted detailed follow-up surveys of these factors to avoid discontinuities derived from clinical examinations.
A disadvantage of using the mean value of an external quality assessment as an index of accuracy is that the method routinely used during each period has a direct influence on measurement values. For example, when an analytic method based on new measurement principles is developed and adopted at clinical laboratories, due to convenience and/or cost and time savings, changes in mean value are sometimes observed along with analytic errors.
Case 1: The routine analytic method for HDL-C changed from a precipitation method using polyanions and cations to a homogeneous method using detergent or surfactant. The new method has been adopted by many laboratories, and age-related changes in mean HDL-C values have been reported since the switch. In this case, changes in mean HDL-C values were observed and, as a consequence, analytic errors changed. 16-19

Case 2: There has been increasing demand for more-precise creatinine analysis for people with diabetes mellitus and renal disorders, and the calibrator is changing from the old, water-soluble standard to a new serum-based reference material with high accuracy, as confirmed by gas chromatography/isotope dilution/mass spectrometry. Additionally, in many laboratories the creatinine method has changed from the classic Jaffe method to newly developed enzymatic methods. Changes in mean creatinine values have been observed with these new methods and, inevitably, analytic errors also change. 20,21

The survey protocol agreed by the Ministry of Health, Labour, and Welfare in Japan and SRL stipulates that the same analytic system for the NHNS (BioMajesty 8060 device No. 1, JEOL Ltd.; installed in the SRL Medical Ultimate Quality Service [MUQS] Laboratory) should also be used for blood examinations that are independently entrusted by prefectures to SRL. This protocol allows PHNS and NHNS results to be monitored in the same manner and permits PHNS data to be added to NHNS. The sample numbers of the PHNS are generally larger than those of the NHNS. However, there are 2 limitations in the use of PHNS data: the measured items differ according to prefecture, and it is possible that the analytic laboratory was changed from SRL to a local laboratory or from a local laboratory to SRL. Therefore, before using PHNS results as additional data, the laboratory responsible for the results should be confirmed. In this study, only samples measured by SRL were included.
In this study, on the basis of quality control results, target TE values for the subsequent 5 years were determined. Specifically, the acceptable limit was defined as the upper 80% confidence limit for the median TE. TE values above this limit were considered to be in the borderline or unacceptable range, and a caution was issued. The probability of including borderline or unacceptable ranges using these target values remains at 10% even if performance remains equal to that during the previous 12-year period. Assuming annual improvements in performance, approximately 50% of TE values in the subsequent 5-year period are expected to be within the acceptable range. In quality control, there are no absolute criteria for quality, and quality is improved by daily efforts to repeatedly establish and meet criteria. Our monitoring system uses past data to establish target values for a subsequent 5-year period, and adjustments are made by revising target values at 5-year intervals. The system is thus compatible with the idea of quality control. The TE limit for the acceptable and borderline ranges was established for monitoring during 2011-2015, not for application to past data. Application to the year 2011 (Table 2) confirms the suitability of the proposed TE criteria. When TE falls within the acceptable or borderline ranges, annual continuity and comparability of survey results can be regarded as satisfactory. However, when TE falls within the unacceptable range, measurement values should be used with caution.
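To make the derivation of the acceptable limit more concrete, the following Python sketch computes a bias-corrected and accelerated (BCa) bootstrap upper limit for the median of a series of annual TE values. It is an illustration only and does not reproduce the SAS analysis used in this study. It assumes that the "upper 80% confidence limit" is the upper end of a two-sided 80% interval (which would be consistent with the 10% figure quoted above). The function name, the number of resamples, the handling of ties, and the 12 example values are all hypothetical.

```python
import random
from statistics import NormalDist, mean, median


def bca_upper_limit(values, level=0.80, n_boot=5000, seed=0):
    """Upper end of a two-sided BCa bootstrap interval for the median."""
    values = list(values)
    n = len(values)
    rng = random.Random(seed)
    theta_hat = median(values)

    # Bootstrap replicates of the median (resampling with replacement).
    boot = sorted(median(rng.choices(values, k=n)) for _ in range(n_boot))

    norm = NormalDist()
    # Bias correction z0: share of replicates below the observed median.
    # Ties are counted as one half (an assumption), and the share is
    # clamped away from 0 and 1 so that inv_cdf is defined.
    below = sum(b < theta_hat for b in boot)
    ties = sum(b == theta_hat for b in boot)
    p0 = (below + 0.5 * ties) / n_boot
    p0 = min(max(p0, 1.0 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(p0)

    # Acceleration from jackknife (leave-one-out) medians.
    jack = [median(values[:i] + values[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    # Adjusted percentile for the upper limit, then look it up.
    z_alpha = norm.inv_cdf(1.0 - (1.0 - level) / 2.0)
    adj = norm.cdf(z0 + (z0 + z_alpha) / (1.0 - accel * (z0 + z_alpha)))
    index = min(n_boot - 1, max(0, int(adj * n_boot)))
    return boot[index]


# Hypothetical annual TE values (%) for one item over 12 years.
te_history = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.9, 6.1, 6.4, 7.0, 7.3]
acceptable_cutoff = bca_upper_limit(te_history)
unacceptable_cutoff = 2 * acceptable_cutoff
```

With only 12 annual values, the bootstrap distribution of the median is coarse, so the resulting limit can shift noticeably with the number of resamples and the random seed.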
Precision is an index of the reproducibility of measurement values obtained by a laboratory. In this study, since TE was calculated using an equation, CV was limited to a singlicate value (n = 1) in internal quality control sera for 20 days. CV was calculated from 2 types of commercially available internal quality control serum in SRL. However, if there was a difference of 10% or more in CV between the concentrations of internal quality control materials, the higher CV was used. 7

In lipid standardization by CDC/CRMLN, 12 the accuracy, precision, and TE for SRL measurements of TC, HDL-C, and LDL-C met CDC criteria (Table 3) for clinical laboratory use. Therefore, concerning these 3 lipid items, all results in the NHNS and the results in some PHNS can be compared with results in Western countries. However, only results obtained during the previous 9-year period are available for LDL-C, and it is desirable to use these results as a reference.
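The NCEP criteria used for lipid standardization can be expressed as a simple check, sketched below in Python. The limits are those listed in the Methods for TC, HDL-C, and LDL-C; the function and dictionary names, and the example bias and CV values, are hypothetical and do not come from Table 3.

```python
# NCEP performance limits (%), as listed in the Methods of this article.
NCEP_LIMITS = {
    "TC":    {"bias": 3.0, "cv": 3.0, "te": 9.0},
    "HDL-C": {"bias": 5.0, "cv": 4.0, "te": 13.0},
    "LDL-C": {"bias": 4.0, "cv": 4.0, "te": 12.0},
}


def meets_ncep(item, bias_pct, cv_pct):
    """True if accuracy, precision, and TE are all within the NCEP limits."""
    limits = NCEP_LIMITS[item]
    te = abs(bias_pct) + 1.96 * cv_pct   # TE = |%bias| + 1.96 x CV
    return (
        abs(bias_pct) <= limits["bias"]
        and cv_pct <= limits["cv"]
        and te <= limits["te"]
    )


# Hypothetical performance values, for illustration only.
print(meets_ncep("TC", 1.0, 1.5))     # True: all three limits satisfied
print(meets_ncep("TC", 3.5, 1.0))     # False: bias exceeds 3%
```

All three conditions are checked separately because a laboratory could satisfy the TE limit while still exceeding the individual bias or CV limit.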
In conclusion, we used TE criteria to develop a revised 3-level assessment of test performance and evaluated the continuity and comparability of 14 blood chemistry items assayed at SRL for the NHNS and PHNS in Japan. To further improve reliability, TE performance criteria should be updated every 5 years.