Linkage Rate Between Data From Health Checks and Health Insurance Claims in the Japan National Database

Background: Japan’s National Database (NDB) includes data on health checks and health insurance claims, is linkable using hash functions, and is available for research use. However, the linkage rate between health check and health insurance claims data has not been investigated.
Methods: Linkage rate was evaluated by comparing observed medical and pharmaceutical charges among health check recipients in fiscal year (FY) 2009 (N = 21 588 883) with the charges expected for the same population if record linkage were complete. Using the NDB, observed charges were estimated from the first published result of linking health check recipients in FY2009 with their health insurance claims in FY2010. Expected charges were estimated by combining 3 publicly available datasets, including data from the Medical Care Benefit Survey and an ad hoc report by the Japan Health Insurance Association.
Results: Only 14.9% of expected charges were linked by the NDB. The linkage rate was higher for women than for men (18.2% vs 12.4%) and for elderly adults as compared with younger adults (>25% vs <10%).
Conclusions: The linkage rate in the NDB was so low that any research linking health check and health insurance claims data will not be reliable. Possible causes for the low linkage rate include differences between health check and health insurance claims data in name format (eg, insertion of a space between family and given names) and date of birth (Japanese vs Gregorian calendar). Investigation of the causes for the low linkage rate and measures for improvement are urgently needed.


INTRODUCTION
In 2008, the National Database (NDB) was created in Japan for the "development, implementation, and evaluation" of the Health Care Cost Containment Plan (HCCCP), as set forth by Section 16 of the Elderly Health Care Security Act. Data from regular health checks and guidance have been collected since fiscal year (FY) 2008, and health insurance claims data have been collected since April 2009. The NDB has grown into one of the largest databases in the world and, as of June 2012, encompassed approximately 5 billion health insurance claims and 66 million health check and guidance records. 1 Personally identifiable data in the NDB are irreversibly encrypted using hash functions. Because Japan does not have unique personal identifiers, two 32-digit hash values are generated: one from the insurer ID, beneficiary ID, date of birth, and sex, and the other from name, date of birth, and sex. By combining the 2 hash functions, the NDB maximizes record linkage of health insurance claims with health check data from the same person. 2 Unfortunately, linkage using dual hash functions is by no means complete. Mistyping of names, inclusion of a space between family and given names, and a change in insurer or beneficiary ID will result in the generation of completely different hash values, thereby compromising the accuracy of record linkage. Indeed, the accuracy of such record linkage in the NDB has not been fully investigated. The NDB is available for research use, and many research projects using the NDB are underway. 3 However, as a prerequisite of scientifically sound analysis, researchers must first ensure the accuracy of record linkage.
The author evaluated the linkage rate between health check data and health insurance claims in the NDB by comparing the medical and pharmaceutical charges observed for health check recipients in FY2009, ascertained through record linkage in the NDB, with the expected charges for the same population, estimated using publicly available data. If record linkage is complete, the observed and expected charges should match or at least be similar.
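The fragility of hash-based linkage can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the actual NDB hashing algorithm is not described here, so truncated SHA-256 stands in for it, and the field names, record values, and the "either key matches" rule are hypothetical.

```python
import hashlib


def hash32(*fields):
    """Illustrative 32-hex-digit digest (stand-in for the undisclosed NDB algorithm)."""
    joined = "|".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:32]


def make_keys(rec):
    """Key 1: insurer ID, beneficiary ID, birth date, sex. Key 2: name, birth date, sex."""
    key1 = hash32(rec["insurer_id"], rec["beneficiary_id"], rec["birth_date"], rec["sex"])
    key2 = hash32(rec["name"], rec["birth_date"], rec["sex"])
    return key1, key2


def is_linked(a, b):
    """Assumed rule: two records link when either of the two keys matches."""
    a1, a2 = make_keys(a)
    b1, b2 = make_keys(b)
    return a1 == b1 or a2 == b2


# Hypothetical records for one person.
health_check = {"insurer_id": "0001", "beneficiary_id": "12345",
                "birth_date": "1960-04-01", "sex": "M", "name": "YAMADATARO"}
# Same person after changing insurer: key 1 changes, key 2 is unchanged.
claim_new_insurer = dict(health_check, insurer_id="0002", beneficiary_id="67890")
# Same change plus a space inserted in the name: key 2 changes as well.
claim_with_space = dict(claim_new_insurer, name="YAMADA TARO")

print(is_linked(health_check, claim_new_insurer))
print(is_linked(health_check, claim_with_space))
```

Note that date of birth enters both keys in this sketch, so a difference in date format alone would change both hashes at once.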

Data source
Four publicly available datasets were used, all of which are available on the internet. The first 3 were used to estimate expected charges and the last one was used to estimate observed charges.
[1] Report on Health Checks and Guidance Regarding Metabolic Syndrome in FY2009 4
The FY2009 Report on Health Checks and Guidance Regarding Metabolic Syndrome compiled administrative reports from 3453 insurers. It lists the number of beneficiaries "eligible for health checks", which is defined as "beneficiaries as of April 1, 2009" and excludes those who quit in the middle of the fiscal year. For evaluation of insurer performance, the number of beneficiaries eligible for health checks is used as the denominator to calculate the percentage of health check recipients. Because insurers are held responsible only for beneficiaries eligible throughout the fiscal year, those who changed health insurance in the middle of the fiscal year are excluded from the denominator. However, in this study, the population as of October 1, 2009 was used as the denominator because the present study does not seek to evaluate insurer performance. Hence, the percentages of health check recipients reported in this study (males: 42.0%, females: 32.6%) are lower than those in the report (males: 46.5%, females: 36.4%) (Table 1).
[2] Analysis of data on health checks and medical charges in FY2008 5
The Japan Health Insurance Association (JHIA) linked health check data and health insurance claims for 11 705 320 beneficiaries aged 35 to 74 years (the total was 9 618 145 when limited to individuals aged 40 to 74 years) in FY2008 and compared per capita charges between health check recipients and nonrecipients by sex and 5-year age group (Table 2).
[3] Medical Care Benefit Survey, FY2011 6
The Medical Care Benefit Survey (MCBS) is a population survey of all health insurance claims submitted from May 2011 through April 2012 and is conducted by the Japan Ministry of Health, Labour and Welfare (MHLW). The FY2011 rather than the FY2010 MCBS was used because the MCBS included sex-specific data for the first time in FY2011. Because there was no fee schedule revision between FY2010 and FY2011, the estimates of charges will not be biased. Unlike the NDB, which covers only electronically submitted claims, the MCBS includes all claims, including those submitted on paper, and thus provides the best estimate of per capita charges for the entire insured population. Since the MCBS is a survey of health insurance, it does not cover claims under the Livelihood Assistance Act for the indigent population. The MCBS also does not include Seamen's Insurance, because the insurer did not submit the relevant data. In addition, some health insurance societies and mutual aid associations did not submit data and were thus excluded from the numerator and denominator. The survey report included age-specific numbers of beneficiaries as the denominator, but no sex-specific numbers of beneficiaries were available. Therefore, age- and sex-specific numbers of beneficiaries were estimated by applying sex ratios for the population as of October 1, 2011 (Table 3).
[4] Per capita medical and pharmaceutical charges for health check recipients in FY2009 7
A report submitted by the MHLW to the Seventh Meeting of the Committee on Health Checks and Guidance on February 24, 2012 used hash functions to link health check data in FY2009 and health insurance claims data in FY2010 on an individual basis and was the first published evidence of the accuracy of record linkage in the NDB. In FY2009, 21 588 883 beneficiaries (11 942 714 males and 9 646 169 females) underwent health checks. Of them, 2 685 509 beneficiaries (1 172 510 males and 1 512 999 females; 9.8% and 15.7%, respectively) were linked with FY2010 health insurance claims (medical, pharmaceutical, and diagnosis-procedure-combination [DPC], a system of per diem payment for acute hospitals that is part of medical claims). The medical and pharmaceutical charges contained in the linked health insurance claims totaled 716 128 080 857 yen. Because the NDB contains only electronically submitted claims, the computerization rate of claims must be considered to ensure fair comparison with the MCBS, which also contains claims submitted on paper. According to the Social Insurance Payment Fund, the computerization rate in FY2010 was 92.0% for medical claims and 99.9% for pharmaceutical claims, for an overall rate of 94.8% 8 (463 225 000 medical and 281 613 000 pharmaceutical claims were submitted electronically out of 503 627 000 medical and 281 842 000 pharmaceutical claims in FY2010). The observed charges were inflated by multiplying values by the inverse of the computerization rate (Table 4).
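The computerization adjustment described for data source [4] can be reproduced with the claim counts and linked charges quoted in the text. The sketch below is only a worked check of that arithmetic; the variable names are illustrative, and the use of the rounded rate (0.948) follows the figure stated in the text.

```python
# Claim counts for FY2010 as quoted in the text (Social Insurance Payment Fund).
medical_electronic = 463_225_000
medical_total = 503_627_000
pharma_electronic = 281_613_000
pharma_total = 281_842_000

medical_rate = medical_electronic / medical_total
pharma_rate = pharma_electronic / pharma_total
overall_rate = (medical_electronic + pharma_electronic) / (medical_total + pharma_total)

# Charges contained in the linked claims (yen), as quoted in the text.
linked_charges_yen = 716_128_080_857

# Inflate by the inverse of the overall computerization rate, rounded to 0.948.
observed_charges_yen = linked_charges_yen / round(overall_rate, 3)

print(f"medical {medical_rate:.3f}, pharmaceutical {pharma_rate:.3f}, overall {overall_rate:.3f}")
print(f"observed charges: {observed_charges_yen / 1e12:.3f} trillion yen")
```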

Statistical analysis
Accuracy of the record linkage in the NDB was evaluated by comparing (1) the observed medical and pharmaceutical charges for health check recipients in data source [4] with (2) the expected medical and pharmaceutical charges of the same population estimated from data sources [1], [2], and [3]. It is expressed as c(+)/C(+) using the following notation:
N: number of beneficiaries, obtained from data source [3]
N(+): number of health check recipients, obtained from data source [1]
N(−): number of nonrecipients (= N − N(+))
n(+): number of health check recipients whose health insurance claims were linked using hash functions, obtained from data source [4]
C: medical and pharmaceutical charges of all beneficiaries, obtained from data source [3]
C(+): medical and pharmaceutical charges for health check recipients
C(−): medical and pharmaceutical charges for nonrecipients (= C − C(+))
Observed medical and pharmaceutical charges for health check recipients, c(+), were calculated from data source [4], and the results were inflated by the inverse of 0.948 to adjust for computerization of claims.

Expected charges (C(+))
Expected medical and pharmaceutical charges, C(+), were estimated as:
$$C(+) = N(+) \times P(+)$$
N(+) was obtained from data source [1]. P(+) had to be estimated from per capita charges for the entire population, obtained from data source [3]. Because bedridden people and hospitalized patients cannot receive health checks, the per capita charges for health check recipients (P(+)) should be lower than those for nonrecipients (P(−)).
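Let r denote the ratio of per capita charges for nonrecipients to those for recipients, which was obtained from data source [2]:
$$r = \frac{P(-)}{P(+)}$$
Let R denote the proportion of those receiving health checks (= N(+)/N), as indicated in data source [1], and let P (= C/N) denote the per capita charges for the entire population. Then, because P is the weighted average of P(+) and P(−),
$$P = R\,P(+) + (1 - R)\,P(-) = P(+)\,[R + r(1 - R)], \qquad P(+) = \frac{P}{R + r(1 - R)}$$
Using this formula, the per capita charges for health check recipients (P(+)) can be estimated. Then, C(+) is obtained as follows:
$$C(+) = N(+) \times P(+) = \frac{N(+) \times P}{R + r(1 - R)}$$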
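The estimation of expected charges can be sketched in Python under one stated assumption: that the per capita charges for the entire population are the weighted average of P(+) and P(−), with weights R and 1 − R. The function name and the input numbers are hypothetical and chosen only for illustration; in practice the calculation would presumably be repeated for each sex and age group, since the source data are stratified that way.

```python
def expected_charges(n_recipients, n_total, total_charges, r):
    """Estimate P(+), P(-), and C(+) from population totals and the ratio r = P(-)/P(+)."""
    R = n_recipients / n_total           # proportion receiving health checks
    P = total_charges / n_total          # per capita charges, entire population
    p_plus = P / (R + r * (1 - R))       # per capita charges, recipients
    p_minus = r * p_plus                 # per capita charges, nonrecipients
    c_plus = n_recipients * p_plus       # expected charges for recipients, C(+)
    return p_plus, p_minus, c_plus


# Hypothetical stratum: 1 million beneficiaries, 42% recipients,
# 200 billion yen in total charges, nonrecipients costing 1.46 times recipients.
p_plus, p_minus, c_plus = expected_charges(420_000, 1_000_000, 200e9, 1.46)
print(f"P(+) = {p_plus:,.0f} yen, P(-) = {p_minus:,.0f} yen, C(+) = {c_plus:,.0f} yen")
```

As a sanity check, recombining the two per capita figures with weights R and 1 − R should return the population average, and P(+) should fall below it whenever r exceeds 1.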

RESULTS
The results are summarized in Table 5 and the Figure.
The NDB linked only 0.755 trillion yen of the estimated total of 5.065 trillion yen charged for health check recipients in FY2009. Thus, in terms of charges, the NDB was able to link only 14.9% of health insurance claims.
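As a worked check using only the figures quoted in the text, the overall linkage rate c(+)/C(+) follows from the linked charges, the computerization rate, and the expected total:

```latex
\[
c(+) = \frac{716\,128\,080\,857}{0.948} \approx 0.755 \times 10^{12}\ \text{yen},
\qquad
\frac{c(+)}{C(+)} = \frac{0.755 \times 10^{12}}{5.065 \times 10^{12}} \approx 0.149
\]
```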
There was an obvious sex difference: the linkage rate was higher for women than for men (18.2% vs 12.4%, respectively). In addition, there was an age difference: the linkage rate was higher for elderly adults than for younger adults. Adults aged 65 years or older had greater than 25% of their claims linked, while younger adults had less than 10% of their claims linked.

DISCUSSION
The present results were alarming. The linkage rate of 14.9% was far lower than that of the Japan Medical Data Center (JMDC) database (88.5% with 1 hash function and 98.0% with 2 hash functions combined) 9 and might bias the findings of any research linking health check and health insurance claims data. The NDB was created for the "development, implementation and evaluation" of the HCCCP, which emphasizes health care cost containment through prevention of metabolic syndrome. However, the low linkage rate of the NDB makes it incapable of fulfilling that task. The reasons for the low linkage rate and for the sex and age differences are not clear. One possibility is that the formats for names and dates of birth are inconsistent between health insurance claims and health check data. A space must be inserted between family and given names on health insurance claims but not in health check data. In addition, date of birth is recorded using the Japanese calendar for health insurance claims but using the Gregorian calendar for health checks.
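To make the suspected format inconsistencies concrete, the following Python sketch shows how names and dates of birth could be brought to a single canonical form before hashing. It is an illustration only: the input formats (an era letter followed by year, month, and day separated by periods) and the function names are hypothetical, and nothing here describes the actual NDB encryption program.

```python
import re

# Gregorian year = offset + era year (eg, Showa 35 -> 1925 + 35 = 1960).
ERA_OFFSETS = {"M": 1867, "T": 1911, "S": 1925, "H": 1988}


def normalize_name(name):
    """Remove all whitespace, including the full-width space (U+3000)."""
    return re.sub(r"\s+", "", name)


def normalize_birth_date(value):
    """Return 'YYYY-MM-DD' from a hypothetical era format ('S35.04.01') or a Gregorian date."""
    m = re.fullmatch(r"([MTSH])(\d{1,2})\.(\d{1,2})\.(\d{1,2})", value)
    if m:
        era = m.group(1)
        year, month, day = map(int, m.group(2, 3, 4))
        return f"{ERA_OFFSETS[era] + year:04d}-{month:02d}-{day:02d}"
    m = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", value)
    if m:
        year, month, day = map(int, m.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"
    raise ValueError(f"unrecognized date format: {value!r}")


# A claims-style record and a health-check-style record for the same hypothetical person.
print(normalize_name("山田　太郎") == normalize_name("山田太郎"))
print(normalize_birth_date("S35.04.01") == normalize_birth_date("1960-04-01"))
```

If both data streams were normalized in this way before hashing, records that differ only in spacing or calendar would yield identical inputs to the hash function.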
The advantage of this study is that it is based entirely on publicly available datasets, thanks to the recent availability of detailed data. One such development is the availability of per capita charges for health check recipients and nonrecipients from the Japan Health Insurance Association. The fact that male nonrecipients of health checks consume 1.46 times the charges of recipients sheds new light on the conventional wisdom that municipalities with higher health check participation have lower per capita health care charges. Another development was the Medical Care Benefit Survey, which serves as a "mirror site" of the NDB. Interestingly, different sections of the MHLW collect the same health insurance claims data based on different legal requirements. 3 These dual databases gave the author a valuable opportunity to obtain and compare observed and expected charges.
This study did have limitations, however. Although it revealed the low linkage rate of the NDB, the reasons for this low linkage remain unclear. Investigation of the low linkage rate and identification of measures for improvement are thus urgently needed. Hash function encryption is performed by the Prefectural Federations of National Health Insurance and by prefectural branches of the Social Insurance Payment Fund, using an encryption program distributed by the MHLW (not by individual health insurers). The author suspects that the encryption algorithm is flawed, although this would not fully explain the observed sex and age differences. Since hash functions are irreversible, it is not possible to investigate causes within the NDB.
A future field test involving health insurers of sufficient enrollment size may be useful. By comparing the original, personally identifiable data (health insurance claims and health check data) with the encrypted data generated by the encryption program, it would be possible to identify the causes for the low linkage rate. Once these causes are identified, the encryption algorithm should be revised, and the old data, back to April 2009, should be recollected before they are lost, as it is not too late to address the problem.

Suggestions for researchers
The NDB is available for research use, and publications based on NDB data are already appearing. However, researchers and reviewers must carefully consider the linkage rate achieved using hash functions, as it should never be assumed that linkage is complete. Just as the response rate is required when reporting a questionnaire survey, the linkage rate should be reported when using NDB data, particularly when linking the same individual across time or attempting to link health check and health insurance claims data.
This study provides a method for evaluating linkage rate. Researchers who use NDB data should refer to its mirror site, the MCBS. Because the MCBS covers the same health insurance claims as the NDB (in fact, the coverage of the MCBS is greater because it also includes health insurance claims submitted on paper), researchers should be able to compare health care charges on a sex- and age-specific basis.
As a matter of policy, researchers are prohibited from cross-linking the NDB with any other individual-level data. However, this does not preclude comparisons with or references to other publicly available aggregate data. Researchers are reminded that part of the NDB is publicly available. The Social Insurance Claims Survey has collected data directly from the NDB for hospitals and pharmacies since 2011. Pharmacy MEDIAS had collected electronic pharmacy claims since 2004 but was replaced by the NDB in April 2012. 10 Finally, the JHIA independently provides aggregate data on health insurance claims. By comparing those publicly available aggregate data, researchers may be able to evaluate linkage rate and any potential bias related to it.

Table 1. Percentages of individuals receiving health checks in fiscal year 2009
Report on health checks and guidance regarding metabolic syndrome in fiscal year 2009. Source: http://www.mhlw.go.jp/bunya/shakaihosho/iryouseido01/dl/info03_h21_03.pdf.

Table 2. Per capita charges for medical and pharmaceutical claims by recipients and nonrecipients of health checks (annual charges in yen)
Japan Health Insurance Association: Analysis of data on health checks and medical charges in fiscal year 2008. Source: http://www.kyoukaikenpo.or.jp/~/media/Files/honbu/cat740/2506/250611/250611003.xls.

Table 3. Age- and sex-specific per capita medical and pharmaceutical charges for the entire population in fiscal year 2011

Table 4. Observed medical and pharmaceutical charges for health check recipients linked to health insurance claims

Table 5. Observed and expected medical and pharmaceutical charges for health check recipients