Estimation of Socioeconomic Inequalities in Mortality in Japan Using National Census-linked Longitudinal Mortality Data

Background We aimed to develop census-linked longitudinal mortality data for Japan and assess their validity as a new resource for estimating socioeconomic inequalities in health. Methods Using deterministic linkage, we identified, from national censuses for 2000 and 2010 and national death records, persons and deceased persons who had unique personal identifiers (generated using sex, birth year/month, address, and marital status). For the period 2010–2015, 1,537,337 Japanese men and women aged 30–79 years (1.9% in national census) were extracted to represent the sample population. This population was weighted to adjust for confounding factors. We estimated age-standardized mortality rates (ASMRs) by education level and occupational class. The slope index of inequality (SII) and relative index of inequality (RII) by educational level were calculated as inequality measures. Results The reweighted sample population’s mortality rates were somewhat higher than those of the complete registry, especially in younger age-groups and for external causes. All-cause ASMRs (per 100,000 person-years) for individuals aged 40–79 years with high, middle, and low education levels were 1,078 (95% confidence interval [CI], 1,051–1,105), 1,299 (95% CI, 1,279–1,320), and 1,670 (95% CI, 1,634–1,707) for men, and 561 (95% CI, 536–587), 601 (95% CI, 589–613), and 777 (95% CI, 745–808) for women, respectively, during 2010–2015. SII and RII by educational level increased among both sexes between 2000–2005 and 2010–2015, which indicates that mortality inequalities increased. Conclusion The developed census-linked longitudinal mortality data provide new estimates of socioeconomic inequalities in Japan that can be triangulated with estimates obtained with other methods.


INTRODUCTION
Monitoring socioeconomic inequalities in health represents an initial step towards achieving equity in society. 1,2 Socioeconomic inequalities in mortality have been assessed in most high-income countries, including European countries, [3][4][5][6][7] the United States, 8 Canada, 9 Australia, 10 New Zealand, 11 and Korea. 12 These studies, especially those examining education-based inequalities, 13 were generally conducted using national-census-linked longitudinal mortality data that covered entire populations or nationally representative populations. However, socioeconomic inequalities in mortality in Japan remain relatively understudied; this is because there is no national longitudinal mortality database that also features data regarding socioeconomic status. 14 Although studies have examined mortality inequalities using data from the Japanese national register, finding that the inequalities between Japan's occupational classes are smaller than those in European countries, [15][16][17] these studies generally applied cross-sectional approaches, which risk numerator-denominator bias. [15][16][17] Further, no cross-sectional mortality data suitable for determining mortality inequalities by education level are available for the Japanese population because educational background is not surveyed in the national death registry. A recent study used national census and death records to estimate changes in Japan's mortality inequalities by education level; 18 however, estimating mortality rates by socioeconomic status remains limited by the available data: this previous study used only educational attainment as a socioeconomic status indicator. 18 Moreover, the study allowed 1:n matching, which distributes one death count (numerator) to n matched census cases (denominator) depending on the percentage of educational attainment averaged out by a key matching variable. This would systematically underestimate mortality inequalities by educational attainment, even where such inequalities exist. Here, the use of 1:1 matching allows us to overcome this limitation.
Furthermore, there is currently no suitable national database for mortality-related socioeconomic inequalities in Japan. This especially obstructs attempts to address national mortality inequalities in accordance with measures for addressing global health inequalities. Individual linkage between census and death-record data might resolve this issue. Thus, this study aimed to develop census-linked longitudinal mortality data and assess their validity as a new resource. This included estimating mortality rates by socioeconomic status for the Japanese population, which would enable international comparisons. Such research could contribute useful benchmarks and entry points for monitoring and reducing socioeconomic inequalities in health.

METHODS
Data sources
We used data from the Population Census (hereafter, 'the census'), conducted quinquennially by the Ministry of Internal Affairs and Communications (MIC), 19 and the National Vital Statistics (hereafter, 'death records'), collected annually by the Ministry of Health, Labour and Welfare (MHLW). 20 Anonymized microdata were extracted and used with permission from the MIC and MHLW.
We extracted from the censuses conducted on October 1, 2000, and October 1, 2010, individual data for all Japanese nationals living in Japan (denominator: person-years at risk). Regarding death records (numerator), two periods were examined: October 2000-September 2005 (wave 1), and October 2010-September 2015 (wave 2). Foreigners living in Japan were excluded.

Deterministic linkage and personal identifiers
We applied a deterministic-linkage method using 'personal identifiers' (IDs). We generated these IDs because there is no official personal identification code (eg, national security number) for linking national statistics and survey data in Japan. Each ID comprised five variables: sex, birth year, birth month, address (municipality-level local government code), and marital status (single, married, widowed, divorced, or unknown). Day of birth was not surveyed in the census; exact address (eg, postcode, house number) was not available because of privacy protection. The deterministic linkage and all analyses described below were conducted for wave 1 and wave 2. eTable 1 shows the distribution of the population from the 2010 census and of the deceased persons for wave 2 in terms of numbers of people with unique IDs and duplicated IDs.
Figure 1 shows the deterministic-linkage procedures for wave 2. First, the population in 2010 and all deaths in wave 2 were counted, which indicated the exact mortality (hereafter, 'complete registry'). Second, we identified persons from the census and deceased persons from the death records who had unique IDs; 1.9% of the population from the 2010 census and 886,807 deceased persons (from wave 2) were identified as having unique IDs. Third, deterministic linkage was conducted using the individuals with unique IDs. If a person from the census was not matched to a deceased person with a unique ID, we considered him/her to have been alive at the end of the follow-up period. For wave 2, 64,422 deceased men and 28,092 deceased women were matched with persons from the census, meaning 700,877 men and 743,946 women were presumed alive at the end of September 2015. Fourth, we excluded persons who had lived in municipalities for which the local government code was deleted during the first year of the follow-up period; this was to ensure at least 1 year of follow-up. Finally, we developed census-linked longitudinal mortality data that included demographics, socioeconomic status, year and month of death, cause of death, and date of censoring due to local-government code change. Underlying causes of death were classified according to the International Statistical Classification of Diseases, 10th Revision, and grouped into four broad groups: cancer (C00-D48), cardiovascular diseases (I00-I99), external causes (V01-Y98), and all other diseases.
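The linkage steps described above can be illustrated with a minimal sketch. It is not the code used in this study: the field names (`sex`, `birth_year`, `birth_month`, `municipality`, `marital_status`, `cause`), the helper functions, and the toy records are all hypothetical, and only the logic (retain unique IDs in each file, match 1:1, presume unmatched census persons alive) follows the text.

```python
from collections import Counter

ID_FIELDS = ("sex", "birth_year", "birth_month", "municipality", "marital_status")

def make_id(record):
    """Build the linkage key from the five ID variables (hypothetical field names)."""
    return tuple(record[f] for f in ID_FIELDS)

def unique_by_id(records):
    """Keep only records whose ID occurs exactly once within the file."""
    counts = Counter(make_id(r) for r in records)
    return {make_id(r): r for r in records if counts[make_id(r)] == 1}

def link(census, deaths):
    """1:1 deterministic linkage; unmatched census persons are presumed alive."""
    census_unique = unique_by_id(census)
    deaths_unique = unique_by_id(deaths)
    linked = []
    for key, person in census_unique.items():
        death = deaths_unique.get(key)
        linked.append({**person,
                       "died": death is not None,
                       "cause": death["cause"] if death else None})
    return linked

def person(sex, year, month, muni, marital, **extra):
    """Convenience constructor for the toy records below."""
    return {"sex": sex, "birth_year": year, "birth_month": month,
            "municipality": muni, "marital_status": marital, **extra}

# Toy data, entirely made up. The first two census persons share an ID,
# so neither enters the sample population.
census = [
    person("M", 1950, 3, "13101", "married", education="high"),
    person("M", 1950, 3, "13101", "married", education="low"),
    person("F", 1945, 7, "01202", "widowed", education="low"),
    person("M", 1960, 1, "27100", "single", education="middle"),
]
deaths = [
    person("F", 1945, 7, "01202", "widowed", cause="cancer"),
    person("M", 1950, 3, "13101", "married", cause="cardiovascular"),
]

linked = link(census, deaths)
```

In this toy example the second death record cannot be attributed to either of the two census persons who share its ID, which is why such records are dropped rather than split across candidates as in 1:n matching.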

Weighting
We sampled persons who had unique IDs from all Japanese people aged 30-79 years. This method assumes that, if all variables are evenly distributed across individuals, random sampling will occur. However, birth year (age), municipalities, and marital status are unevenly distributed in the Japanese population. 19 Therefore, we weighted the sampled population to adjust for confounding between the distribution of demographics and mortality. 21 In our procedure, persons who lived in municipalities with large populations were less likely to be sampled because ID duplication was more likely. Similarly, married persons were less likely to be sampled because most (approximately 70%) Japanese people aged 30-79 years were married in 2010. Therefore, when calculating mortality, married persons and people living in large municipalities should be allocated larger weights. We calculated the weighting score as the number of persons in the population with a given weighting key divided by the number of persons in the sample with the same key. The weighting key (maximum: 110,920 combinations) was based on prefecture, sex, 5-year age category, marital status, education level, and occupational class (for people aged 30-64 years only). For example, suppose that 10 single men aged 30-34 years who were manual workers, lived in Tokyo, and had low education levels were observed in the census, and five men with the same demographics were observed in the sample population; a weighting score of '2' (= 10/5) would be allocated to each sampled person. The range of the weighting score was set to 1-10,000 to avoid overweighting individuals; any score exceeding 10,000 was set to '10,000'. eTable 2 shows the weighting-score calculations. We generated and allocated 71,991 weighting scores for the sample population from wave 2. Lastly, the weighting scores were recalibrated to ensure that the average weight across the whole sample population was equal to one, which resulted in the standard errors being approximated to those of the unweighted sample when calculating mortality.
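The weighting-score calculation can be sketched as follows, using the worked example from the text (10 census persons and five sampled persons sharing a key, giving a score of 2). The function name, the key labels, and the second key are hypothetical; the sketch assumes the recalibration is a simple division by the mean weight over sampled persons, which is one straightforward way to obtain an average weight of one.

```python
from collections import Counter

def weighting_scores(census_keys, sample_keys, lower=1.0, upper=10_000.0):
    """Return (raw, recalibrated) weighting scores per weighting key.

    raw score = persons in the census with the key / persons in the sample
    with the key, restricted to the range [lower, upper].
    """
    census_n = Counter(census_keys)
    sample_n = Counter(sample_keys)
    raw = {}
    for key, n_sample in sample_n.items():
        ratio = census_n[key] / n_sample
        raw[key] = min(max(ratio, lower), upper)
    # Recalibrate so the mean weight over all sampled persons equals one
    # (assumed here to be a division by the person-level mean).
    mean_weight = sum(raw[k] for k in sample_keys) / len(sample_keys)
    recalibrated = {k: w / mean_weight for k, w in raw.items()}
    return raw, recalibrated

# Key "A" reproduces the example in the text: 10 in the census, 5 in the sample.
# Key "B" is an invented second stratum in which everyone had a unique ID.
census_keys = ["A"] * 10 + ["B"] * 6
sample_keys = ["A"] * 5 + ["B"] * 6

raw, recalibrated = weighting_scores(census_keys, sample_keys)
mean_recalibrated = sum(recalibrated[k] for k in sample_keys) / len(sample_keys)
```

Dividing by the mean leaves the relative sizes of the weights unchanged, so the weighted rates are unaffected while the weighted sample size stays equal to the number of sampled persons.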
Occupational class was classified into five categories (based on the Erikson-Goldthorpe-Portocarero scheme 24 ): upper non-manual worker, lower non-manual worker, manual worker, farmer, and self-employed. Those labelled 'unemployed' in the census were coded as 'unemployed'. Detailed classifications are presented in eTable 3 and eTable 4.
Age-standardized mortality rates (ASMRs) among the entire population were calculated (ie, complete registry), followed by ASMRs among the unweighted sample population and ASMRs among the reweighted sample population. The 2013 European Standard Population was used as a reference for direct standardization, because the distribution is similar to that observed in the 2000 Japanese Census. 22,25 Persons who lived in municipalities for which the local-government code was deleted between October 2011 and September 2015 were censored at the end of the prior September. To assess validity, we compared the mortality rates of the reweighted sample population with those for the complete registry.
After considering the accuracy of the unique ID and checking validity, we excluded men and women aged 30-39 years from the estimations of mortality by socioeconomic status because of overestimations among younger age-groups. Finally, we estimated ASMRs by educational level (40-79 years) and occupational class (40-64 years) for each period using the reweighted sample population.
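Direct standardization, as used for the ASMRs, can be sketched in a few lines. The function name is hypothetical, and the deaths, person-years, and standard weights below are invented placeholders for two age bands; they are not the 2013 European Standard Population weights, which would be substituted in practice. For the reweighted sample, the deaths and person-years in each age band would be weighted sums.

```python
def asmr(deaths, person_years, std_weights, per=100_000):
    """Directly standardized mortality rate per `per` person-years.

    Each argument is a sequence with one entry per age band, in the same order.
    """
    if not (len(deaths) == len(person_years) == len(std_weights)):
        raise ValueError("inputs must have one entry per age band")
    total_weight = sum(std_weights)
    standardized = sum(
        w * d / py for d, py, w in zip(deaths, person_years, std_weights)
    ) / total_weight
    return standardized * per

# Invented placeholder values for two age bands.
toy_deaths = [10, 40]
toy_person_years = [10_000, 20_000]
toy_std_weights = [7_000, 3_000]

toy_asmr = asmr(toy_deaths, toy_person_years, toy_std_weights)
```

Because every group is standardized to the same reference distribution, differences between ASMRs are not driven by differing age structures across educational or occupational groups.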

Inequality measures
Mortality rate difference (RD) and mortality rate ratio (RR) of low versus high educational level and manual versus upper non-manual workers were calculated to measure inequalities. We used a bootstrap procedure with 1,000 replications to calculate 95% confidence intervals (CIs). The slope index of inequality (SII) and its relative counterpart, the relative index of inequality (RII), were calculated as inequality measures for educational level. 26 Both SII and RII were adjusted by 5-year age groups. The average intergroup differences (AID) were calculated as inequality measures for occupational class because occupational class cannot be defined as hierarchically ordered. 15,27
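To make the measures concrete, the sketch below computes RD and RR from the all-cause ASMRs for men reported in the abstract (high, middle, and low education: 1,078, 1,299, and 1,670 per 100,000 person-years) and shows one common regression-based formulation of SII and RII. The formulation is an assumption: rates are regressed on the midpoint of each group's cumulative population share, weighted by population share, with groups ordered from highest to lowest education. The exact specification used in this study (including adjustment for 5-year age groups) is not reproduced, and the population counts are hypothetical.

```python
def rate_difference(rate_low, rate_high):
    """Absolute inequality: low-education rate minus high-education rate."""
    return rate_low - rate_high

def rate_ratio(rate_low, rate_high):
    """Relative inequality: low-education rate over high-education rate."""
    return rate_low / rate_high

def sii_rii(rates, pops):
    """Weighted least-squares SII and RII (one common formulation, assumed here).

    `rates` and `pops` are ordered from the highest to the lowest
    socioeconomic group, so a positive SII means higher mortality
    at the bottom of the hierarchy.
    """
    total = sum(pops)
    shares = [p / total for p in pops]
    ranks, cumulative = [], 0.0
    for s in shares:
        ranks.append(cumulative + s / 2)  # midpoint of the group's rank range
        cumulative += s
    x_bar = sum(s * x for s, x in zip(shares, ranks))
    y_bar = sum(s * y for s, y in zip(shares, rates))
    s_xy = sum(s * (x - x_bar) * (y - y_bar)
               for s, x, y in zip(shares, ranks, rates))
    s_xx = sum(s * (x - x_bar) ** 2 for s, x in zip(shares, ranks))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sii = slope                            # predicted rate at rank 1 minus rank 0
    rii = (intercept + slope) / intercept  # predicted rate at rank 1 over rank 0
    return sii, rii

# ASMRs for men aged 40-79 years in 2010-2015, as reported in the abstract.
asmr_men = [1078, 1299, 1670]             # high, middle, low education
rd = rate_difference(asmr_men[2], asmr_men[0])
rr = rate_ratio(asmr_men[2], asmr_men[0])

# Hypothetical population counts per education group (not taken from the study).
hypothetical_pops = [30, 45, 25]
sii, rii = sii_rii(asmr_men, hypothetical_pops)
```

Unlike RD and RR, which compare only the two extreme groups, SII and RII use all groups and account for their relative sizes, which is why they are preferred for comparing periods in which the educational distribution has shifted.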

RESULTS
Sample population size
The sample population was 2,553,447 (3.3% of the total population: 1,240,619 men, 1,312,828 women) in wave 1 and 1,537,337 (1.9% of the total population: 765,299 men, 772,038 women) in wave 2. The results for wave 2 were generally similar to those for wave 1. From this point forward, we mainly present the results for wave 2. Results for wave 1 are shown in eFigure 1, eFigure 2, eTable 5, eTable 6, eTable 7, and eTable 8.

All-cause mortality for men
Table 1 shows the distribution of populations and ASMRs for men. The reweighted sample population and complete registry showed similar distributions of demographic characteristics (eg, married men, complete registry: 76.4%, reweighted sample population: 74.1%; men with high education level, complete registry: 29.8%, reweighted sample population: 30.0%). Differences in all-cause ASMRs ranged from −0.7% (75-79 years) to 82.1% (35-39 years) across the 5-year age groups. For single and married men, the ASMRs of the reweighted sample population were 9.6% lower and 9.6% higher than those of the complete registry, respectively. For men aged 40-79 years, all-cause ASMR (per 100,000 person-years) was 1,289 (95% CI, 1,287-1,290) for the complete registry and 1,373 (95% CI, 1,359-1,386) for the reweighted sample population. Among men aged 40-79 years, the reweighted sample population's ASMRs were 6.5% higher (84 per 100,000 person-years higher) than those of the complete registry.

All-cause mortality for women

Cause-specific mortality
Table 3 shows a comparison between the complete registry and sample population regarding broad cause-specific mortality among men and women aged 40-79 years. Differences in ASMRs between the complete registry and reweighted sample population varied by broad cause of death and sex. For men, the ASMRs (per 100,000 person-years) of the reweighted sample population were 45, 12, 11, and 14 higher than those of the complete registry for cancer, cardiovascular disease, external causes, and others, respectively. For women, the ASMRs (per 100,000 person-years) of the reweighted sample population were 2, 15, 6, and 11 higher than those of the complete registry for cancer, cardiovascular disease, external causes, and others, respectively. In percentage terms, for both men and women mortality from external causes showed the largest differences when compared with the complete registry.
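The validity comparison between the reweighted sample and the complete registry reduces to an absolute and a percentage difference. The short sketch below reproduces the figures reported for men aged 40-79 years (1,373 versus 1,289 per 100,000 person-years); the function name is hypothetical.

```python
def compare_with_registry(sample_rate, registry_rate):
    """Return (absolute difference, percentage difference) relative to the registry."""
    absolute = sample_rate - registry_rate
    percent = 100 * absolute / registry_rate
    return absolute, percent

# All-cause ASMRs for men aged 40-79 years in wave 2, as reported in the text.
abs_diff, pct_diff = compare_with_registry(1373, 1289)
```

A positive value indicates that the reweighted sample overestimates mortality relative to the complete registry.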

DISCUSSION
Main findings
This study is a novel attempt to estimate exact national mortality rates by both educational level and occupational class using longitudinal national census data linked with death records, with the estimates evaluated by comparison with the complete national mortality registry in Japan. Our findings showed that clear mortality differences by socioeconomic status persisted in Japan. In addition, inequality measures indicated that mortality inequalities increased between 2000-2005 and 2010-2015 for men. For women, changes in the inequality indices moved in opposite directions for educational level (inequalities increased) and occupational class (inequalities reversed). Although estimates calculated through deterministic-linkage methods should be interpreted with caution, the linked mortality data presented in this study may, nevertheless, represent new estimates for assessing mortality inequalities by socioeconomic status in Japan.

Interpretations
Our estimates should be compared to a previous study that assessed the changes in educational inequalities in mortality in Japan between 2000 and 2010. 18 The results showed that men and women aged 40-75 years with primary and junior high school graduation had about 15-25% and 10-20% higher all-cause mortality, respectively, than counterparts with junior college and university graduation in 2000. 18 Their conclusions that relative mortality inequalities persisted between 2000 and 2010 were also comparable with our results, though their observation periods of death records were 6 months and the method of deterministic linkage (1:n matching) was somewhat different from ours (1:1 matching), in addition to the category of the educational attainment information. 18 Our estimates also confirmed the presence of similar inequality patterns, albeit with smaller differences in magnitude, in Japan when compared to estimates reported for other high-income countries. [3][4][5][6][7][8][9][10][11][12] Our longitudinal mortality database may facilitate between-country comparative research of education-based mortality inequalities, because the education-classification method used in our database affords easy comparisons with other high-income countries. 3,4,8 However, the generalizability of the Japanese-census-linked mortality data should be carefully considered. Our mortality database may underrepresent individuals living in large cities, as discussed below, whereas the longitudinal mortality data from other high-income countries generally cover the entire population. [3][4][5][6][7][8][9][10][11][12] The national census' missing data regarding educational attainment (wave 2: 12.1% for men and 11.4% for women aged 40-79 years) is expected for any census-linked longitudinal mortality data developed in Japan. These missing data may distort the validity of inequality estimates. For example, all-cause ASMRs (per 100,000 person-years) for individuals with low and unknown educational levels were 1,670 and 1,682 for men, and 777 and 699 for women, respectively. Even if people with a low education level are more likely not to report their educational attainment, the large amount of missing data is unlikely to have caused underestimation of mortality among people with a low education level in our database, because the estimated mortality of those with unknown educational level was similar to (men) or lower than (women) that of those with a low education level.
For exact estimates of mortality by occupational class, further analysis in which correction factors are applied to each worker is required. 28 This is because unemployed people's last occupation is unknown, and workers in lower occupational classes have a higher likelihood of being unemployed. 15,28 For each study period, male upper non-manual workers had higher cancer mortality rates than male lower non-manual workers; this pattern is similar to that shown in a previous study. 15 However, we also found that male upper non-manual workers had lower mortality rates from cardiovascular disease and external causes than male lower non-manual workers, which differs from the previous study. 15 In addition, the inequality indices by occupational class (AIDs) changed in opposite directions for our results (inequalities increased) and a previous study (inequalities decreased) between 2000-2005 and 2010-2015. 15 This discrepancy may be due to under-sampling of urban-based workers, supposing that managers and professional workers in urban regions experience heavy burdens in severe work environments. However, there is no clear evidence to explain the variations in occupational mortality across regions. Furthermore, in contrast to the well-documented male mortality by occupation, [15][16][17] few studies have focused on female workers in Japan. Our findings suggest new estimates for female workers. We identified higher mortality among female upper non-manual workers than among female manual workers for each broad cause of death. This finding is comparable to unique male mortality inequalities by occupational class in Japan, which was confirmed using a cross-sectional design. 15 Further analysis is necessary to discuss the applicability of making comparisons using mortality calculated from cross-sectional mortality data (existing national statistics) and linked longitudinal mortality data.
Estimation of socioeconomic inequalities by age group is another challenge for better understanding health inequalities. eFigure 3 shows that mortality gradients by educational level were substantial across all age groups. This figure implies relative inequalities in mortality were more prominent among the younger generation for both sexes. Because some estimates were identified as irregular (eg, women aged 65-69 in wave 2) in addition to overestimations of mortality in the younger generations, the trend is still under discussion; however, our mortality database suggests that mortality-related socioeconomic inequalities did not increase with age in Japan. This finding may contribute to the understanding of interactions between health and age socioeconomic stratification.

Limitations
There are four major limitations to using census-linked longitudinal mortality data. The first concerns the generated IDs. If all individuals retained their residence and marital status during the follow-up period, unique IDs would afford complete matches with deceased persons (although a risk of misreporting would remain). However, according to the October 2015 census, 10.2% of residents of Japan had moved to another municipality since October 2010. 19 This rate was highest among people aged 30-34 years (31.5%) and lowest among those aged 70-74 (2.9%; eTable 9). 19 According to 2010 National Vital Statistics, 1,400,428 people (700,214 couples) married and 502,756 people (251,378 couples) divorced, representing approximately 1.5% of the Japanese population. 20 Although it is difficult to determine the exact number of people widowed per year, this suggests that at least 1.5-2.0% of the Japanese population changed marital status in 2010, indicating that approximately 10% of the population changed their marital status between 2010 and 2015. This mismatch may have caused an overestimation of mortality in our database. Changing ID causes both overestimation (because people who did not have a unique ID at the census developed a unique ID when they changed marital status or municipality during follow-up) and underestimation (because people who had a unique ID at the census lost this when they changed marital status or municipality during follow-up); however, overestimation is likely to have been more prominent due to the much larger number of individuals with duplicated IDs at the census. As shown in Table 1 and Table 2, we confirmed overestimations of mortality among young males (eg, overascertainment of deaths for men aged 35-39 years; note that the tendency was reversed among women aged 30-39 years in wave 2); therefore, to avoid inaccuracies in the estimations, we excluded all persons aged 30-39 years from the estimated mortality by socioeconomic status. However, systematic overestimation may still cause underestimation of mortality inequalities, especially in relative terms.
Second, we covered all prefectures in Japan, but individuals living in highly populous municipalities (ie, municipalities in Tokyo, Kanagawa, and Osaka) were under-represented, even though we applied a large weight to those sample populations. For example, no men aged 50-54 living in Hyogo Prefecture (a large prefecture) who had high education levels or were lower non-manual workers (n = 25,235 in the 2010 census) were included in the sample population. Despite aiming for a nationally representative sample, these missing data may distort the sample population and the generalizability of the mortality estimates. Thus, this mortality database may under-represent individuals living in the capital region and prefectures with large populations. We confirmed that the results (the mortality of complete registry and sample weighted mortality) were correlated; however, variations were observed by prefecture (eTable 10 and eTable 11). It is difficult to determine whether we underestimated or overestimated socioeconomic inequalities in mortality due to this bias because there is no evidence about differences in health inequalities between urban cities and rural areas in Japan.
Third, across all age categories highly educated persons had a high probability of changing their address (eTable 12) 19 ; this would cause loss of unique IDs and under-ascertainment of deaths during the follow-up period. Therefore, mortality of highly educated men and women may be underestimated, resulting in overestimation of mortality inequalities. Thus, mortality inequalities should be interpreted with caution.
Fourth, the follow-up period also needs to be discussed. A shorter follow-up period might be expected to bring a more complete linkage and possibly less bias, given the large proportion of the population who change IDs over time. Initially, we tried shorter follow-up periods (1 year or 3 years after the census) before we performed this study with a follow-up period of 5 years. However, the results for the shorter follow-up periods (ie, 1 or 3 years) showed that weighted mortality rates were overestimated to a greater degree relative to the complete registry. As discussed above, changing ID causes both overestimation and underestimation, and we concluded that overestimation was more likely to be prominent for shorter follow-up periods in the current data.

Conclusions
As a result of systematic over-ascertainment of deaths for certain causes of death and demographic factors, our deterministic-linkage-based estimates between the Japanese population census and mortality records should be interpreted with caution. In particular, mortality inequalities may be biased by mechanisms causing both overestimation and underestimation. However, the developed census-linked longitudinal mortality data nevertheless produce new estimates for assessing mortality inequalities by socioeconomic status for the Japanese population. In addition, our estimates can be triangulated with estimates obtained with other methods. Further study is necessary to develop better national longitudinal mortality databases and provide benchmarks for monitoring and reducing socioeconomic inequalities in health.