A Method for the Estimation of Completeness of Cancer Registration: Application to the Fukuoka Cancer Registry

A method is proposed for estimating the number of incident cases of cancer and the completeness of cancer registration using data from a population-based cancer registry. It is based on two assumptions: 1) submission of a supplementary report does not depend on the duration of the ill state; and 2) registration reports and death certificates are independent sources of information. The proposed method has two advantages: 1) the number of incident cases can be estimated even when completeness of registration varies over a period of time; and 2) over-estimation of incidence by a registry can be evaluated. The use of this method was illustrated with data from the Fukuoka Cancer Registry, and completeness of registration was estimated as 69.4% for males and 66.2% for females. Furthermore, the validity of the two assumptions was evaluated, and the results suggested that for most sites the assumptions were valid and the estimated completeness was not biased. For some sites, however, such as the liver, lung, and pancreas, the second assumption would be less valid and the completeness underestimated.

The capture-recapture method applied by Robles et al.11) is excellent in its mathematics and may most usefully be applied to registry data which consist of three or more main sources of information.
However, because Robles' method regards "death certificate only" (DCO) cases as incident cases for the year in which they died, it cannot assess data from a registry in which completeness of registration varies considerably over a period of time. Moreover, the method cannot assess over-estimation of incidence by a registry.
In the present study, a simple method that can be applied even under the conditions stated above is proposed to estimate the number of incident cancer cases and the completeness of registration, as stated in the preliminary report12).
The use of this method for assessing registry data is illustrated by its application to the Fukuoka Cancer Registry. Furthermore, the validity of the assumptions used in the estimation is evaluated using available data, and possible bias in the estimates is discussed.

MATERIALS AND METHODS
The Fukuoka Cancer Registry
The Fukuoka Cancer Registry, established in 1984, is a population-based cancer registry covering Fukuoka Prefecture, which has a population of about 5 million. Like other registries in Japan, this registry relies on two primary sources of information: 1) registration reports, which are submitted voluntarily to the registry by the clinician or by the registrant in the hospital; and 2) death certificates on which there is any statement regarding cancer. In addition, the registry also collects a secondary source of information: supplementary reports on diagnosis of cancer, which are submitted on request by the registry for deceased cases without a registration report. Identification of cases is achieved through computer-assisted search of these data. When more than one independent primary cancer is found in the same individual, each is counted separately for incidence computation.
Taking into account the delays in submission of reports and time for registration work, incidence computed for a certain year can only be reported by the registry after 3 or 4 years. The reported number is equal to the number of cases with a registration report or a supplementary report plus the number of cancer cases identified only through a death certificate (DCO cases).

Estimation of completeness of registration
The target to be estimated is all cancer cases diagnosed in the year 1984 among Fukuoka residents. The following files and information were obtained from the registry for the estimation: 1) the file of registration reports and that of supplementary reports on cases for which the date of diagnosis was during 1984; 2) the file of death certificates with any mention of cancer for 1984 and 1985; and 3) the response rate to the request by the registry for submission of supplementary reports for deceased cases without a registration report (55.4% for all sites).
As shown in the Figure, all cases diagnosed in 1984, including those not registered, were classified according to whether they were present or absent in each of the data sources. Among them, the numbers of cases in A and B were ascertained from registration reports stating that the year of diagnosis was 1984, and the number in C1 from supplementary reports. The numbers of cases in C2 and D, however, could not be obtained directly from existing data, so these numbers must be estimated to obtain the total number of incident cases. Based on the second assumption, the number of cases in D was estimated as D = B × (C1 + C2) / A, and hence the total number of cancer cases was estimated.
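The estimation step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the registry's actual procedure. The function name, argument names, and counts are hypothetical. The reading of A (registration report and death certificate) and B (registration report, no death certificate) is inferred from the formula for D, since the Figure is not reproduced here. The text also gives no formula for C2, so the sketch assumes that, under the first assumption, C1 is a fraction of C1 + C2 equal to the response rate.

```python
def estimate_incident_cases(a, b, c1, response_rate):
    """Estimate unobserved cells and the total number of incident cases.

    a: cases with a registration report and a death certificate (assumed reading of A)
    b: cases with a registration report but no death certificate (assumed reading of B)
    c1: deceased cases without a registration report but with a supplementary report
    response_rate: fraction of requests for supplementary reports that were answered
    """
    if a <= 0:
        raise ValueError("A must be positive to estimate D")
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")

    # ASSUMPTION (not stated in the text): responders and non-responders are
    # alike, so C1 is the fraction `response_rate` of all of C1 + C2.
    c_total = c1 / response_rate
    c2 = c_total - c1

    # Second assumption (independence of the two sources): D = B * (C1 + C2) / A
    d = b * c_total / a

    total = a + b + c_total + d
    return {"C2": c2, "D": d, "total": total}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
result = estimate_incident_cases(a=400, b=600, c1=100, response_rate=0.5)
print(result)
```

With these made-up counts, half of the requests were answered, so C1 + C2 is twice C1, and D follows from the ratio B/A applied to that sum.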
Completeness of registration was calculated as the ratio of the reported number of cases to the number estimated by this method. These values were calculated by sex and 21 anatomical sites. Estimated number of cases for total sites is the sum of those for each site.
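The completeness calculation can be illustrated with a short Python sketch. The site names and counts below are hypothetical, as are the function names. The sketch assumes that the reported total for all sites is the sum of the per-site reported numbers, in the same way that the estimated total is the sum of the per-site estimates.

```python
def completeness_percent(reported, estimated):
    """Percent completeness: reported number of cases over estimated number."""
    if estimated <= 0:
        raise ValueError("estimated number of cases must be positive")
    return 100.0 * reported / estimated


def summarize_completeness(sites):
    """sites maps a site name to a (reported, estimated) pair of case counts."""
    per_site = {
        name: completeness_percent(reported, estimated)
        for name, (reported, estimated) in sites.items()
    }
    # The estimate for all sites is the sum of the per-site estimates.
    total_reported = sum(reported for reported, _ in sites.values())
    total_estimated = sum(estimated for _, estimated in sites.values())
    overall = completeness_percent(total_reported, total_estimated)
    return per_site, overall


# Hypothetical counts; site_2 shows that the ratio can exceed 100%
# when the reported number is larger than the estimated number.
example = {"site_1": (300, 500.0), "site_2": (210, 200.0)}
per_site, overall = summarize_completeness(example)
print(per_site, round(overall, 1))
```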
To assess the validity of the assumptions used in the analysis, the distribution of the length of time between onset of cancer and death was compared among three groups defined by their registration status: cases with a registration report (X), cases with a supplementary report (Y), and cases without any reports (Z). Information on the length of time was obtained from death certificates of 1985 with any mention of cancer; cases without such information were excluded from the analysis. For sites with more than one hundred deaths in that year, the Wilcoxon test was performed to test the difference in the distribution among the three groups. If there is no significant difference between group Y and group Z, we can conclude that the first assumption is valid and that the estimated number for C2 in the Figure is not biased. Likewise, if there is no significant difference between group X and the other groups, we can conclude that the second assumption may be valid and that the estimates obtained are not biased.
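The pairwise comparisons described above can be sketched with a two-sample Wilcoxon rank-sum test. The text does not specify which variant of the Wilcoxon test was used, so the version below (midranks for ties, normal approximation with tie correction, no continuity correction, two-sided p-value) is an assumption, and the durations are hypothetical values in months. Statistical libraries provide equivalent tests; this version uses only the standard library.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test; returns (W, z, two-sided p) by normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = midranks(pooled)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2

    # Tie correction: subtract sum(t^3 - t) / (n (n - 1)) over tied groups.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("all observations are tied")

    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w, z, p


# Hypothetical months from onset to death for the three groups.
groups = {
    "X": [3, 8, 14, 20, 26, 31],
    "Y": [2, 5, 9, 12, 18, 24],
    "Z": [1, 4, 9, 11, 17, 25],
}
for first, second in [("Y", "Z"), ("X", "Y"), ("X", "Z")]:
    w, z, p = rank_sum_test(groups[first], groups[second])
    print(first, "vs", second, "W =", w, "z =", round(z, 3), "p =", round(p, 3))
```

The Y versus Z comparison bears on the first assumption, and the comparisons involving X bear on the second. With grouped durations such as those taken from death certificates, ties are common, which is why midranks and the tie correction are included.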

RESULTS
The results are shown in Table 1, which includes the reported and estimated numbers of incident cases (Ie), percent completeness of registration, and the ratio of the estimated number of cases to the reported number of cancer deaths in 1984 (D) by site and sex. The estimated number of cases for all sites is 16,676 (9,101 for males and 7,575 for females). The discrepancy between the estimated and the reported number is comparatively greater in stomach, uterus, and breast cancers. The percent completeness is 69.4% for males and 66.2% for females, and the ratio of Ie to D (Ie/D) is 1.91 for males and 2.28 for females. The completeness of registration varies considerably by site. Higher completeness is found in such sites as the hematopoietic tissue, liver, pancreas, and lung, while it is lower in sites such as the bladder, rectum, and prostate. It should be noted that the percent completeness is over 100% for the hematopoietic tissue in both sexes and for the pancreas in females.
Figure. Classification of hypothetical cases diagnosed in a defined period by three data sources. (Label in figure: death certificates over a two-year period.)
Table 2 shows three types of distribution pattern for the length of time between onset of cancer and death by group. Group X consists of cases with a registration report, group Y of cases with a supplementary report, and group Z of cases without any reports (DCO). For the stomach, there were no significant differences among the three groups. The same results were obtained for the lymphatic tissue, hematopoietic tissue, uterus, breast, esophagus, and bladder.
On the other hand, for the liver, lung, and pancreas, significant differences were found between group X and the other groups but not between group Y and group Z. Significant differences between group Y and group Z were seen for the colon and rectum. A slightly different pattern was noticed for the gall bladder and bile duct: the distribution of the length of time in group X was significantly different from that in group Z, but not from that in group Y.

DISCUSSION
Note to Table 1: a) Ratio of the estimated number to the reported number of cancer deaths during the same year.
Notes to Table 2: The values in the table represent the percentage of cases, with the cumulative percentage in parentheses. a) Information was obtained from death certificates. b) Group of cases: X, with a registration report; Y, with a supplementary report; Z, without any reports (DCO cases). c) Wilcoxon rank test.
There are two indicators of completeness of registration commonly used in cancer registration: 1) the proportion of cases registered only through a death certificate with a statement on cancer, or "death certificate only" (DCO); and 2) the ratio of the number of deaths attributed to cancer to the number of registered cases, or "the mortality/incidence ratio" (M/I)13). But these indicators do not show how many cases of cancer were not registered among residents in the area covered by a registry. In addition, DCO changes to a large extent with the submission of a supplementary report for each deceased case without a registration report, and M/I varies with changes in the fatality rate. Therefore, these indicators are less comparable among registries, and also among different points in time within the same registry.
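The two conventional indicators can be written out in a few lines of Python. The function names and all counts are hypothetical and serve only to illustrate how DCO moves when supplementary reports arrive, even though the number of registered cases does not change.

```python
def dco_percent(dco_cases, registered_cases):
    """Percentage of registered cases known only through a death certificate."""
    return 100.0 * dco_cases / registered_cases


def mortality_incidence_ratio(cancer_deaths, registered_cases):
    """Ratio of deaths attributed to cancer to the number of registered cases."""
    return cancer_deaths / registered_cases


# Hypothetical registry year: 1000 registered cases, 500 cancer deaths,
# and 300 cases first known only through a death certificate.
registered = 1000
deaths = 500
dco_before = 300

# Suppose supplementary reports are later returned for 150 of those cases.
dco_after = dco_before - 150

print(dco_percent(dco_before, registered))   # 30.0
print(dco_percent(dco_after, registered))    # 15.0
print(mortality_incidence_ratio(deaths, registered))  # 0.5
```

In this example DCO halves purely because of follow-up effort, and neither indicator says how many cases were never registered at all.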
Robles et al. applied the capture-recapture method to estimate the number of cases and completeness of registration11). However, that method cannot be validly applied when completeness varies considerably over a period of time, because it implicitly assumes that the number of DCO cases is constant over time. DCO cases reflect completeness of registration both for the year in which they died and for the previous years13). Our method draws a distinction between cases with a clinical report and those without (DCO cases) and can therefore evaluate registry data with considerable variation in completeness. In addition, the capture-recapture application cannot evaluate over-estimation of incidence by a registry, which could occur when completeness of registration increases rapidly over a period of time. The completeness estimated by Robles' method is therefore always equal to or less than 100%. In contrast, as shown in Table 1, our method can evaluate such over-estimation, e.g., for the hematopoietic tissue, where the estimated completeness is above 100%. Over-estimation due to over-diagnosis or duplicate registrations13), however, could not be evaluated by our method.
The ratio of the number of cases to that of deceased cases for all sites was estimated to be about 2 by our method. This value is almost the same as that found in the Connecticut Cancer Registry (I/D: 1.89 for males, 2.13 for females)13), where registration was nearly complete (DCO was about 1%). In Japan, although there is no such complete registry, Kato et al. estimated the I/D for the years from 1979 to 1984 to be 1.86 by the regression method10). Furthermore, the result that Ie/D for all sites was higher in females than in males is reasonable, because cancers specific to females, such as cancers of the breast and uterus, have a comparatively good prognosis on the whole.
The estimates of completeness were inversely related to the ratios of the estimated number of cases to deaths (Ie/D); that is, the estimates increased as Ie/D decreased. The Spearman rank correlation coefficient between these estimates and Ie/D for the 21 site-groups is -0.81 for males and -0.89 for females. Similar findings were reported previously4,11) and are also predicted from the registration system of the Fukuoka Cancer Registry. Even if submission of registration reports were equally incomplete for every site-group, site-groups with high fatality would be more completely registered through death certificates than those with low fatality4).
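The rank correlation reported above can be reproduced in form, though not in value, with a short Python sketch. The per-site figures in Table 1 are not reproduced here, so the completeness and Ie/D values below are hypothetical. The coefficient is computed as the Pearson correlation of midranks, which is the usual definition of Spearman's coefficient when ties may occur.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical site-groups: completeness (%) and the corresponding Ie/D.
completeness = [105.0, 92.0, 80.0, 61.0, 48.0]
ie_over_d = [1.1, 1.4, 1.3, 2.6, 4.0]
print(round(spearman_rho(completeness, ie_over_d), 2))
```

A strongly negative value, as in the paper's results, indicates that site-groups with few cases per death (high fatality) tend to have higher estimated completeness.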
The validity of the two assumptions used is crucial for estimating incidence and completeness of registration. Examination of the length of time between onset of cancer and death revealed that it was almost identically distributed between cases with a supplementary report and those without (DCO cases) for most sites. This result supports the first assumption, that submission of a supplementary report was irrespective of the duration of the ill state. For the colon, however, the length of time among cases with a supplementary report was significantly longer than among those without, and thus the number of C2 (in the Figure) would be overestimated. The opposite result was obtained for the rectum (data not shown in the Table), and thus the number would be underestimated.
The second assumption, that registration reports and death certificates were independent sources of information, can only be evaluated by the independent case ascertainment method. Because this method was quite difficult to conduct, we instead compared the length of time from onset of cancer to death between group X and the other groups. The length of time among cases with a registration report was significantly longer than that among those without for the liver, lung, and pancreas, while it was almost equally distributed for other sites. This result indicates that, for the former sites, cases with a registration report might survive longer than those without, and thus there might exist an inverse relation between registration reports and death certificates. Therefore, the estimated number of cases would be overestimated, and the estimated completeness underestimated, for these sites. The possibility of such bias in the estimates was suggested by findings from the Osaka Cancer Registry9). For the latter sites, however, the assumption might be valid and the estimates not biased.
The level of reporting to a cancer registry varies by hospital8). Variation among hospitals in completeness of registration and in the prognosis of patients may combine to produce such a dependent relation as a whole. One possible way to avoid this problem is to divide hospitals into groups according to their type and/or size and then to estimate completeness for each group. This would increase the validity of the estimates.
One limitation of our method is that estimates based on a small number of deceased cases suffer from random error. This could happen in sites where incidence is too low and/or survival rates are too high to obtain a sufficient number of deceased cases for the estimation. Thus, estimates for such sites as the thyroid and bone are less reliable. This type of limitation is common to all methods using the death certificate for the estimation11).
In this paper, we introduced a simple method for the estimation of completeness of cancer registration. Although the application of this method is restricted to registry data which contain supplementary reports for cases first known through a death certificate, such reports are routinely collected in most of the registries in Japan. For the assessment of completeness of registration, this method could usefully be applied to such registry data.