Effects of Misclassification and Temporal Change of Response in Food Frequency on Risk Ratio

Misclassification and temporal changes of food consumption frequencies were estimated under statistical models with some assumptions, and their effects on risk ratios were evaluated. Food frequencies of 27 items in 214 subjects were measured twice by a questionnaire at a 2-week interval, and those in 326 subjects were measured in 1989, 1993 and 1994. The median of the probabilities of misclassification in responses among the 27 food items was estimated to be 0.12. The medians of the proportions of persons whose responses in 1989 differed from those 5 years later were calculated to be 35% with misclassification and 27% without misclassification. For a true risk ratio of 3, the medians of the risk ratios of dietary habits during 5 years, based on food frequencies measured in 1989, were observed to be 2.2 in the case of responses with misclassification, and 1.7 in the case of responses with misclassification and temporal changes. These results suggest that the risk ratios of food frequencies would be seriously affected by misclassification and temporal changes in responses. J Epidemiol, 1997 ; 7 : 153-159.


Shuji Hashimoto1, Satoshi Nakai1, Yoshitaka Tsubono2, Yoshikazu Nishino2, Akira Fukao3, and Shigeru Hisamichi2
In epidemiological studies, the association between dietary habits and chronic diseases, such as cancer, has been extensively investigated. Dietary habits are usually measured once by questionnaire, but the measurements contain errors and the habits change over time [6][7][8][9].
In this study, misclassification and temporal changes of food consumption frequencies were separately estimated based on data and some statistical models, and their effects on risk ratios were evaluated under the models.

Data
Five surveys with a self-administered questionnaire including food consumption frequencies and drinking habits were undertaken for the same subjects in a rural town of Miyagi Prefecture in Japan. Details of these surveys were described elsewhere 9). Table 1 summarizes the time sequence of surveys and response rates. The first survey was conducted in the summer of 1988, and the others were conducted in winter between 1989 and 1994. Response rates were 74-99%. Subjects were asked the average consumption frequency of 27 items (shown in Table 2) during the previous year or the recent period. Each food item had five categories: 1) rarely, 2) one to two times per month, 3) one to two times per week, 4) three to four times per week, 5) almost every day.
In order to estimate the magnitude of misclassification, the consistency of 27 food frequencies and the drinking habit between the third and fourth surveys (2-week interval between these 2 surveys) for 214 subjects (denoted as "misclassification-data") was analyzed. In order to evaluate temporal changes, the second, third and fifth surveys (4-year interval between the second and third surveys, 5-year interval between the second and fifth surveys) (denoted as "temporal-data") were used. The number of subjects who responded to all of these surveys was 326.

Table 1. Sequence of surveys.
#: Numbers in parentheses are percentages of the number of eligible subjects.
The second survey was used as the baseline survey in this study.

Statistical models with three or more categories are complicated and need some assumptions. Therefore, in this study, the five food frequency categories were combined into dichotomous categories, so that approximately half of the subjects in the second survey belonged to each category. The categories with less and more frequent consumption were denoted as "negative" (or "-") and "positive" (or "+"), respectively, and the results of dichotomization are shown in Table 2.
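
As an illustration of the dichotomization rule, the following Python sketch picks the cut point among the five ordered categories that brings the "negative" share closest to one half. The function names and the category counts are hypothetical and are not taken from the study data.

```python
from itertools import accumulate


def choose_cut(counts):
    """Return k so that categories 1..k are "negative" and k+1.. are "positive".

    counts[i] is the number of subjects in category i+1, ordered from the
    least to the most frequent consumption. The cut keeps at least one
    category on each side and makes the negative share as close to 0.5
    as possible.
    """
    total = sum(counts)
    cumulative = list(accumulate(counts))
    return min(range(1, len(counts)),
               key=lambda k: abs(cumulative[k - 1] / total - 0.5))


def dichotomize(response, cut):
    """Map a category number (1-5) to "+" or "-" for a given cut."""
    return "+" if response > cut else "-"


# Hypothetical counts for categories 1) rarely ... 5) almost every day.
counts = [20, 60, 90, 100, 56]
cut = choose_cut(counts)
print(cut, dichotomize(4, cut), dichotomize(3, cut))
```

With these made-up counts, categories 1 to 3 form the "negative" group and categories 4 and 5 the "positive" group.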

Models of misclassification and temporal change of response
The model of misclassification of response reported by Harper 10) was used in our study, and details are described in Appendix 1. We let the proportion of persons whose true response is positive be p, and let the probabilities of misclassification of response be α among persons with truly positive answers and β among truly negative persons (α and β are the false-negative and false-positive rates, respectively). It was assumed that α = β, since p, α and β could not be simultaneously identified from data measured twice 10). Under this model, p, α and β were estimated using the maximum likelihood method based on the misclassification-data. The estimates of α and β were denoted as "misclassification rate".
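
The following Python sketch shows how the proportion of truly positive persons and the misclassification rate might be obtained from a 2 x 2 test-retest table when the false-negative and false-positive rates are equal. It assumes that the two measurements are conditionally independent given the true response; Appendix 1 may specify the model differently. Under that assumption the probability of discordant responses is 2a(1-a), where a is the common misclassification rate, and the difference between the (+,+) and (-,-) probabilities is (2p-1)(1-2a), so closed-form estimates exist. They coincide with the maximum likelihood estimates whenever they fall inside the parameter space. The function name and the counts are hypothetical.

```python
import math


def estimate_p_alpha(n_pp, n_pm, n_mp, n_mm):
    """Estimate (p, alpha) from test-retest counts, assuming alpha == beta.

    n_pp: positive at both measurements, n_pm: (+,-), n_mp: (-,+),
    n_mm: negative at both measurements.
    """
    n = n_pp + n_pm + n_mp + n_mm
    discordant = (n_pm + n_mp) / n
    if discordant >= 0.5:
        raise ValueError("discordance of 0.5 or more is incompatible with alpha < 0.5")
    # Solve 2 * alpha * (1 - alpha) = discordant for the root below 0.5.
    alpha = (1 - math.sqrt(1 - 2 * discordant)) / 2
    # Solve (2p - 1) * (1 - 2 * alpha) = P(+,+) - P(-,-) for p.
    p = 0.5 + (n_pp - n_mm) / n / (2 * (1 - 2 * alpha))
    # Guard only: if clipping is needed, the result is no longer the exact MLE.
    p = min(1.0, max(0.0, p))
    return p, alpha


# Hypothetical counts generated from p = 0.6 and alpha = 0.1.
print(estimate_p_alpha(490, 90, 90, 330))
```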
The model of temporal change of response was assumed to be a Markov chain model, with details described in Appendix 2. We let the proportion of truly positive persons at baseline (the second survey) be π, and let the probabilities of temporal change in true responses from one year to the next be P(+ → -) among truly positive persons and P(- → +) among truly negative persons. Under this model, π, P(+ → -) and P(- → +) were estimated using the maximum likelihood method based on the temporal-data and the estimates of α and β. The goodness of fit of the model to the data was tested using likelihood ratio statistics. Proportions of persons whose true responses at baseline were different from those 5 years later or in the fifth survey were calculated under the above model, and were denoted as "temporal change rates for 5 years".
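
A minimal Python sketch of how a temporal change rate for 5 years could be computed from a two-state Markov chain is given below. It assumes that misclassification at each survey is independent of the other survey and of the temporal change, and that a nonzero misclassification rate turns the true change rate into the rate of differing observed responses. The function and parameter names, and the numerical values, are hypothetical.

```python
def positive_prob_after(years, start_positive, pos_to_neg, neg_to_pos):
    """P(truly positive after `years`) given the true state at baseline."""
    q = 1.0 if start_positive else 0.0
    for _ in range(years):
        q = q * (1 - pos_to_neg) + (1 - q) * neg_to_pos
    return q


def temporal_change_rate(pi, pos_to_neg, neg_to_pos, years=5, alpha=0.0, beta=0.0):
    """Proportion whose response at baseline differs from that `years` later.

    With alpha = beta = 0 this is the rate for true responses (without
    misclassification); otherwise it is the rate for observed responses.
    """
    stay_pos = positive_prob_after(years, True, pos_to_neg, neg_to_pos)
    become_pos = positive_prob_after(years, False, pos_to_neg, neg_to_pos)
    true_joint = {
        ("+", "+"): pi * stay_pos,
        ("+", "-"): pi * (1 - stay_pos),
        ("-", "+"): (1 - pi) * become_pos,
        ("-", "-"): (1 - pi) * (1 - become_pos),
    }

    def p_obs(observed, true):
        # Probability of an observed response given the true response.
        if true == "+":
            return 1 - alpha if observed == "+" else alpha
        return beta if observed == "+" else 1 - beta

    rate = 0.0
    for (t0, t1), prob in true_joint.items():
        rate += prob * (p_obs("+", t0) * p_obs("-", t1) + p_obs("-", t0) * p_obs("+", t1))
    return rate


# Hypothetical parameter values.
print(temporal_change_rate(0.5, 0.07, 0.07))
print(temporal_change_rate(0.5, 0.07, 0.07, alpha=0.12, beta=0.12))
```

In this sketch the rate with misclassification exceeds the rate without it whenever the true change rate is below one half, which agrees in direction with the comparison of the two kinds of temporal change rates in this study.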

Calculation of risk ratio
Consider the magnitude of a one-year risk of an exposure. Suppose that the exposure is measured at the beginning and end of the one-year interval. It was assumed that, for a nonnegative value of A, the risk during the interval among persons whose true responses were positive at both measurement times was 1 + A times that among persons truly negative at both times, and that the risk among persons truly positive at only one of the times was 1 + A/2 times that risk.
The ratios of the 5-year risk from baseline among persons with positive responses at baseline to that among persons with negative responses at baseline (referred to as "observed risk ratio") were calculated using the estimates of π (the proportion of truly positive persons at baseline), the one-year temporal change probabilities P(+ → -) and P(- → +), α and β under the above model, varying A from 0 to 2. Note that observed risk ratios equal the true one (1 + A) in the absence of misclassification and temporal changes of responses, and that misclassification in responses was assumed to be nondifferential under the above model. Similarly, the risk ratios were calculated in the case of responses without temporal changes (P(+ → -) = P(- → +) = 0).
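
The calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: the 5-year risk is taken to be proportional to the sum of the yearly relative risks (rare outcome, no depletion of the population at risk), and misclassification at baseline is taken to be independent of later changes in the true response. The function and parameter names are hypothetical; `excess` plays the role of the quantity added to 1 in the true risk ratio.

```python
def observed_risk_ratio(pi, pos_to_neg, neg_to_pos, alpha, beta, excess, years=5):
    """Risk ratio of observed-positive to observed-negative persons at baseline."""

    def mean_relative_risk(start_positive):
        q = 1.0 if start_positive else 0.0  # P(truly positive) at start of year
        total = 0.0
        for _ in range(years):
            q_next = q * (1 - pos_to_neg) + (1 - q) * neg_to_pos
            # Yearly risk is 1 + excess if positive at both ends of the year,
            # 1 + excess/2 if positive at one end, and 1 otherwise, so its
            # expectation is linear in the two probabilities of being positive.
            total += 1 + excess * (q + q_next) / 2
            q = q_next
        return total / years

    rr_true_pos = mean_relative_risk(True)
    rr_true_neg = mean_relative_risk(False)

    # Share of truly positive persons among observed positives and negatives.
    obs_pos = pi * (1 - alpha) + (1 - pi) * beta
    obs_neg = 1 - obs_pos
    true_pos_in_obs_pos = pi * (1 - alpha) / obs_pos
    true_pos_in_obs_neg = pi * alpha / obs_neg

    risk_obs_pos = true_pos_in_obs_pos * rr_true_pos + (1 - true_pos_in_obs_pos) * rr_true_neg
    risk_obs_neg = true_pos_in_obs_neg * rr_true_pos + (1 - true_pos_in_obs_neg) * rr_true_neg
    return risk_obs_pos / risk_obs_neg


# Hypothetical inputs: half of the subjects truly positive, true risk ratio 3.
print(observed_risk_ratio(0.5, 0.0, 0.0, 0.12, 0.12, excess=2.0))
print(observed_risk_ratio(0.5, 0.07, 0.07, 0.12, 0.12, excess=2.0))
```

With half of the subjects truly positive, a misclassification rate of 0.12 and no temporal change, the sketch gives 2.76/1.24, or about 2.23, for a true risk ratio of 3.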

Sensitivity analysis on the assumption in the model of misclassification of response
In the model of misclassification described above, it was assumed that the false-negative rate α equals the false-positive rate β. We observed the changes in temporal change rates and observed risk ratios in some cases where α ≠ β. Suppose that γ = β - α and that γ varies under the condition that 0.01 ≤ α, β < 1 and that either α or β is less than 0.5. With γ fixed in this range, α and β were estimated (see Appendix 1). Using the estimates of α and β, the parameters of the model of temporal change were estimated by the same methods, and temporal change rates and observed risk ratios were calculated.

Notes to Table 2 (frequencies of response at baseline, misclassification rates and temporal change rates for 5 years): Food frequency categories were as follows: 1) rarely, 2) one to two times per month, 3) one to two times per week, 4) three to four times per week, 5) almost every day. #: Excluded missing values.

Misclassification and temporal change of response
Misclassification rates in 27 food items and drinking habits are shown in Table 2. The misclassification rates among 27 food items were 0.04-0.16 (median=0.12), and were higher than in drinking habits (0.01). The smallest misclassification rates were observed in pork, tempura (fried foods) and wild plants, and the largest rates were in carrots, tomatoes and food boiled in soy.
Temporal change rates for 5 years in 27 food items and drinking habits were shown in Table 2. Temporal change rates in case of responses with misclassification were higher than those in case of responses without misclassification.The median of temporal change rates among food items was 35% with misclassification and 27% without misclassification.The smallest temporal change rates with misclassification were observed in pork, cheese and pickled vegetables, and the largest rates were in green leafy vegetables, tomatoes and food boiled in soy.The smallest temporal change rates without misclassification were observed in fried vegetables, carrots and pickled vegetables, and the largest rates were in green leafy vegetables, tomatoes and food boiled in soy.
In 9 food items, the data did not fit to the model (p<0.05).Among 18 other food items, medians of temporal change rates were 37% with misclassification and 27% without misclassification.

Effects of misclassification and temporal change of response on risk ratio
For true risk ratios of 1-3, medians and ranges of observed risk ratios among 27 food items, and observed risk ratios in drinking habits, are shown in Figures 1 and 2. In the case of response without temporal changes (Figure 1), observed risk ratios were lower than true risk ratios. For the true risk ratio of 3, observed risk ratios were 1.78-2.51 (median=2.24) among 27 food items, and the observed risk ratio was 2.92 in drinking habits. The smallest observed risk ratios were in carrots, boiled beans and fruit juice, and the largest risk ratios were in pork, tempura (fried foods) and potatoes.
In the case of response with temporal changes (Figure 2), observed risk ratios among 27 food items were 1.19-2.00 (median=1.66) for the true risk ratio of 3, and were lower than in the case above (Figure 1). The smallest observed risk ratios were in wild plants, other fruit and fruit juice, and the largest risk ratios were in eggs, fresh fish and potatoes.
Excluding the 9 food items in which the data did not fit the model on temporal change of response, observed risk ratios for the true risk ratio of 3 were 1.78-2.46 (median=2.20) in the case of response without temporal changes and were 1.19-2.00 (median=1.62) in the case of response with temporal changes.

Sensitivity analysis on the assumption in the model of misclassification of response
We made two scenarios in this analysis. The first scenario, β=0.01, corresponded to the maximum value of the false-negative rate α and the minimum value of the false-positive rate β. Inversely, the second scenario, α=0.01, corresponded to the minimum value of α and the maximum value of β. These scenarios were compared with the results under the assumption that β = α, which was made in the previous section.
Misclassification rates, temporal change rates and observed risk ratios are shown in Table 3. Temporal change rates, both with and without misclassification, differed little among the 2 scenarios and the assumption. For the true risk ratio of 3, observed risk ratios without temporal changes were lower under the first scenario (median=1.94), and were higher under the second scenario (median=2.50), than under the assumption (median=2.24). For the true risk ratio of 3, observed risk ratios with temporal changes differed little among the 2 scenarios and the assumption (median=1.49, 1.74 and 1.66).

DISCUSSION
In the model on misclassification, it was assumed that false-negative rates (α) were equal to false-positive rates (β). However, we observed that the results for temporal change rates and observed risk ratios did not change greatly when β=0.01 or α=0.01. The more crucial assumption underlying the model on temporal change was that, given the current state (positive or negative), the response in the next year was unaffected by the previous states. Although the model did not fit the data in 9 food items, the medians of temporal change rates and observed risk ratios among the other 18 food items did not differ from those among all 27 food items. Thus, our results do not seem to depend strongly on the assumptions above.
Our results on misclassification rates and temporal change rates with misclassification were consistent with the results on reproducibility in the previous study 9) based on the same data, although in that study food frequency categories were not combined and Spearman's correlation coefficient was used in the analysis of reproducibility. Pork, tempura (fried foods) and wild plants, in which we observed lower misclassification rates, had higher reproducibility at 2 weeks (Spearman's correlation coefficients = 0.63-0.71), and carrots, tomatoes and food boiled in soy, which had higher misclassification rates, had lower reproducibility (0.56-0.62). Pork, cheese and pickled vegetables, in which we observed lower temporal changes in the case of response with misclassification, had higher reproducibility at 5 years (0.30-0.48), and green leafy vegetables, tomatoes and food boiled in soy, which had higher temporal change rates, had lower reproducibility (0.15-0.23).

In our study, temporal changes excluding misclassification were estimated. Food items in which we observed lower and higher temporal changes without misclassification had lower and higher temporal changes with misclassification, respectively. Temporal change rates were lower without misclassification (median=0.27) than with misclassification (median=0.35). This result suggests that temporal changes of food frequencies are overestimated without adjustment for misclassification.

It is well known that nondifferential misclassification causes underestimation of risk ratios 1-5). We observed the same phenomenon in food frequencies. Observed risk ratios were estimated to be 1.78-2.51 for the true risk ratio of 3 (Figure 1). This result indicates that the risk ratios of food frequencies would be seriously affected by misclassification of responses.

We observed that temporal changes, as well as misclassification of responses, could cause underestimation of the risk ratio. In the case of responses with misclassification and temporal changes, observed risk ratios during 5 years, based on the food frequencies measured at baseline, were estimated to be 1.19-2.00 for the true risk ratio of 3 (Figure 2). We assumed that the risk was influenced only by current dietary habits, and that the risk changed instantly with temporal changes of dietary habits. In general, the risk of chronic diseases would be affected by long-term dietary habits as well as short-term ones 7). The effects of temporal change of food frequencies on the risk ratio would therefore be overestimated under this assumption. This suggests that the observed risk ratio for food frequencies would lie between the value for responses with only misclassification and the value for responses with both misclassification and temporal change.

Figure 2. Observed risk ratio in case of response with temporal changes.

Table 3. Misclassification rates, temporal change rates and observed risk ratios among 27 food items under the various assumptions of false-negative and false-positive rates in the model on misclassification.
α: False-negative rate. β: False-positive rate. #: Observed risk ratios were for the true risk ratio of 3.

Since p, α and β could not be simultaneously identified from data measured twice, some assumptions were necessary in this analysis. Suppose that γ = β - α and that γ varies under the condition that 0 < α, β < 1 and that either α or β is less than 0.5. Note that the condition γ = 0 is equivalent to α = β.

baseline and the next year. The probabilities of true responses of (+,+), (+,-), (-,+) and (-,-) are given by π(1 - P(+ → -)), π P(+ → -), (1 - π) P(- → +) and (1 - π)(1 - P(- → +)), respectively, where π is the proportion of truly positive persons at baseline and P(+ → -) and P(- → +) are the one-year probabilities of temporal change in true responses. Under the above model on misclassification of response, the probabilities of the patterns of responses are determined using α and β (described above). For example, the probability of responses of (+,+) is given by π(1 -

years. The responses have 8 patterns, such as (+,+,-). Similarly, probabilities of the patterns of true responses were represented using π, P(+ → -), P(- → +), α and β. The numbers of persons with the 8 patterns of responses follow a multinomial distribution, given that the total number of respondents is fixed, assuming that temporal changes of responses were independent of each other and of misclassification of responses. Under this model, π, P(+ → -) and P(- → +) were estimated using the maximum likelihood method based on the temporal-data and the estimates of α and β.
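
The probabilities of the observed response patterns can be sketched in Python by summing over all true patterns. The sketch assumes that the surveys are separated by 4 years and then 1 year (baseline, third and fifth surveys), that true responses follow a two-state Markov chain with constant one-year change probabilities, and that misclassification is independent across surveys. The function and parameter names, and the numerical values, are hypothetical; a multinomial log-likelihood for the 8 pattern counts could be built on these probabilities.

```python
from itertools import product


def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(m, n):
    """n-th power of a 2 x 2 matrix (n >= 0)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def pattern_probabilities(pi, pos_to_neg, neg_to_pos, alpha, beta, gaps=(4, 1)):
    """Probability of each observed pattern, e.g. "++-", over the surveys.

    State index 0 means positive and 1 means negative; `gaps` holds the
    number of years between consecutive surveys.
    """
    one_year = [[1 - pos_to_neg, pos_to_neg], [neg_to_pos, 1 - neg_to_pos]]
    steps = [mat_pow(one_year, g) for g in gaps]
    start = [pi, 1 - pi]
    report = [[1 - alpha, alpha], [beta, 1 - beta]]  # report[true][observed]
    n_surveys = len(gaps) + 1
    probs = {}
    for obs in product((0, 1), repeat=n_surveys):
        total = 0.0
        for true in product((0, 1), repeat=n_surveys):
            p = start[true[0]]
            for i, step in enumerate(steps):
                p *= step[true[i]][true[i + 1]]
            for t, o in zip(true, obs):
                p *= report[t][o]
            total += p
        probs["".join("+-"[o] for o in obs)] = total
    return probs


# Hypothetical parameter values.
for pattern, prob in pattern_probabilities(0.5, 0.07, 0.05, 0.12, 0.12).items():
    print(pattern, round(prob, 4))
```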
unity1,2).4. The effects of misclassification on relative risk have been described in many studies 3-5). If exposure data, especially in cohort studies, are measured only at the outset although exposure history changes with the passage of time, the estimates of relative risk may be biased by inappropriate information on exposure. Therefore, we need to evaluate the effects of misclassification and exposure change in order to interpret the results of cohort studies correctly.

Figure 1. Observed risk ratio in case of response without temporal changes.

Table 2. Frequencies of response at baseline, misclassification rates and temporal change rates for 5 years.