Reproducibility of self-administered questionnaire in epidemiological surveys.

We evaluated the reproducibility of data obtained from a self-administered questionnaire on the intake frequency of 33 food items, the intake frequency of 3 beverages, drinking and smoking habits, and past history of 10 diseases. The subjects were 263 individuals aged 39 to 79 years from the general population who happened to participate in two consecutive surveys about one year apart and therefore answered the same questionnaire twice. The mean percent exact agreements for the 33 food items were 48.4% overall, 42.7% in males and 49.4% in females. The mean percent agreements within one category difference were 85.4% overall, 83.3% in males and 85.8% in females. The reproducibility of beverage intake frequency was highest for green tea, followed by black tea and coffee. The percent exact agreements were 85.4% in males and 81.8% in females for drinking habit, 87.5% in males and 99.0% in females for smoking habit, and 93.7% in males and 97.2% in females for past history. The reproducibility of the self-administered questionnaire was highest for past history, followed by smoking habit, drinking habit and dietary habit. Reproducibility was higher in females than in males except for drinking habit. Although the number of response categories should be reduced to improve reproducibility, the values obtained from the self-administered questionnaire were sufficiently high for epidemiological studies.


Nakako Fujiwara and Shinkan Tokudome

J Epidemiol, 1997; 7:61-69.

Key words: self-administered questionnaire, reproducibility, epidemiological survey, percent agreement, kappa statistic

From the epidemiological standpoint, various lifestyle surveys have been carried out to clarify the associations between lifestyle and disease. Evaluation of the precision and validity of the data obtained is essential 1); however, such evaluations have not been fully performed because of limited time and cost. Reproducibility, or repeatability, is essentially the degree of agreement between repeated measurements.
Validity is the extent to which a method of measurement provides a true assessment. There are many studies on the reproducibility and/or validity of data in epidemiological surveys 2-11), whereas little consideration has been given to the reproducibility of epidemiological data obtained in cohort studies 5,6,10). The aim of this paper is to examine the reproducibility of the self-administered questionnaire used in the cohort study we are conducting. In particular, we evaluated the reproducibility of the epidemiological data by sex, assuming that lifestyle does not change within one year.

Study population
The whole survey covered 5,279 individuals in a general-population cohort from 3 cities and 7 towns in Saga Prefecture, Japan. Of them, 263 individuals (41 males and 222 females) from one city and three towns, who happened to take part in the same epidemiological survey twice at an interval of about one year, were analyzed.
The males ranged in age from 40 to 79 years (mean ± standard deviation, 54.8 ± 9.8 years) and the females from 39 to 73 years (53.6 ± 6.8 years).

Methods of survey
The epidemiological survey, which included the questionnaire, was carried out twice, in 1989 and 1990, using the same format and procedure.
The questionnaire consisted of self-administered questions on the intake frequency of 33 food items, the intake frequency of 3 beverages, drinking and smoking habits, and past history of 10 diseases.
Each question on food and beverage intake frequency offered five categories: consumes (1) rarely, (2) 1-2 times per month, (3) 1-2 times per week, (4) 3-4 times per week, and (5) nearly daily. The questions on drinking and smoking offered three categories: (1) drinks/smokes, (2) stopped drinking/smoking, and (3) does not drink/never smoke. The question on past history of each disease offered two categories: (1) no and (2) yes.

Statistical analyses
For assessing the reproducibility of categorical items, the following parameters were used: (1) percent agreement (exact and within one category difference), (2) kappa statistic, (3) weighted kappa statistic, and (4) Spearman's rank correlation coefficient.
The percent exact agreement is the percentage of subjects who chose the identical category in the first and second surveys among all subjects who answered the question in both surveys (the number of effective repliers) 12). The percent agreement within one category difference is the percentage of subjects who chose the same category, or a category differing by one, in the two surveys.

Table 1. Reproducibility of data from a food intake questionnaire in males. * p<0.05, ** p<0.01, *** p<0.001. #1: Answers to the items in both surveys.
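The two percent-agreement measures defined above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: answers are assumed to be coded 1 to 5 as in the questionnaire, with `None` marking a missing reply, and the function name `percent_agreements` and the sample answers are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def percent_agreements(
    first: Sequence[Optional[int]], second: Sequence[Optional[int]]
) -> Tuple[float, float, int]:
    """Return (percent exact agreement, percent agreement within one category, n).

    Only subjects who answered in both surveys (effective repliers) are counted.
    """
    pairs = [(a, b) for a, b in zip(first, second) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no effective repliers")
    exact = sum(1 for a, b in pairs if a == b)
    within_one = sum(1 for a, b in pairs if abs(a - b) <= 1)
    return 100.0 * exact / n, 100.0 * within_one / n, n


# Hypothetical answers of eight subjects to one food item (not study data).
first_survey = [1, 2, 3, 3, 5, 4, None, 2]
second_survey = [1, 3, 3, 1, 5, 4, 2, 2]
exact_pct, within_pct, n_eff = percent_agreements(first_survey, second_survey)
print(n_eff, round(exact_pct, 1), round(within_pct, 1))  # → 7 71.4 85.7
```

The seventh subject is dropped because the first-survey answer is missing, so both percentages use seven effective repliers.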
The kappa statistic measures the degree of agreement beyond that expected by chance alone 13-16). In addition, the weighted kappa statistic was calculated using the square of the distance between categories as the weight 17). To evaluate the kappa and weighted kappa values, we adopted the criteria recommended by Landis and Koch 4): kappa values greater than about 0.75 represent "excellent" agreement, values between 0.40 and 0.75 represent "fair to good" agreement, and values below about 0.40 represent "poor" agreement. The kappa values were tested for statistical significance by comparing z (the kappa value divided by its standard error) with the limits of the standard normal distribution 14). Spearman's rank correlation coefficient was calculated with a correction for ties, assigning observations tied in the same category their average rank 18).
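A minimal Python sketch of the kappa computations described above follows; it is an illustration under stated assumptions, not the authors' implementation. The quadratic agreement weights 1 - (i - j)^2/(k - 1)^2 are equivalent to using the squared distance between categories as a disagreement weight. The paper does not say which standard error was used for z; the sketch assumes the usual large-sample standard error under chance agreement, and only for unweighted kappa. All function names and the sample answers are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Table = List[List[int]]


def crosstab(first: Sequence[int], second: Sequence[int], k: int) -> Table:
    """k x k counts: row = category in survey 1, column = category in survey 2 (1..k)."""
    table = [[0] * k for _ in range(k)]
    for a, b in zip(first, second):
        table[a - 1][b - 1] += 1
    return table


def margins(table: Table):
    """Return n and the row and column marginal proportions."""
    k = len(table)
    n = sum(sum(row) for row in table)
    rows = [sum(table[i]) / n for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    return n, rows, cols


def quadratic_weights(k: int) -> List[List[float]]:
    """Agreement weights 1 - (i - j)^2 / (k - 1)^2 (squared-distance weighting)."""
    return [[1.0 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]


def kappa(table: Table, weights: Optional[List[List[float]]] = None) -> float:
    """Cohen's kappa; with `weights`, the weighted kappa. Undefined if p_e == 1."""
    k = len(table)
    if weights is None:  # unweighted: only identical categories count as agreement
        weights = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    n, rows, cols = margins(table)
    p_o = sum(weights[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    p_e = sum(weights[i][j] * rows[i] * cols[j] for i in range(k) for j in range(k))
    return (p_o - p_e) / (1.0 - p_e)


def kappa_z(table: Table):
    """z = kappa / SE0 for unweighted kappa; SE0 is the large-sample SE under chance agreement."""
    n, rows, cols = margins(table)
    p_e = sum(r * c for r, c in zip(rows, cols))
    var_term = p_e + p_e ** 2 - sum(r * c * (r + c) for r, c in zip(rows, cols))
    se0 = math.sqrt(var_term) / ((1.0 - p_e) * math.sqrt(n))
    z = kappa(table) / se0
    return z, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the normal distribution


def rating(value: float) -> str:
    """The paper's criteria: > 0.75 excellent, 0.40-0.75 fair to good, < 0.40 poor."""
    if value > 0.75:
        return "excellent"
    if value >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical three-category answers of six subjects (not study data).
table = crosstab([1, 2, 3, 1, 2, 3], [1, 2, 3, 2, 3, 3], k=3)
k_plain = kappa(table)
k_weighted = kappa(table, quadratic_weights(3))
print(k_plain, k_weighted, rating(k_plain))
```

In this toy table every disagreement is between adjacent categories, so the weighted kappa (0.75) credits partial agreement and exceeds the unweighted kappa (0.5).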

RESULTS
The mean percent exact agreements for the 33 food items were 48.4% overall, 42.7% in males (Table 1) and 49.4% in females (Table 2). The number of food items showing a percent exact agreement of 50% or more was 13 (39.4%) overall, 8 in males and 17 in females. The food items with a percent exact agreement of 30% or less were ham, Chinese cabbage, orange, and juice in males, but only juice in females. For these food items, reproducibility improved when the percent agreement within one category difference was used, which ranged from 57.9% to 78.0%.
The kappa and weighted kappa values for each food item are also summarized in Tables 1 and 2. Beef, pork, chicken, liver, egg, milk, yogurt, cheese, tomato, pickles, tofu, and confectionery showed weighted kappa values exceeding 0.4 in both males and females, suggesting fair to good agreement. In contrast, the weighted kappa values were below 0.4 for fried vegetables, green leaf vegetables, cabbage or lettuce, edible wild plants, potato, boiled beans, orange, and juice in both sexes, suggesting poor agreement. Reproducibility assessed with Spearman's rank correlation coefficient was almost the same as that assessed with the percent agreements or kappa statistics (Tables 1 and 2).
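The tie-corrected Spearman coefficient used here can be sketched as the Pearson correlation of mid-ranks, where subjects tied in the same category share the average of the ranks they occupy. This is a minimal stdlib-only sketch, not the authors' code; the function names and the sample answers are hypothetical, and the coefficient is undefined when all answers to an item fall in one category.

```python
import math
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n, with tied values sharing the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of mid-ranks (tie-corrected)."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical five-category answers of eight subjects (not study data).
first_survey = [1, 2, 2, 3, 3, 4, 5, 5]
second_survey = [1, 2, 3, 3, 2, 4, 4, 5]
print(midranks(first_survey))  # → [1.0, 2.5, 2.5, 4.5, 4.5, 6.0, 7.5, 7.5]
print(spearman(first_survey, second_survey))
```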
Of the three beverages, reproducibility of intake frequency was highest for green tea (percent exact agreements of 84.4% overall, 80.5% in males and 85.2% in females), followed by black tea and coffee (Table 3).
Because gross misclassification is a very serious issue in epidemiological surveys, we examined opposite replies between the two surveys. Table 4 shows, for each of the 33 food items and 3 beverages, the number of subjects who chose an extreme category [(1) rarely or (5) nearly daily] in the first survey, along with the number who selected the exact opposite category in the second survey. Averaged over these 36 items, about 40% of effective answers in both sexes were in category 1 or 5 in the first survey, whereas only 0.82% overall (1.11% in males and 0.77% in females) were in the exact opposite category in the second survey. The items for which the rate of exact opposite answers exceeded 3% of effective answers were juice in females, coffee in males, and green tea in both sexes.

Figure 1 shows the relationship between percent exact agreement (y-axis) and estimated mean food intake per week (x-axis) for each item. In both sexes, percent exact agreements were higher for foods consumed rarely or frequently than for foods consumed moderately. The r-square values of the second-order (quadratic) regression equation, with food intake as the independent variable and percent exact agreement as the dependent variable, exceeded 0.5 in both sexes.

Table 5 shows the percent exact agreement and distribution of replies for drinking and smoking habits. The percent exact agreements for drinking habit were 82.4% overall, 85.4% in males and 81.8% in females. Although the percent exact agreement exceeded 80%, 16 (10.1%) of the 159 females who answered "does not drink" in the first survey answered "drinks" in the second survey.
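The per-item quantities behind Table 4 can be sketched as below: the share of effective answers that were extreme (category 1 or 5) in the first survey, and the share that switched to the exact opposite extreme in the second. This is a hypothetical helper, not the authors' code; it assumes five-category answers coded 1 to 5 with `None` for missing replies, and the reported figures would be the mean of these per-item values over the 36 items.

```python
from typing import Optional, Sequence, Tuple


def extreme_and_opposite(
    first: Sequence[Optional[int]], second: Sequence[Optional[int]]
) -> Tuple[int, float, float]:
    """Return (n, percent extreme in survey 1, percent exact opposite in survey 2).

    Both percentages use effective answers (answered in both surveys) as denominator.
    """
    pairs = [(a, b) for a, b in zip(first, second) if a is not None and b is not None]
    n = len(pairs)
    extreme = sum(1 for a, _ in pairs if a in (1, 5))
    opposite = sum(1 for a, b in pairs if (a, b) in ((1, 5), (5, 1)))
    return n, 100.0 * extreme / n, 100.0 * opposite / n


# Hypothetical answers of eight subjects to one item (not study data).
first_survey = [1, 5, 5, 3, 1, 2, 5, None]
second_survey = [5, 5, 4, 3, 1, 2, 1, 4]
n_eff, pct_extreme, pct_opposite = extreme_and_opposite(first_survey, second_survey)
print(n_eff, round(pct_extreme, 1), round(pct_opposite, 1))  # → 7 71.4 28.6
```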
The percent exact agreements for smoking habit were 97.2% overall, 87.5% in males and 99.0% in females.
The percent exact agreements for past history of diseases exceeded 90%, except for gallbladder operation and appendectomy in males (Table 6).

DISCUSSION
Factors affecting the reproducibility of a food frequency questionnaire include the survey season, the survey interval, educational effects, food intake frequency, and sex. Regarding the interval and timing of surveys, better reproducibility is attainable when the interval between the two surveys is short 11,20), whereas reproducibility tends to be poorer when seasonal variation causes a real change in lifestyle, especially in the intake frequency of food items. We conducted both surveys from July to September, in 1989 and 1990. The mean interval between the two examinations was approximately one year: 354 days (range, 265 to 417 days) for males and 357 days (259 to 417 days) for females. Because both surveys fell in the same season, and an interval of about one year made it unlikely that subjects recalled their previous replies, we consider that recall of previous replies and seasonal variation had little effect on reproducibility. In addition, since subjects were asked about their usual intake frequency averaged over the past year, their physical condition on the day of response is unlikely to have affected reproducibility. We evaluated reproducibility under the assumption that subjects did not change their lifestyle within one year. Although no special intervention or education was given to the participants at the first survey, we checked this assumption by comparing the average intake of each of the 33 foods and 3 beverages between the first and second surveys. The items found to differ significantly between the two surveys by the Mann-Whitney U test were as follows: Chinese cabbage (females, p<0.001), edible wild plants (females, p<0.05) and juice (females, p<0.01) were consumed more frequently in the second survey, whereas tofu (males, p<0.05) was the only item consumed less frequently in the second survey. These differences did not appear to be the result of education.
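The Mann-Whitney U comparison mentioned above can be sketched with the normal approximation and the usual correction for ties, which matter here because five-category answers produce many ties. This is a minimal sketch under assumptions: the paper does not state whether a continuity correction was applied (none is used here), the function name and sample answers are hypothetical, and, like the U test itself, it treats the two surveys as independent samples.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Mann-Whitney U test, normal approximation with tie correction.

    Returns (U for sample x, z, p). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(list(x) + list(y))
    # Mid-rank of each distinct value: average of its first and last 1-based positions.
    first_pos, last_pos = {}, {}
    for pos, v in enumerate(pooled, start=1):
        first_pos.setdefault(v, pos)
        last_pos[v] = pos
    rank_of = {v: (first_pos[v] + last_pos[v]) / 2 for v in first_pos}
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    # Tie correction: subtract sum(t^3 - t) / (n(n - 1)) inside the variance.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical intake categories for one item in the two surveys (not study data).
survey_1989 = [1, 2, 2, 3, 3, 3, 4, 5]
survey_1990 = [2, 3, 3, 3, 4, 4, 5, 5]
print(mann_whitney_u(survey_1989, survey_1990))
```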
Ozasa et al. found good reproducibility for habitually consumed foods such as boiled rice, bread and milk, with Spearman's correlation coefficients higher than 0.610 10). High reproducibility of intake frequency has also been reported for foods that are consumed either frequently or rarely 4). The relationship between the frequency of food intake and percent agreement in our data is shown in Figure 1: foods consumed either frequently or rarely showed better agreement than foods consumed moderately.
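The U-shaped relationship in Figure 1 was summarized in the Results by a second-order regression of percent exact agreement on weekly intake. A stdlib-only sketch of such a fit and its r-square, solving the least-squares normal equations directly, is shown below; the function names are hypothetical and this is not necessarily how the authors fitted their curve.

```python
from typing import List, Sequence, Tuple


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def quadratic_r_squared(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[Tuple[float, float, float], float]:
    """Least-squares fit y = b0 + b1*x + b2*x^2; return the coefficients and r-square."""
    s = [sum(v ** p for v in x) for p in range(5)]  # s[p] = sum of x^p
    t = [sum(yi * xi ** p for xi, yi in zip(x, y)) for p in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    b0, b1, b2 = solve3(normal, t)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi + b2 * xi ** 2)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return (b0, b1, b2), 1.0 - ss_res / ss_tot


# Hypothetical (intake per week, percent exact agreement) points, not study data.
intake = [0.2, 0.5, 1.5, 2.5, 3.5, 5.0, 6.5]
agreement = [80.0, 62.0, 41.0, 36.0, 45.0, 58.0, 77.0]
coefficients, r2 = quadratic_r_squared(intake, agreement)
print(coefficients, r2)
```

A positive quadratic coefficient b2 corresponds to the U shape described in the text, with agreement lowest for moderately consumed foods.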
Food items showing a percent exact agreement of 50% or more numbered 8 of the 33 items in males and 17 in females. Food frequency reproducibility was generally lower in males than in females, possibly because females primarily do the cooking. Although the number of males was small, these results may reflect a genuine sex difference, because the age distribution of males was almost the same as that of females. … for 5). In our data, for which seasonal variation need not be considered, the percent exact agreements for Chinese cabbage were 29.3% in males and 36.5% in females, those for tomatoes 39.0% in males and 51.6% in females, and those for oranges 22.5% in males and 39.2% in females, all higher than in their data.
Ozasa et al. reported good agreement for smoking (85% in males, 99% in females) and drinking (76% in males, 78% in females) 10). Spitz et al. also showed good reproducibility for smoking and drinking, with more than 80% agreement 21). In our data, too, the percent exact agreements for drinking and smoking habits exceeded 80% in both sexes. However, the reproducibility of drinking habit was lowest in females: 16 of 159 females (10.1%) who answered "does not drink" in the first survey responded "drinks" in the second survey, while 22 of 55 females (40.0%) who answered "drinks" in the first survey responded "does not drink" in the second survey. Although Colditz … slightly reduced among heavy drinkers 22), our data may be explained by females who drink only a little, perhaps once a month or so, answering "drinks" or "does not drink" inconsistently depending on the occasion.

Fukao et al. reported that, in a questionnaire with five categories, 1.56% of answers fell in the exact opposite category 5). In our data, 1.11% of answers in males and 0.77% in females fell in the exact opposite category (Table 4). These figures suggest that, in a food frequency questionnaire with five categories, the probability of obtaining an exact opposite answer is around 1%. For green tea, more than 4% of answers in both sexes were exact opposites, although the percent exact agreements were good; notably, Fukao's data show the same pattern 5). Ozasa et al. reported that dairy foods showed a large Spearman's correlation coefficient but also answers differing by more than two categories 10). For items on which opposite answers are often given, conclusions must be drawn with care. However, conclusions drawn from epidemiological data on a large population are unlikely to be seriously distorted by such answers.
Although the mean exact agreements for food intake with five categories were 42.7% in males and 49.4% in females, the percent agreements within one category difference rose to 83.3% in males and 85.8% in females. These results suggest that reducing the choices to three categories would improve the reproducibility of the data obtained. Fewer categories naturally yield higher agreement, but at the cost of lost information. Deciding the number of categories for analysis is extremely difficult; the categories should be collapsed with the frequency distribution of each item in mind. We suggest that the reproducibility of the self-administered questionnaire used here is sufficiently acceptable for epidemiological cohort studies.
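The effect of reducing five categories to three can be illustrated by recoding both surveys and recomputing percent exact agreement. The paper does not say how categories should be merged, so the mapping `COLLAPSE` below is a hypothetical choice for illustration only, as are the function names and sample answers.

```python
from typing import Dict, Optional, Sequence

# Hypothetical 5 -> 3 mapping; the paper does not specify how to merge categories.
COLLAPSE: Dict[int, int] = {
    1: 1, 2: 1,  # rarely, 1-2 times/month -> "low"
    3: 2,        # 1-2 times/week -> "medium"
    4: 3, 5: 3,  # 3-4 times/week, nearly daily -> "high"
}


def exact_agreement(first: Sequence[Optional[int]], second: Sequence[Optional[int]]) -> float:
    """Percent exact agreement among subjects who answered in both surveys."""
    pairs = [(a, b) for a, b in zip(first, second) if a is not None and b is not None]
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def collapsed_agreement(
    first: Sequence[Optional[int]],
    second: Sequence[Optional[int]],
    mapping: Dict[int, int] = COLLAPSE,
) -> float:
    """Percent exact agreement after recoding both surveys with `mapping`."""
    def recode(v: Optional[int]) -> Optional[int]:
        return None if v is None else mapping[v]

    return exact_agreement([recode(v) for v in first], [recode(v) for v in second])


# Hypothetical answers of six subjects (not study data).
first_survey = [1, 2, 3, 4, 5, 2]
second_survey = [2, 2, 3, 5, 4, 3]
print(round(exact_agreement(first_survey, second_survey), 1),
      round(collapsed_agreement(first_survey, second_survey), 1))  # → 33.3 83.3
```

Agreement rises because adjacent-category disagreements inside a merged group become exact agreements, which is also why the text warns that information is lost.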