Reliability of a questionnaire used to survey allergic disease in school.

The reliability of a questionnaire containing 21 items on asthma and allergies of the nose, skin, and eyes, together with other questions, was evaluated by the test-retest method. The questionnaire was identical to one used two years earlier in a survey of allergic diseases among elementary, junior high and senior high school students in Shizuoka Prefecture, Japan. To evaluate its reliability, we calculated proportions of agreement, Cohen's kappa values and intraclass correlation coefficients. Both the proportions of agreement and the kappa values were fair, and all the intraclass correlation coefficients were high. The results suggest a slight effect of age; items with higher kappa values tended to be easy to answer, whereas items with lower kappa values tended to be questions allowing multiple answers. Among the kappa values, those for items concerning diagnosis by a doctor tended to be the largest, suggesting that a doctor's diagnosis is strongly convincing to patients. The questionnaire can be regarded as useful for our original purpose of investigating allergic diseases, and the reliability of this questionnaire survey can be regarded as satisfactory.

Recently, allergic diseases have drawn much attention in various fields1-3). In 1990, Usami et al. conducted an extensive survey of allergic diseases among students4). They administered a questionnaire to 49,419 elementary, junior high and senior high school students in three regions (east, middle and west) of Shizuoka Prefecture, Japan. The questionnaire contained 21 items concerning asthma and allergies of the nose, skin, and eyes. Because the answer to each question involved the subjectivity of each student, it is essential to evaluate the reliability of this questionnaire.
To evaluate the reliability, we assessed the questionnaire's reproducibility with the test-retest method. As indices of reproducibility, we calculated proportions of agreement, kappa values and intraclass correlation coefficients for each item in the two surveys. This study was conducted to establish reproducibility as a means of judging the reliability of the questionnaire used to survey the above allergic diseases5,6).

MATERIALS AND METHODS
The subjects of the retest were students in nine schools selected from those who had responded to the first questionnaire in April 1990. The nine schools were chosen on the basis of their availability for the retest. The test was given in April 1992 (the same month as in 1990) to sixth-grade elementary school students (the same students who had been in the fourth grade in the 1990 survey) and to third-grade students of junior high and senior high schools (corresponding to the first grades of junior and senior high school in 1990). A total of 2,669 students responded.
By linking the data of the first 49,419 students with those of the second 2,669 by name and date of birth, 2,291 students were identified as subjects for the retest survey. From the data of these 2,291 students, reproducibility was examined for answers concerning asthma and allergies of the nose, skin, and eyes, as well as other questions such as nourishment in infancy and the history of allergic diseases in parents or siblings. Diseases newly acquired after the first test, invalid data (e.g. unknown age) and ambiguous answers (two or more categories marked where a single answer was requested) were excluded from the analysis.
To analyze the data, we counted, for each category, the answers that agreed between the two testings and calculated their proportion of the total number of answers (proportion of agreement). Because agreement may occur by chance, we also used Cohen's kappa statistics7), which adjust the scale for chance agreement. The reproducibility of categorical data was thus evaluated using the proportion of agreement and Cohen's kappa statistics, and that of continuous data (age only) using intraclass correlation coefficients9).
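As a concrete illustration of the three indices described above, the following sketch (not from the original paper; the function names and toy data are our own) computes the proportion of agreement, Cohen's kappa and a one-way intraclass correlation coefficient for paired test-retest responses:

```python
import numpy as np

def proportion_of_agreement(test, retest):
    """Fraction of subjects giving the same category on both occasions."""
    test, retest = np.asarray(test), np.asarray(retest)
    return np.mean(test == retest)

def cohens_kappa(test, retest):
    """Cohen's kappa: agreement corrected for chance, for categorical answers."""
    test, retest = np.asarray(test), np.asarray(retest)
    cats = np.union1d(test, retest)
    p = np.mean(test == retest)  # observed proportion of agreement
    # expected chance agreement Pc from the marginal category frequencies
    pc = sum(np.mean(test == c) * np.mean(retest == c) for c in cats)
    return (p - pc) / (1 - pc)

def intraclass_correlation(test, retest):
    """One-way random-effects ICC(1) for paired continuous data (e.g. age)."""
    x = np.column_stack([np.asarray(test, float), np.asarray(retest, float)])
    n, k = x.shape
    grand = x.mean()
    # mean squares between and within subjects (one-way ANOVA decomposition)
    msb = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)
    msw = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect test-retest agreement yields a kappa and ICC of 1, while agreement no better than chance yields a kappa of 0.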
To evaluate differences in the proportion of agreement, the kappa statistics and the intraclass correlation coefficients by sex (male, female) and by age (elementary, junior high and senior high school), we tested the differences of proportions and of correlation coefficients statistically.
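The paper does not specify the exact test procedure. As one plausible sketch (function name and figures hypothetical), a two-sided z-test for the difference between two proportions of agreement, such as those of two age groups, could look like this:

```python
from math import erf, sqrt

def two_proportion_z_test(agree1, n1, agree2, n2):
    """Two-sided z-test for the difference between two proportions,
    e.g. proportions of agreement in two age groups."""
    p1, p2 = agree1 / n1, agree2 / n2
    pooled = (agree1 + agree2) / (n1 + n2)  # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

Comparing kappa values between groups additionally requires an estimate of each kappa's standard error, which this sketch does not cover.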

RESULTS
Sex and age were distributed as shown in Table 1. The ratio of males to females was about one to one. The number of senior high school students was more than twice that of elementary school students, and the number of junior high school students was intermediate. Tables 2-10 show the numbers of subjects (N), the proportions of agreement and the kappa values for each of the allergic diseases and the other items: Tables 2-3 for nasal allergy, Tables 4-5 for asthma, Tables 6-7 for cutaneous allergy, Tables 8-9 for ophthalmic allergy and Table 10 for the other items. Tables 3, 5, 7, 9 and 10 show these values by sex and age (elementary, junior and senior high school) and give the results of the statistical tests.
The kappa values and the proportions of agreement were rounded to four decimal places.
In all cases (see Tables 2, 4, 6, 8 and 10), a total of 21 (13) categorical items were studied. There were no items with kappa values in the range 0.00-0.20, 3 (3) items in 0.21-0.30, 6 (4) items in 0.31-0.40, 3 (2) items in 0.41-0.50, 6 (4) items in 0.51-0.60, 2 (0) items in 0.61-0.70, no items in 0.71-0.80, 1 (0) item in 0.81-0.90 and no items in 0.91-1.00. The numbers in parentheses are the numbers of items that allowed multiple answers.
As for the proportion of agreement, no item had a value below 0.30, and 2 (2) items fell in the lowest occupied range. The intraclass correlation coefficients were all greater than 0.9 (these results are not shown in the tables).
The differences in the proportion of agreement, the kappa values and the intraclass correlation coefficients by sex were not statistically significant, so these results are not shown in Tables 3, 5, 7, 9 and 10. These tables give the results of the statistical tests of the differences by age (A: elementary, B: junior high, C: senior high school). For item 1-1), the difference in the proportion of agreement was significant between elementary school (A) and senior high school (C) (P<0.01) and between junior high school (B) and senior high school (C) (P<0.05); the difference in the kappa values was significant between A and C (P<0.05) and between B and C (P<0.01). These results are shown in Table 3. For item 4-1), the difference in the proportion of agreement was significant between A and C (P<0.05) and between B and C (P<0.01); the difference in the kappa values was significant between A and C (P<0.01) and between B and C (P<0.01). These results are shown in Table 9. For item 5-1), the difference in the proportion of agreement was significant between A and B (P<0.05), between A and C (P<0.01) and between B and C (P<0.01); the differences in the kappa values were significant at the same levels (A and B, P<0.05; A and C, P<0.01; B and C, P<0.01).
For item 5-2), the difference in the proportion of agreement was significant between elementary and senior high school (P<0.01) and between junior high and senior high school (P<0.05). These results are shown in Table 10.

DISCUSSION
The validity and reproducibility of a questionnaire seriously affect the reliability of the epidemiological study based on it10,11). The validity and reproducibility of questionnaires have been examined in various ways; reproducibility has usually been investigated for the frequency and quantity of food intake as determined by questionnaires12).

Table 3. The number of subjects, proportion of agreement, and kappa value for Item 1 regarding nasal allergy in the two questionnaires, by sex and age (elementary, junior and senior high school). The contents of the items are those in Table 2. Pa: proportion of agreement. *: P<0.05 **: P<0.01

In our study, we investigated the reproducibility of a questionnaire on allergic diseases by means of the proportion of agreement, kappa value and intraclass correlation coefficient. There was no effect of sex. It was expected that older students would understand the meaning of every item in the questionnaire more exactly than younger students, but the results were contrary to this expectation: the proportions of agreement and kappa values on items 1-1), 4-1) and 5-1) were significantly lower for senior high school students than for elementary and junior high school students. The reason was thought to be that the older students replied more carelessly to these items than the younger students; they might have found it troublesome to answer the same questions in the second survey as in the first. By contrast, the proportion of agreement on item 5-2) was significantly greater for senior high school students than for elementary and junior high school students. Considering the meaning of item 5-2), the reason for this result might be that the older students understood that item more exactly than the younger students. For these reasons, age may be a useful factor to consider when assessing the reliability of the questionnaire.
The items that showed higher kappa values tended to be easy to answer (i.e. Yes or No), while the items with lower kappa values tended to be questions allowing multiple answers13). The reason item 1) of each question showed a large kappa value was thought to be that item 1) received many answers of "No" (I have no allergic symptoms). Because item 1) is easier to answer, the items from item 2) onward in each question received fewer answers (subjects) than item 1). Among the items with higher kappa values, item 1) (concerning the presence of symptoms of each allergy) and item 2) (concerning the diagnosed name of the disease) of each question appeared frequently.
Table 5. The number of subjects, proportion of agreement, and kappa value for Item 2 regarding asthma in the two questionnaires, by sex and age (elementary, junior and senior high school). The contents of the items are those in Table 4. Pa: proportion of agreement.

Table 6. The number of subjects, proportion of agreement, and kappa value for Item 3 regarding cutaneous allergy in the two questionnaires.

The kappa value will be 0 when the proportion of agreement is a mere accident (p=Pc; see APPENDIX), and it approaches 1.0 as the proportion of agreement becomes higher (p>Pc; see APPENDIX). Conversely, the value will be negative when the proportion of agreement is below the expected proportion of agreement (p<Pc; see APPENDIX), and it equals 1.0 in the case of complete agreement (p=1; see APPENDIX)14,15). For the interpretation of the kappa value, the following guidelines have been used: below 0.20 indicates slight agreement, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial and 0.81-1.00 almost perfect16,17). All questions showed relatively high reproducibility in spite of the interval of two years between the first and second testings. It is concluded that the answers to this questionnaire are reliable and that this method of data collection, including the questionnaire, is useful for investigating allergic diseases18-20).
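The boundary behaviour of the kappa value described above can be checked numerically. In this small sketch (our own illustration, not from the paper), kappa is computed directly from an observed agreement p and a chance-expected agreement Pc:

```python
def kappa(p, pc):
    """Cohen's kappa from observed agreement p and chance-expected agreement pc."""
    return (p - pc) / (1 - pc)

# kappa(0.5, 0.5) is 0: agreement is purely what chance would produce
# kappa(1.0, 0.5) is 1: complete agreement
# kappa(0.3, 0.5) is negative: agreement below chance
```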
In all cases (see Tables 2, 4, 6, 8 and 10), considering the contents of the questionnaire, the items regarding diagnosis by a doctor showed moderate reproducibility in every question: all kappa values for these items (item 2) of each question) were in the range 0.41-0.60. The items in which each student judged the symptoms of diseases showed fairly low reproducibility: all kappa values for these items (items 1-3) and 3-3)-1)) were in the range 0.21-0.30.

Table 7. The number of subjects, proportion of agreement, and kappa value for Item 3 regarding cutaneous allergy in the two questionnaires, by sex and age (elementary, junior and senior high school). The contents of the items are those in Table 6.

The items on the season in which allergic diseases appear showed fair reproducibility in every question: all kappa values for these items (item 3) or 4) of each question) were in the range 0.31-0.40. The items on the age at which allergic diseases appear showed fair to moderate reproducibility in every question: all kappa values for these items (item 4) or 5) of each question) were in the range 0.31-0.60. Among the kappa values of items 2), 3), 4) and 5), that of item 2) (diagnosis by a doctor) was the largest. This may mean that a doctor's diagnosis is very convincing to patients with these diseases21). Establishing reproducibility, which was examined here by the proportion of agreement, kappa value and intraclass correlation coefficient, is very important in evaluating the reliability of questionnaire surveys. However, when individual reproducibility is not high, a repeated survey like this one cannot clarify whether there were problems in the questionnaire itself, whether there were systematic measurement errors in the methodology of asking the subjects to write the answers themselves, or whether their disease had actually changed for some reason.
Larger kappa values were accompanied by larger proportions of agreement, and smaller kappa values by smaller proportions of agreement. When an item allowing multiple answers is used, its categories should be carefully devised. Furthermore, in order to obtain a larger proportion of agreement and to lessen the frequency of misclassification, it could be helpful to unify the categories.
Finally, the reliability of this questionnaire survey can be regarded as satisfactory.

Table 9. The number of subjects, proportion of agreement, and kappa value for Item 4 regarding ophthalmic allergy in the two questionnaires, by sex and age (elementary, junior and senior high school). The contents of the items are those in Table 8. Pa: proportion of agreement. *: P<0.05 **: P<0.01

APPENDIX
k = (p - Pc) / (1 - Pc)
This k is called the kappa statistic or kappa value (coefficient), and p is called the proportion of agreement. (Pc is called the expected proportion of agreement by accident.)