Choice of response scale for health measurement: comparison of 4, 5, and 7-point scales and visual analog scale.

To compare the feasibility of response scales with different numbers of steps for measuring health status, we assessed the ease of completing 4, 5, and 7-point scales and a visual analog scale. Four forms of a questionnaire, each consisting of the same ten questions about health status but using a different one of these four scales, were given to 48 patients with a variety of conditions and diagnoses. The forms were attached together, and the order in which they were presented was systematically varied across the permutations of the forms. Respondents were asked to complete the forms in the order of the sheets. The 5-point scale form was most commonly chosen as the easiest to complete, and item omission was least frequent with this form. Similarly high reliability, in terms of Cronbach's alpha, was achieved for each of the four forms. An additional survey among 153 patients confirmed the results of the first survey. The selection of the easiest scale to complete varied by sex (men preferred the 7-point scale), but was not affected by the conditions or diagnoses of the patients. The study suggests that the 5-point scale is most useful for measuring health status.

Over the last decade, there has been increasing interest in assessing health indices. Numerous studies have been carried out to measure health status in various fields1-5). Methodological standards for developing such measures have been advocated by several authors6-9). In constructing a questionnaire to evaluate health status, we need to provide a response scale for answering questions. However, few articles describing health measurement scales have explained how the response scale was chosen. Jette10) reported a trial comparing the reliability of three different response modes: a multiple choice scale, a ladder scale, and the use of index cards. Overall, reliability was similar for the three methods. Although this study was well designed to assess different types of response (i.e. choosing a response category, checking a rung of the ladder, and sorting cards), only two response scales with different numbers of points (i.e. 4 (or 5) and 12 points) were compared. Ware and Hays11) compared two methods for measuring patients' satisfaction with specific medical encounters. One form used a 6-point response scale ranging from "very satisfied" to "very dissatisfied" and the other used a 5-point scale ranging from "excellent" to "poor". Reliability did not differ between the two methods, but the 5-point scale better predicted whether patients intended to return to the same doctor in the future, recommend the doctor to a friend, and comply with the medical regimen. However, the two methods differed not only in the number of steps, but also in the verbal descriptions used for the response scales. When constructing continuous scales, a fundamental question needs to be addressed: how many steps should there be? Although Jaeschke et al.12) suggested that increasing the number of response options on a scale increases responsiveness up to a point, they admitted that no one is yet sure what that point is.
Kirshner et al.8) recommended the use of visual analog scales (VAS) or Likert scales with multiple options to evaluate the magnitude of a longitudinal change in status, but there is no available information about which scale is most appropriate.
VAS are normally used to rate the overall severity of pain13). Cella and Perry14) reported a high correlation between the VAS and standardized measures of anxiety, depression, and distress, and supported the use of a VAS for making a rapid assessment of feeling status when a lengthier scale is not feasible. However, there is concern that a VAS offers a confusingly wide range of choice. Hunkinsson noted that 7% of patients were unable to complete a VAS even though they could use a descriptive scale14).
To compare the feasibility of a VAS and multiple choice scales with various numbers of steps for measuring health status, we conducted two surveys in July and August 1994 among patients with various conditions and diagnoses. We selected several questions from the item pool prepared in our ongoing study to develop a questionnaire measuring the health of patients with hematopoietic disorders.

First Survey
A total of 138 questions assessing the health status of patients with hematopoietic disorders were derived from experts, including physicians and nurses in the field, and from the patients themselves. This initial item pool was reduced to a set of 35 items after repeated polling among these panelists15). The items covered physical, mental, and social status. Each item asked for information about the degree of difficulty or of a problem in the daily life of the patient.
Four forms of the questionnaire using different types of response scales were provided (Forms A, B, C, and D). Each consisted of the same ten questions in the same order, randomly selected from the 35 items mentioned above (Table 1). As we were interested in continuous variables, we used continuous scales with 4, 5, and 7 points and a VAS, giving a verbal description only at each end of the scale (Figure 1). These scales have been widely used. The VAS was chosen because it is a scale with an infinite number of points between the extremes, and respondents may want to distinguish minimal differences in their health status. Rating scales with 4, 5, and 7 points were used for Forms A, B, and C, respectively, and a numerical code was assigned to each step on the scale. Form D comprised a VAS anchored by the extremes of the items being measured, with 5 cm long horizontal lines. The forms were attached together, and the order in which they were presented was systematically varied across the permutations of the four forms (4P4 = 24 orders; sets of questionnaires were provided for 48 subjects). Different colored sheets were used so that the forms would be easily distinguished by the respondents.
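The counterbalancing described above can be sketched in a few lines of Python. This is only an illustration: the paper does not say how the orderings were allocated to individual patients, so the cycling rule in the hypothetical `assign_orders` function is an assumption.

```python
from itertools import permutations

FORMS = ["A", "B", "C", "D"]  # 4-, 5-, 7-point scales and the VAS


def assign_orders(n_subjects, forms=FORMS):
    """Cycle through every ordering of the forms (assumed allocation rule)."""
    orders = list(permutations(forms))  # 4! = 24 orderings of four forms
    return [orders[i % len(orders)] for i in range(n_subjects)]


orders = assign_orders(48)

# How often each form appears as the first page across all subjects.
first_page_counts = {f: sum(1 for o in orders if o[0] == f) for f in FORMS}
print(len(set(orders)), first_page_counts)
```

With 48 subjects and 24 orderings, each ordering is used twice under this rule, so every form appears on the first page equally often overall. Balance within subgroups such as sex is not guaranteed by this scheme.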
The questionnaires were given to 48 patients treated at a general hospital. Their diagnoses were distributed across 13 diagnostic classifications and included chronic renal failure, liver diseases, and lumbar hernia. The patients were asked to give responses that represented their status during the previous few days. All forms were to be completed in the order of the sheets.
On an extra page, we asked the respondents to note the color of the form they found easiest to fill in.
Second Survey
As the respondents found Forms B and C the easiest to complete, we then conducted an additional survey of 153 patients in another general hospital, focusing on the response scales of these forms (i.e. the 5 and 7-point scales). The respondents were patients treated in the departments of internal medicine (26.8%), surgery (12.2%), orthopedics (18.9%), ophthalmology (37.8%), and otorhinolaryngology (4.3%).
The two forms of the questionnaire contained the same items, and the instructions for rating the items were the same as in the first survey. We also asked the respondents to rate how easy each form was to complete as a score from 0 to 100. The order of the forms was systematically varied across the permutations of the two forms.

Data Analyses
To decide which response scale is the most feasible, we considered two factors: the ease of completing the form and the number of questions left unanswered. We compared the frequencies of these two variables for the four different forms. The effect of respondents' age and sex on their choice of response scale was also examined using chi-squared tests or Fisher's exact tests. The respondents were divided into two age groups: 55 years of age and under, and 56 years of age and over. The association between the choice of form and the order of its presentation was also tested. As the distribution of the form presented as the first page differed between men and women in the first survey, we standardized the proportions choosing each form in the entire group to a population in which the numbers of men and women are equal and each type of form is presented as the first page with the same probability (25%) in each sex. Similarly, in the second survey the proportions for the entire group were standardized for sex and age.
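The standardization described above amounts to a weighted mean of stratum-specific proportions, with each sex-by-first-page stratum given the weight 0.5 × 0.25 = 0.125. The sketch below assumes that reading; the function name and the counts are hypothetical and are not the study's data.

```python
SEXES = ["men", "women"]
FORMS = ["A", "B", "C", "D"]


def standardized_proportion(strata, weights):
    """Direct standardization: weighted mean of stratum-specific proportions.

    strata  maps a stratum key to (number choosing the form, stratum size).
    weights maps the same keys to standard-population weights summing to 1.
    """
    total = 0.0
    for key, weight in weights.items():
        chosen, size = strata[key]
        if size == 0:
            raise ValueError(f"empty stratum: {key}")
        total += weight * chosen / size
    return total


# Standard population: equal numbers of men and women, and each form
# presented as the first page with probability 25% within each sex.
weights = {(s, f): 0.5 * 0.25 for s in SEXES for f in FORMS}

# Hypothetical counts (NOT from the paper): key is (sex, form on first page),
# value is (respondents choosing Form B as easiest, respondents in stratum).
example_counts = {
    ("men", "A"): (2, 6), ("men", "B"): (4, 8),
    ("men", "C"): (1, 4), ("men", "D"): (2, 6),
    ("women", "A"): (3, 6), ("women", "B"): (4, 4),
    ("women", "C"): (4, 8), ("women", "D"): (4, 6),
}

standardized = standardized_proportion(example_counts, weights)
crude = (sum(c for c, _ in example_counts.values())
         / sum(n for _, n in example_counts.values()))
print(f"crude={crude:.3f} standardized={standardized:.3f}")
```

The crude and standardized proportions differ whenever the strata are unequal in size and the stratum-specific proportions vary, which is the situation the adjustment is meant to address.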
Cronbach's alpha, which is one of the indices of reliability,6) was calculated for each form to examine whether it varied according to the type of response scale. The codes on the scales for ten items were simply added up for scoring16). The scales were reversed on certain items so that a high score consistently represented a high health status. Spearman rank correlation coefficients among the scores of the four forms were calculated. All statistical analyses of the data were carried out using SAS programs17).
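For readers who want to reproduce this kind of scoring, the following is a minimal sketch of reverse-scoring and Cronbach's alpha using the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total score). It assumes the numerical codes run from 1 to the number of points on the scale, which the paper does not state; the data are hypothetical.

```python
from statistics import variance


def reverse_item(score, n_points):
    """Reverse a 1..n_points code so that a high value means better health."""
    return n_points + 1 - score


def cronbach_alpha(rows):
    """rows: one list of item codes per respondent, with no missing values."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])  # variance of summed score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point responses (NOT study data); the last item is assumed
# to be worded in the opposite direction and is reversed before scoring.
raw = [[4, 5, 4, 2], [2, 2, 3, 4], [5, 4, 5, 1], [3, 3, 3, 3], [1, 2, 2, 5]]
scored = [row[:3] + [reverse_item(row[3], 5)] for row in raw]
alpha = cronbach_alpha(scored)
print(round(alpha, 2))
```

Respondents with omitted items would have to be excluded or handled separately before calling `cronbach_alpha`; the sketch does not address missing data.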

First Survey
Overall, 20 out of 38 respondents answered that Form B was the easiest to complete. The choice differed between men and women (p<0.05) (Table 2): one-third of men considered that Form C was the easiest to complete, but two-thirds of women opted for Form B. The distribution of the form presented as the first page was not even in each sex (Table 3). Respondents were more likely to choose the form presented as the first page. However, Form C was less frequently presented as the first page than Form B for men, and vice versa for women. Nine men and four women missed at least one of the 40 questions. Of these 13 respondents, seven missed items on two or more forms. The cumulative number of item omissions was greater for Form B than for Form C, but the number of respondents who missed items did not vary substantially according to the type of form (Table 4). Although the sample size was too small to clarify the characteristics of the respondents who missed questions, they were more likely than those who answered all questions to be men in the older age group (56 years of age and over) and to select Form C as the easiest to complete. In total, 54 items were omitted across all the forms, and 43 of these were attributable to seven men in the older age group. Spearman rank correlation coefficients among the scores of the four forms ranged from 0.78 to 0.97 (0.78 between Forms A and D, and 0.97 between Forms B and C). Similar reliability, quantified by Cronbach's alpha, was achieved with each of the four forms (0.77 for Form D and 0.80 for Forms A, B and C).
a Adjusted for sex and the order of form presentation.
Table 3. Selection of the easiest response scale according to the form presented as the first page and sex.

Second Survey
The proportion of the respondents who gave a higher score was larger with Form B than with Form C after adjusting for age and sex (p<0.05) (Table 5). In women, a higher score was more frequently obtained with Form B than with Form C (p<0.05). The finding in the first survey that men found Form C easier to complete than women did was confirmed in this second survey (p<0.05). Age had little effect on choice, but younger men preferred Form C. Respondents were more likely to give higher scores to the form presented as the first page (p<0.05). However, the order of form presentation did not vary by sex and age group.
The means (SD) of the scores for ease of completing the forms were 79.8 (21.0) for Form B and 81.6 (16.5) for Form C for men, and 80.9 (16.5) for Form B and 70.2 (17.0) for Form C for women.
Item omission was noted for six men and eight women. The total number of items omitted was 24 for Form C and 22 for Form B. The number of respondents who did not answer at least one item was eight for Form B and ten for Form C. The age and sex distributions of the respondents who missed questions were similar to those who did not miss questions. Four men and 16 women did not say which form was the easiest to complete and their distributions of age and sex were similar to those of the rest of the group. However, the proportion of item omission was significantly higher for the respondents who did not say which form was the easiest to complete.
The Spearman rank correlation coefficient of the scores based on ten items was 0.96 between Forms B and C. The calculated Cronbach's alpha was 0.75 for Form B and 0.76 for Form C.
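A Spearman rank correlation of summed scores can be computed as the Pearson correlation of the ranks, with tied scores given the mean of their ranks. The sketch below shows this in plain Python on hypothetical totals; the study itself used SAS, and nothing here reproduces its output.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical summed scores for the same respondents on two forms.
form_b_totals = [31, 28, 40, 35, 28, 44]
form_c_totals = [45, 39, 58, 47, 41, 60]
rho = spearman(form_b_totals, form_c_totals)
```

Because the two forms have different score ranges (ten items on a 5-point versus a 7-point scale), a rank-based coefficient is a natural choice for comparing them.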

DISCUSSION
This study showed that the 5-point scale was the easiest of the four types of response scales (4, 5, and 7-point scales and a VAS) to complete when applied to a questionnaire for measuring health status. In addition, item omission was least frequent with this form. The reliability in terms of Cronbach's alpha was similar for the four types. These findings suggest that a 5-point scale should be suitable for measuring health status.
We used two factors to decide on the appropriate number of points on the scale: the respondents' choice of the easiest scale to complete and the frequency of item omission. Yamaoka et al.18) have noted the distribution of response categories obtained in a national survey of Japanese beliefs and philosophies. Many respondents selected the neutral middle category even if a 5-point scale was applied, and so they used a 3-point scale for measuring non-disease-specific quality of life. Hayashi19) indicated that this tendency to a neutral response was unique to the Japanese. In the present studies, however, the proportion of the respondents who chose categories that were neither central nor extreme on the 5 and 7-point scales (i.e. category values 2 or 4 on the 5-point scale, and 2, 3, 5 or 6 on the 7-point scale) ranged from 16.3% to 41.2% and from 26.8% to 58.8%, respectively, for all the questions. The average numbers of items for which the respondents selected such intermediate values and central values were 3.0 and 1.9 using the 5-point scale, and 4.1 and 1.1 using the 7-point scale, respectively. The Japanese may be familiar with the 5-point scale, as an evaluation system classifying pupils into five grades has been adopted in elementary and junior high schools.
Table 5. Selection of easier response scale to complete by sex and age; second survey.
a Standardized for age. b Standardized for age and sex. c Scores were equal between Forms B and C. d Scores were not given.
We checked the frequency with which the 5-point scale was selected as the easiest to complete across the strata of different diagnoses and conditions. The choice of response scale was not affected by such variables after controlling for age and sex. Although the most appropriate type of scale might vary according to age or sex of the respondents, our findings suggest that a 5-point scale is useful for measuring health status under a variety of conditions. Therefore, we are planning to use a 5-point scale for a questionnaire measuring the health status of patients with hematopoietic disorders.
There could be a counterargument that the reliability and validity of the scales should have a higher priority than respondents' preference. We wanted to assess the questionnaire with 35 items using four different response scales among patients with hematopoietic disorders. In such an assessment, each respondent must provide 140 (= 35 × 4) separate judgements and answer additional questions to check external validity. The present study was carried out to minimize the response burden placed on those respondents. As it is likely that validity depends more on item selection than on the type of response scale, the validation study was preceded by deciding upon the number of points in the scale. The study subjects were not patients with hematopoietic disorders. Although the choice of the response scale was not affected by the diagnoses of the patients in the present study, it would be premature to conclude that a 5-point scale is the most appropriate for measuring the health of patients with hematopoietic disorders. However, in the next stage, we can evaluate the reliability and validity of the questionnaire with a 5-point response scale among patients with hematopoietic disorders in a pilot study.
Although we used Cronbach's alpha as an index of reliability, the reliability based on the ten items should not be a substitute for that of the final version of the questionnaire. However, we were interested in the effect of the number of points on the scale on this index in the present study, and our findings suggest that the effect is not substantial even if more items are included from the item pool into the questionnaire.
Nishisato et al.20) demonstrated that categorizing two variables with normal distributions reduces the correlation. They set up distributions of two variables and rounded off the numbers to create a scale with a particular number of points. The magnitude of underestimation of the correlation using the rounded numbers increased as the number of categories decreased. The loss of correlation for seven and ten categories was quite small, but the use of five categories reduced the correlation by about 12%. Based on these results, Streiner et al.6) indicated that the correlation using the rounded numbers is equivalent to the reliability of a scale with corresponding points, if everything else is constant, and recommended that the minimum number of categories should be 5-7.
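The attenuation effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the cited method: the true correlation of 0.9, the equal-width categories between -3 and 3, the sample size, and all function names are choices made here for illustration, so the size of the loss will not match the cited figures.

```python
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def categorize(value, n_categories, low=-3.0, high=3.0):
    """Map a value onto equal-width categories 1..n_categories (assumed rule)."""
    clipped = min(max(value, low), high)
    width = (high - low) / n_categories
    return min(int((clipped - low) / width) + 1, n_categories)


def simulate(rho=0.9, n=20000, seed=0):
    """Correlation of two normal variables before and after categorization."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)  # corr(x, y) = rho
    results = {"continuous": pearson(xs, ys)}
    for k in (2, 5, 7, 10):
        cx = [categorize(v, k) for v in xs]
        cy = [categorize(v, k) for v in ys]
        results[k] = pearson(cx, cy)
    return results


results = simulate()
for key, value in results.items():
    print(key, round(value, 3))
```

Under these assumptions the observed correlation falls as the number of categories decreases, which is the qualitative pattern the paragraph describes.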
In the current study, reliability in terms of Cronbach's alpha was similar for each type of scale. We found no reduction of reliability with a decreasing number of points. In the study by Nishisato et al., rounded numbers were created from the variables that were originally set up. However, it is unlikely that respondents always select the categorized values that correspond exactly to these original values whenever the number of categories or type of scale is changed. Ikeda21) stated empirically that the reliability coefficient tends to be reduced by increasing the number of steps, but the effect is not great because the variance of error is nearly proportional to the number of steps.