Reliability and Validity of the International Physical Activity Questionnaire (IPAQ) in Elderly Adults: The Fujiwara-kyo Study

Background The International Physical Activity Questionnaire (IPAQ) is a self-reported questionnaire for assessing physical activity and has been tested in 12 countries among adults aged 18 to 65 years. The present study evaluated the reliability and validity of the IPAQ among adults aged 65 years and older. Methods The study included 164 men and 161 women selected from participants of the Fujiwara-kyo Study, a prospective cohort of elderly Japanese adults. To examine test–retest reliability, the participants were asked to complete the IPAQ twice, 2 weeks apart. The criterion validity of the IPAQ was tested by using an accelerometer. Results Based on intraclass correlation coefficients, the reliability of the total IPAQ was 0.65 and 0.57 for men and women, respectively, aged 65 to 74 years and 0.50 and 0.56 for those aged 75 to 89 years. The Spearman correlation coefficients between total IPAQ score and total physical activity measured by accelerometer (TPA-AC) were 0.42 and 0.49 for men and women, respectively, aged 65 to 74 and 0.53 and 0.49 for those aged 75 to 89. Weighted kappa coefficients between total IPAQ score and TPA-AC were 0.49 and 0.39 for men and women, respectively, aged 65 to 74 and 0.46 and 0.47 for those aged 75 to 89. Conclusions The reliability of the IPAQ was not sufficient, but the validity was adequate. Although there were some limitations with regard to repeatability and agreement in classification, the IPAQ was a useful tool for assessing physical activity among elderly adults.


INTRODUCTION
Regular physical activity (PA) is essential for healthy aging because it can reduce the risk of chronic disease, premature mortality, functional limitations, and disability. 1 Habitual PA is assessed using objective measures based on motion sensors (such as an accelerometer or pedometer) and nonobjective measures based on self-reported questionnaires. The objective measures are quantitative; their validity has been confirmed 2,3 and they have been successfully used for the assessment of elderly subjects. 4,5 However, most of these methods are not applicable for large-scale epidemiologic studies because of cost constraints and the burdens placed on the participants and researchers. 6 In contrast, PA questionnaires have few such problems and are a practical and widely used approach for assessing PA in epidemiologic research. A variety of PA questionnaires have been developed. [7][8][9][10][11] The International Physical Activity Questionnaire (IPAQ) was developed as a self-reported questionnaire for cross-national assessment of PA and has been validated in 12 countries. 11 However, the participants in those studies were adults aged 18 to 65 years. To our knowledge, only 1 study has examined the reliability and validity of the IPAQ in elderly adults, 5 although about half of the participants in that study were under the age of 65. Additional studies among elderly subjects (65 years or older) are therefore needed.
Elderly adults are less physically active than the general population, and this decline in PA might be related to the increased prevalences of chronic diseases and locomotor disability that accompany aging. 1 Therefore, when evaluating PA among elderly adults, separate assessments are needed for the young old (age 65-74) and old old (age 75-89).
The purpose of this study was to examine the reliability and validity of the IPAQ with regard to age and sex among Japanese adults aged 65 years or older, using accelerometermeasured activity as the objective criterion.

METHODS Participants
The potential subjects were 349 persons (176 men, 173 women) sampled from participants in the Fujiwara-kyo study, 12 a prospective cohort study of healthy aging in elderly adults. Subjects of the Fujiwara-kyo study were enrolled in 4 cities of Nara Prefecture, Japan. Eligible subjects were 65 years or older, living at home, and able to walk without the assistance of another person, provide self-reported information, and give written informed consent. A total of 4427 individuals (2174 men, 2253 women) gave their written consent and completed the baseline examination, including the Mini-Mental State Examination 13 (MMSE; score range, 0-30), which is used by clinical psychologists as a screening test for cognitive impairment. Thirteen subjects with an MMSE score less than 24 were excluded from the present study because cognitive problems were likely to affect the reliability and validity of PA questionnaires. 14-16 An additional 11 subjects were excluded due to incomplete IPAQ or accelerometer data. Ultimately, data from 325 participants (164 men, 161 women) were included in the subsequent analyses.
The Fujiwara-kyo study was approved by the Medical Ethics Committee of Nara Medical University.

Data collection
Self-reported PA was obtained through the Japanese version of the IPAQ (the usual 7 days, short, self-administered version). We asked participants to complete the IPAQ twice, 2 weeks apart, and wear an accelerometer (described below) for at least 2 weeks (preferably 4 weeks), starting on the day after completing the first IPAQ. The second IPAQ was scheduled to be completed on the 14th day after starting accelerometer measurement. The IPAQ and the accelerometer, which had recorded all physical activity of the participant, were returned by mail. In the event of incomplete data, a follow-up inquiry was made by telephone by one of the authors (K.T.).

Calculation of self-reported PA
According to the official IPAQ guidelines, 17 data from the IPAQ are summed within each item (ie, vigorous intensity, moderate intensity, and walking) to estimate the total amount of time spent engaged in PA per week. Total weekly PA (MET-min week −1 ) was estimated by adding the products of reported time for each item by a MET value that was specific to each category of PA. We assigned 2 different sets of MET values. The first set was the original values (original IPAQ) based on the official IPAQ guidelines 11 Accelerometer-measured PA We asked the participants to wear an electronic accelerometer (Kenz Lifecorder PLUS, Suzuken Co., Ltd., Nagoya, Japan) attached to the left or right side of a waist belt for the whole day, except when sleeping, showering/bathing, or swimming. This device has a storage capacity of 60 days and is designed to detect accelerations due to body movements. It cannot adequately sense movements during cycling or upper-body exercise. Thus, participants were asked to record the amount of time spent on cycling, upper-body exercise, and swimming during the study period. The participants were also instructed to conduct their lives as normally as possible while wearing the accelerometer. The recorded data were uploaded to a personal computer for analysis using dedicated software. The parameters calculated were daily total energy expenditure (TEE: kcal·day −1 ), daily step count (steps·day −1 ), and daily duration of PA at an intensity higher than 3.0 METs (min·day −1 ). Total PA by accelerometer per week (TPA-AC: MET·min·week −1 ) was estimated using the following equation: TEEi ðkcal ¢ day À1 Þ Â 7 ðdaysÞ where n is the total number of days analxyzed.

Statistical analysis
Test-retest reliability of the IPAQ was assessed by intraclass correlation coefficient (ICC, 1-way random-effects model) with a 95% confidence interval (CI). 19 Criterion validity of continuous data was tested by using Spearman correlation coefficients to measure the association of self-reported PA with accelerometer-based measures. Participants were classified as having high, moderate, or low daily PA, using tertiles based on the original IPAQ, modified IPAQ, and TPA-AC, after which weighted kappa coefficients (weighted κ) 20 were calculated. We excluded from the validity analysis 19 participants who reported swimming more than 2 hours per week or performing cycling/upperbody exercise for more than 5 hours per week during accelerometer measurement.
The medians of the 2 groups were compared by using the Mann-Whitney test, and Fisher's exact test was used to test the difference in the distributions of the 2 groups. SPSS 17.0 J software for Windows was used to perform the statistical analyses, and the null hypothesis was rejected when the probability value was less than 0.05. Table 1 shows the demographic characteristics, self-reported PA (using the IPAQ), and accelerometer-measured PA of the participants. Among those aged 65 to 74 years (the young old), there were no sex differences in demographic characteristics. Men were significantly more active in vigorous activity, moderate activity, original and modified IPAQ, TPA-AC, and daily step count than were women. Among participants aged 75 to 89 years (the old old), men were significantly older than women and were significantly more active in walking, original and modified IPAQ, TPA-AC, daily step count, and METs 3.0+ activities than were women.

RESULTS
The test-retest reliabilities of the IPAQ are shown in Table 2. Among the young old, the ICC ranged from 0.52 to 0.82 in men and from 0.47 to 0.70 in women. The ICC was highest for sitting and lowest for moderate activity, in both sexes. The ICC was greater than 0.60 for walking, original IPAQ, and modified IPAQ in men only and for sitting in both sexes. Among the old old, the ICC ranged from 0.39 to 0.66 in men and from 0.30 to 0.67 in women. The ICC was highest for sitting and lowest for vigorous activity in both sexes. The ICC was higher than 0.60 for moderate activity and walking in men only and for sitting in both sexes. Among all subgroups, the ICCs for the modified IPAQ were not different from those of the original IPAQ. Table 3 shows the results of criterion validity assessment. Because the second IPAQ was administered during accelerometer measurement, we compared those IPAQ data with the accelerometer data. Among the young old, the Spearman correlation coefficients between the original IPAQ and TPA-AC were 0.42 for men and 0.49 for women. TPA-AC data were significantly positively correlated with vigorous and moderate activity in men only, and with walking, original   IPAQ, and modified IPAQ in both sexes. Among the old old, the Spearman correlation coefficients between the original IPAQ and TPA-AC were 0.53 for men and 0.49 for women. TPA-AC was significantly positively correlated with walking, original IPAQ, and modified IPAQ in both sexes. In all subgroups, the Spearman correlation coefficients for both daily step count and METs 3.0+ showed a pattern of associations similar to that for TPA-AC. The Spearman correlation coefficients of the modified IPAQ did not differ from those of the original IPAQ. The

DISCUSSION
This is the first study to examine the reliability and validity of the IPAQ in adults aged 65 years or older. Some previous IPAQ studies reported reliability coefficients greater than 0.70, with ICCs of 0.87 in Japanese (mean age 33.8 years), 22 0.84 in Chinese (mean age 65.2 years), 5 0.87 in Greeks (age 19-29), 23 and a Spearman correlation coefficient of 0.76 in 12 countries (age 18-65). 11 Low reliability coefficients were also reported, namely ICCs of 0.30 to 0.62 in Norwegians (mean age 32.4 years), 24 Spearman correlation coefficients of 0.14 to 0.58 in 8 European countries (age 18-65), 25 and 0.54 in Swiss (mean age 46.5 years). 26 Nunnally 27 recommends that a coefficient of at least 0.70 is required to ensure sufficient reliability, and that 0.80 or higher is preferred. Therefore, the test-retest reliability in the present study is not sufficient. The length of the interval between the first test and the retest influences the correlation  coefficient of test-retest reliability. 27 Previous studies showed that reliability after a shorter interval (≤8 days) was high, 5,11,22 while that after a longer interval (2-3 weeks) was low. 25,26 Because our study had a longer interval, it is likely to decrease recall of the first test and lower the correlation coefficient of test-retest reliability. Another issue that can influence the reliability is question-and-answer format. 16 Because memory difficulties and cognitive problems are more prevalent in elderly adults, questions that require the use of recognition memory are preferred over those that require recall memory. 16,18 Additionally, an open-ended response format can be difficult for elderly adults to complete accurately. 28 PA questionnaires for elderly adults 4,14,15,29 have used these strategies (eg, providing lists of specified activities and a set of prespecified time ranges from which to select), and some 4,14,29 have reported high reliability after a long interval. Because the IPAQ, short version requires recall memory and uses an open-ended response format, its question-and-answer forms might have hindered the ability of elderly subjects to answer correctly and thus lowered reliability in our study. The criterion validity of the IPAQ, as indicated by the Spearman correlation coefficient, was 0.30 in 12 countries, 11 0.38 in Japanese, 22 0.33 in Chinese, 5 and 0.29 in Norwegians. 24 In criterion validity studies of PA questionnaires for elderly adults, the Spearman correlation coefficients were 0.41 in Japanese 4 and 0.37 in Americans. 15 In our study, the Spearman correlation coefficients between the original IPAQ and TPA-AC ranged from 0.42 to 0.53, which is comparable with previously published data on IPAQ and PA questionnaires for elderly persons, indicating that the IPAQ has adequate validity for elderly adults. PA questionnaires for elderly adults include questions on activities of lower intensity, which are widely practiced by elderly adults, such as household activities and gardening. 4,14,15,18,28 The IPAQ, short version has focused on vigorous and moderate PAs and has not assessed lowerintensity activities. However, this had little impact on the criterion validity in our study, probably because simple questions result in better validity than do detailed question formats 30 and because the validity of lower-intensity activities is low. 13 The use of the IPAQ for elderly adults will enable PAs to be compared across generations and across countries. The advantages of the IPAQ are thus great, and its use for elderly adults is recommended.
There are several objective methods for measuring PA, such as doubly-labeled water, 31 respiratory chambers, 32 and heart rate monitoring. 33 The doubly-labeled water method is the current gold standard. However, these methods are costly and are limited with regard to the types of PAs they can measure. Therefore, it is impractical to use such methods for measuring the habitual PAs of a large number of people. The authors believe that the use of an accelerometer as the criterion for PA was an optimal method for verifying the validity of a PA questionnaire in more than 300 community-dwelling elderly subjects, because the device is relatively inexpensive, imposes little burden on participants, even when worn for 2 to 4 weeks, does not restrict daily living activities, and has been verified to be valid by indirect calorimetry. 2 Previous IPAQ studies have noted problems with overestimation and underestimation of PA based on the IPAQ. 25,34,35 Overestimation is reported to be due to participants reporting the duration and frequency of PAs on the day they most often practiced those activities and to the tendency of people to overreport behaviors that are deemed socially desirable. 35 In contrast, the main reason for underestimation is that the questionnaire asks respondents to report only activities that were practiced for 10 minutes or longer. Therefore, activities performed for less than 10 minutes are not reflected. 25 In addition to the causes mentioned in previous studies, 25,35 overestimation could have occurred in this study because the intensities of activities, which were originally set for adults, might have seemed more intense to elderly subjects. Therefore, the intensities of activities were modified (modified IPAQ) by referring to previous studies of the intensities of activities performed by elderly adults. 4,18 However, among all subgroups, both the Spearman correlation coefficients of the modified IPAQ and the weighted κs of the modified IPAQ were not different from those of the original IPAQ. In other words, modification of the intensities of activities is unlikely to address the problem of overestimation on the IPAQ among elderly adults. Furthermore, weighted κs indicated fair to moderate agreement, which suggests the possibility of misclassification in assessing PA with the IPAQ for elderly adults.
In conclusion, the Japanese version of the short-form IPAQ was used to measure the PAs of elderly (≥65 years) adults. The test-retest reliability (n = 325) was not sufficient, but the criterion validity (n = 306) was adequate. Although there are concerns about repeatability and agreement for classification, the IPAQ was found to be a useful tool for assessing PA in elderly adults.