Quantitative Assessment of the Contribution of Meteorological Variables to the Prediction of the Number of Heat Stroke Patients for Tokyo

This study identifies the best combination of meteorological variables for predicting the number of emergency transports due to heat stroke among people over 64 years old in Tokyo metropolis, based on a generalized linear model using 2008−2016 data. Temperature, relative humidity, wind speed, and solar radiation were used as candidate explanatory variables. Variable selection with Akaike’s information criterion (AIC) showed that all four meteorological elements were selected for the prediction model. Additional analysis showed that the combination of daily mean temperature, maximum relative humidity, maximum wind speed, and total solar radiation as explanatory variables gives the best prediction, with approximately 19% less error than the conventional single-variable model, which uses only the daily mean temperature. Finally, we quantitatively estimated the relative contribution of each variable to the prediction of the daily number of heat stroke patients using standardized partial regression coefficients. The result reveals that temperature is the largest contributor. Solar radiation is second, with approximately 20% of the temperature effect. Relative humidity and wind speed make relatively small contributions, approximately 10% and 9% of that of temperature, respectively. This result provides helpful information for proposing more sophisticated thermal indices to predict heat stroke risk. (Citation: Sato, T., H. Kusaka, and H. Hino, 2020: Quantitative assessment of the contribution of meteorological variables to the prediction of the number of heat stroke patients for Tokyo. SOLA, 16, 104−108, doi:10.2151/sola.2020-018.)


Introduction
In recent years, the demand for ambulance transport due to heat stroke has been increasing in Japan. This has resulted in social problems such as restrictions on outdoor activities. Therefore, predicting the number of heat stroke patients is useful not only for human health risk assessment, but also for ambulance management and operation planning. Studies have sought relationships between human comfort or the number of heat stroke patients and climatological conditions (e.g., Ishigaki et al. 2001; Honda et al. 2007). Piver et al. (1999) showed that heat stroke emergency transport demand could be estimated using the daily maximum temperature and NO2 concentrations. Miyatake et al. (2012) also reported that ambulance transport due to heat stroke was positively correlated with air temperature. Ono (2013) showed a strong correlation between the daily number of ambulance calls for heat stroke patients and the daily maximum temperature. Furthermore, Fujibe et al. (2018b) showed that higher humidity leads to higher risk at the same daily maximum temperature. However, the relative contribution of each meteorological variable to the prediction of heat stroke occurrence is still unclear.
Focusing on the prediction of heat stroke occurrence, Fuse et al. (2014) developed a prediction model for the number of ambulance transports due to heat stroke, based on a single-variable least-squares method with the daily mean temperature as the explanatory variable. This model is useful for risk assessment, but it has two problems. First, Fuse et al. (2014) used temperature as the only explanatory variable. It is well known that four meteorological elements (i.e., temperature, humidity, wind speed, and solar radiation) are important in determining the thermal condition. Indeed, researchers have reported a significant relationship between heat stroke occurrence or thermal stress and various thermal indices that combine these meteorological elements, such as the wet bulb globe temperature (WBGT) and the universal thermal climate index (UTCI) (e.g., Ohashi et al. 2014; Suzuki-Parker and Kusaka 2016; Akatsuka et al. 2016; Di Napoli et al. 2018). The second problem is that Fuse et al. (2014) used only correlation coefficients to select the predictive meteorological element. It is better to judge the appropriateness of elements by considering not only their correlation with the objective variable but also an information criterion score. An information criterion can evaluate the appropriateness of variables for prediction because it is calculated from the likelihood and the number of variables. The goal of this study is to propose a multi-variable prediction model with the best variable combination for predicting heat stroke patients. The unique aspect of this model is that both Akaike's information criterion (AIC) and the root mean squared error (RMSE) are used to judge the best prediction model. The other objective is to investigate the contribution of each meteorological element to the prediction independently.

Experiment overview
The present study develops several statistical models to predict the daily number of ambulance transports of heat stroke patients over 64 years old (hereafter referred to as the number of heat stroke patients), using meteorological variables. We focus on patients over 64 years old because heat stroke occurrence in elderly people has become a social problem in Japan, and their susceptibility to thermal stress is greater than that of other age groups (Semenza et al. 1996; Nakai et al. 1999). The models are based on the generalized linear model (GLM, Dobson and Barnett 2008), shown in equation (1).
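Equation (1) is not reproduced in this text. As a sketch, assuming the canonical log link of a Poisson GLM (consistent with the log-linear model referred to in the contribution analysis), with \beta_0 as an assumed intercept term and the left-hand side read as the expected number of patients, the model would take the form:

```latex
\log\bigl(\mathrm{E}[\mathit{pat}]\bigr) = \beta_0 + \sum_{i=1}^{N} \beta_i x_i
\qquad (1)
```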
Here, pat, x_i, β_i, and N denote the number of heat stroke patients, the explanatory variables (meteorological variables), the regression coefficients, and the number of meteorological variables, respectively. The maximum likelihood method is used for parameter estimation. The GLM can assume not only the normal distribution but also other probability distributions for the objective variable. In this study, the Poisson distribution is assumed for the objective variable (pat), and the parameters that maximize the log likelihood are estimated. In addition, the zero-inflated Poisson distribution (ZIP, Lambert 1992) was also used for fitting because the count of heat stroke

between the temperature and the number of heat stroke patients (correlation coefficient, R = 0.61). A similar result was obtained for the total solar radiation (R = 0.48). On the other hand, there is no clear correlation between relative humidity and the number of heat stroke patients (R = −0.19). Wind speed also has only a weak correlation with the number of heat stroke patients (R = 0.10). No collinearity was found among the explanatory variables (scatter diagrams are shown in Supplement 1). Judged by correlation with the objective variable alone, relative humidity and wind speed would not be needed as explanatory variables. However, for better prediction, the appropriate variable combination should be confirmed using AIC. Therefore, variable selection was carried out using the daily mean meteorological elements (temperature, relative humidity, wind speed) and the total solar radiation.

Variable selection using daily mean values and AIC
The AIC results indicate that all four meteorological elements contribute to the prediction of heat stroke patients. Thus, all four elements should be used as explanatory variables. The model with the best combination is given as equation (2).
The same experiment (searching for the best combination) was also carried out with a mixed model that assumes the ZIP distribution. The RMSE and MAE of this model were 12.8 and 7.0, respectively. These values are almost the same as for the GLM-based model; we therefore use the GLM-based model in the following analysis because of its simplicity and ease of understanding.
patients is zero on many days. The ZIP model is considered useful for data that appear to contain excess zeros, like the data in this study. To clearly distinguish the contributions of the four meteorological elements to the prediction, the prediction models are developed in two steps as described below.
We first used AIC to determine the best combination of explanatory variables for the statistical model from the three daily mean values (temperature, relative humidity, wind speed) and the total solar radiation. Importantly, daily maximum and minimum values are not used in this step, because the relative contribution cannot be discussed if the same element is selected more than once. The combination with the smallest AIC is called the best combination of meteorological elements in this study. Note that, to focus on the individual effect of each explanatory variable on the prediction, we do not consider cross terms.
The next step is to search for the best prediction model by replacing the daily mean values with daily maximum and minimum values. At the end of this step, we obtain the best prediction model of this study (the one with the smallest RMSE). Here, RMSE and MAE are calculated from the observed and estimated numbers of heat stroke patients. Through this two-step model development, we obtain an accurate model in which the relative contribution of each meteorological element is easy to clarify.
The procedure described above is summarized as follows.
(i) Investigation of the best combination of meteorological elements using AIC and the daily means of the meteorological elements.
(ii) Investigation of the best prediction model using the same combination as in (i), but replacing the daily mean values with the daily maximum and minimum values.
After these two steps, the relative contribution of each meteorological element is investigated by calculating the standardized partial regression coefficients of the best prediction model.
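Step (i) of the procedure above can be sketched as an exhaustive search over variable subsets, fitting a Poisson GLM to each and keeping the subset with the smallest AIC. The sketch below is only an illustration under stated assumptions: the data are synthetic (random numbers, not the Tokyo records), the variable names and "true" coefficients are made up, a log link is assumed, and the small Newton solver stands in for whatever statistical package was actually used.

```python
import itertools
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_poisson(rows, y, iters=50, tol=1e-10):
    """Fit log E[y] = b0 + sum_i b_i x_i by Newton's method (assumed log link).
    Returns (coefficients, maximized log-likelihood)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, x))) for x in X]
        grad = [sum((yi - mi) * x[j] for x, yi, mi in zip(X, y, mu)) for j in range(p)]
        hess = [[sum(mi * x[j] * x[k] for x, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = [math.exp(sum(b * v for b, v in zip(beta, x))) for x in X]
    loglik = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))
    return beta, loglik

def aic(loglik, n_params):
    """Akaike's information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik

def poisson_draw(rng, lam):
    """Draw one Poisson variate (Knuth's method; adequate for small means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Synthetic, already-standardized predictors; coefficients are arbitrary.
rng = random.Random(0)
names = ["temp", "rh", "wind", "solar"]
true_beta = {"temp": 0.6, "rh": 0.3, "wind": 0.3, "solar": 0.4}
data = [{v: rng.gauss(0.0, 1.0) for v in names} for _ in range(400)]
counts = [poisson_draw(rng, math.exp(0.5 + sum(true_beta[v] * d[v] for v in names)))
          for d in data]

# Fit every non-empty subset of the four candidates and record its AIC.
results = {}
for size in range(1, len(names) + 1):
    for subset in itertools.combinations(names, size):
        rows = [[d[v] for v in subset] for d in data]
        beta, loglik = fit_poisson(rows, counts)
        results[subset] = aic(loglik, len(beta))

best_subset = min(results, key=results.get)
print(best_subset, round(results[best_subset], 1))
```

With only four candidates there are 15 subsets, so an exhaustive search is cheap; a stepwise search would only become attractive with many more candidates.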

Data
As the objective variable, we used the daily number of heat stroke patients over 64 years old in Tokyo metropolis. The dataset is provided on the Fire and Disaster Management Agency web page (https://www.fdma.go.jp/disaster/heatstroke/post3.html). We predict the number of heat stroke patients because it is useful not only for risk assessment, but also for operation planning in ambulance management.
As candidate explanatory variables, hourly meteorological elements were used, which were obtained from the automated meteorological data acquisition system (AMeDAS) of the Japan Meteorological Agency (JMA). Note that temperature and wind speed were spatially averaged across all eight observation points in Tokyo metropolis (shown in Fig. 1) without any weighting factors. On the other hand, relative humidity and solar radiation were not spatially averaged; the values obtained at the Tokyo regional headquarters of the JMA were used instead. The daily maximum, mean, and minimum values were calculated from these hourly data from 1:00 to 24:00 JST.
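The 1:00 to 24:00 JST convention means that the observation stamped at midnight belongs to the day that is ending, not the one that is starting. A minimal sketch of this daily aggregation follows; the function name, the input layout, and the sample values are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

def daily_stats(hourly):
    """Aggregate (timestamp, value) pairs into daily max, mean, and min.

    Each timestamp is taken to mark the end of its hour, so shifting it back
    by one hour assigns the 00:00 reading (i.e. 24:00) to the previous day.
    """
    groups = defaultdict(list)
    for stamp, value in hourly:
        day = (stamp - timedelta(hours=1)).date()
        groups[day].append(value)
    return {
        day: {"max": max(vals), "mean": sum(vals) / len(vals), "min": min(vals)}
        for day, vals in sorted(groups.items())
    }

# Made-up hourly values for one day: 1:00 on 1 August through 24:00 (0:00 on 2 August).
start = datetime(2013, 8, 1)
hourly = [(start + timedelta(hours=h), 20.0 + h) for h in range(1, 25)]
stats = daily_stats(hourly)
print(stats)
```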
The training period was set to be July-September in the years 2008 to 2013 to estimate the model parameters. This choice entails the assumption that there were no yearly trends in the examined parameters and the model would not change from year to year. The experiments for model validation were conducted using July-September for the period 2014 to 2016.
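The split described above can be written down in a few lines. This is a sketch only: the record layout (date, count) and the sample values are hypothetical, and the year and month ranges are taken from the text.

```python
from datetime import date

TRAIN_YEARS = range(2008, 2014)       # 2008 to 2013
VALIDATION_YEARS = range(2014, 2017)  # 2014 to 2016
SUMMER_MONTHS = (7, 8, 9)             # July to September

def split_records(records):
    """Split (date, value) pairs into training and validation lists."""
    train, validation = [], []
    for day, value in records:
        if day.month not in SUMMER_MONTHS:
            continue  # outside the July-September window
        if day.year in TRAIN_YEARS:
            train.append((day, value))
        elif day.year in VALIDATION_YEARS:
            validation.append((day, value))
    return train, validation

# Made-up records, used only to exercise the function.
sample = [
    (date(2008, 7, 1), 3),
    (date(2013, 9, 30), 0),
    (date(2014, 8, 15), 12),
    (date(2016, 6, 30), 1),   # June: dropped
    (date(2017, 8, 1), 5),    # outside both periods: dropped
]
train, validation = split_records(sample)
print(len(train), len(validation))
```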

Preliminary analysis: correlation analysis
As a preliminary analysis, we investigated the relationships between the number of heat stroke patients and daily mean meteorological variables: temperature, relative humidity, wind speed, and total solar radiation. Figure 2 shows the scatter diagram used to understand the relationship between daily mean meteorological elements and the number of heat stroke patients. A positive correlation exists

The model with the smallest error
In Section 3.2, the model with four meteorological elements was selected. In this section, we seek the best prediction model by replacing the daily mean values with daily maximum and minimum values. For the solar radiation S_0, the daily total value and the daily maximum value are considered. Here, the best prediction model means the model with the smallest RMSE and MAE over the three target years. The errors of all models considered in this research are summarized in Fig. 3.
The GLM with daily mean temperature, daily maximum relative humidity, daily maximum wind speed, and total solar radiation produces the smallest error. We therefore call equation (3) the best prediction model.
The RMSE and MAE of this model are 11.8 and 6.7, respectively.
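For reference, the two error measures and the relative error reduction between two models can be computed as in the following sketch. The function names and the sample numbers are illustrative assumptions, not the study's data.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted counts."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    """Mean absolute error between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is smaller than baseline_error."""
    return (baseline_error - new_error) / baseline_error

# Made-up daily counts, for illustration only.
observed = [0, 3, 10, 25]
predicted = [1, 2, 12, 20]
print(rmse(observed, predicted), mae(observed, predicted))
```

RMSE penalizes large misses, such as an underestimated spike day, more heavily than MAE does, which is one reason to report both.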
Hereafter, the results of the best prediction model and the conventional single-variable model with daily mean temperature are compared. Figure 4 compares the time series of predicted heat stroke patients using the two models. The best prediction model in this study reproduces spike-shaped fluctuations more accurately than the single-variable model, in addition to reducing the total error (RMSE and MAE). However, both models tend to underestimate the number of patients, especially in July and early August. This is probably because the two models do not include the acclimatization effect. Here, acclimatization refers to the fact that people are less adapted to heat in early summer (rapid temperature rise) than in the middle or at the end of summer (mentioned by Nairn and Fawcett 2015; Fujibe et al. 2018a). The prediction error also varies from year to year. Possible reasons are differences in the character of each summer (i.e., hot or cool summers) and/or psychological effects.

Contribution of meteorological variables to the prediction
After finding the best prediction model, standardized partial regression coefficients were calculated to investigate the relative contribution of each explanatory variable to the prediction. Because we used a log-linear model, the coefficients were calculated by normalizing the explanatory variables. Here, normalization means that the mean is adjusted to 0 and the standard deviation to 1. The coefficients from the best prediction model are shown in Table 1. In Tokyo, the contribution of the daily mean temperature was the largest, followed by that of the total solar radiation (20% of that of temperature). The contributions of the daily maximum relative humidity and daily maximum wind speed were approximately 10% and 9% of that of the daily mean temperature, respectively.
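The normalization and the ratio-based comparison described above can be sketched as follows. Several points are assumptions: the population standard deviation is used (the text does not say which form was applied), contributions are compared by absolute value, and the coefficient values are hypothetical placeholders chosen only so that the ratios mirror the percentages quoted in the text. They are not the Table 1 values.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD assumed)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def relative_contribution(coefs, reference):
    """Express each coefficient's magnitude as a fraction of the reference one."""
    ref = abs(coefs[reference])
    return {name: abs(c) / ref for name, c in coefs.items()}

# Made-up daily mean temperatures (degrees C), standardized before fitting.
z = standardize([24.0, 26.5, 28.0, 30.5, 31.0])

# Hypothetical standardized coefficients, NOT taken from Table 1.
example_coefs = {"temp_mean": 0.50, "solar_total": 0.10, "rh_max": 0.05, "wind_max": 0.045}
ratios = relative_contribution(example_coefs, "temp_mean")
print(ratios)
```

Because every explanatory variable is on the same unit-variance scale after standardization, the fitted coefficients can be compared directly, which is what makes the ratio to the temperature coefficient meaningful.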

Comparison with another large city in Japan
As pointed out by previous research, good predictors or good models for heat stroke occurrence depend on the region (e.g., Bobb et al. 2011; Zhang et al. 2014). Therefore, the relative contribution of each meteorological element can also be expected to vary by region. To investigate this, a prediction model is developed for Osaka prefecture, which contains another large city in Japan, using the same method as for Tokyo metropolis.
The AIC results indicated that all four meteorological elements should be used as explanatory variables in Osaka. However, the contributions of the explanatory variables are different from those in Tokyo. The standardized partial regression coefficients used in the best prediction models for both prefectures are summarized in Table 1.

Fig. 3. Errors of all prediction models in this research. "Tave_only" denotes the single-variable model with daily mean temperature, "Four-vars" denotes the model with the best combination (Eq. 2), "Best" denotes the best prediction model (Eq. 3).
The result for Osaka prefecture shows different features from that for Tokyo metropolis, except that temperature is still the largest contributor. Relative humidity is the second contributor, at about 21% of that of temperature. The third contributor is the total solar radiation, at about 15% of that of temperature. The last contributor is wind speed, with a contribution of only 1%. A possible reason for this difference is the larger fluctuation of relative humidity in Osaka. The standard deviation of temperature is almost the same in both areas, but that of relative humidity is 1.7 times larger in Osaka than in Tokyo.
The selection among maximum, mean, and minimum values is hard to explain. Thus, further analysis is needed to reveal the reason for this heterogeneity.
Considering the relative contributions of the four meteorological elements in Tokyo and Osaka, this study quantitatively supports the following two findings. The first is that, of the four variables examined, temperature contributes the most to the prediction of the number of heat stroke patients. The second is that the relative contribution of the meteorological elements to the model differs by region. In particular, the contributions of relative humidity and wind speed differ markedly between the two regions. The relative contribution estimated in this study is particularly useful for determining such regional variations for better prediction. With regard to the second finding, although the contribution of relative humidity varies greatly with location, the study also suggests that it is small compared with that of temperature in both regions. Generally, humidity is considered an important element in determining heat stroke risk. However, this research shows that relative humidity is not a strong determinant in the prediction models.

Conclusions
In this study, we developed new multi-variable statistical models to predict the number of daily heat stroke patients in Tokyo metropolis. A unique aspect is that we used the AIC and not the correlation coefficient, to determine the effective meteorological elements for prediction. The best prediction model developed in this study includes daily mean temperature, daily maximum relative humidity, daily maximum wind speed, and total solar radiation as explanatory variables. This model can reduce the RMSE by 19% compared with the single-variable model.
Using the best prediction model, the contribution of each meteorological element to the prediction was investigated quantitatively for Tokyo. The analysis confirmed that temperature is a key element for predicting heat stroke occurrence. The contribution of the total solar radiation is 20% of that of temperature. On the other hand, the contributions of relative humidity and wind speed are relatively small compared with that of temperature (about 10% and 9% of it, respectively). In particular, it should be noted that relative humidity makes a smaller contribution than previous studies expected.
With the same method, the best prediction model for Osaka was also built to investigate regional variability. A common feature of the two prefectures is that temperature makes the largest contribution to the prediction. On the other hand, the relative contributions of the other variables in Osaka differ from those found in Tokyo. For instance, in Osaka, relative humidity is the second contributor. However, even in Osaka, the contribution of relative humidity is only 21% of that of temperature. Thus, the contribution of relative humidity is smaller than expected from previous studies.

Remarks
This research has some limitations. The effect of acclimatization is not included. This is one possible reason for the underestimation of the number of heat stroke patients in July and early August. Another limitation is that the performance of the prediction model relative to other thermal indices such as WBGT is not discussed. It may be beneficial to consider such thermal indices as possible explanatory variables to develop a more practical and applicable prediction model. This research reveals that relative humidity makes a smaller contribution than the other meteorological elements, which contrasts with the general view. In light of this, new weights for each meteorological variable should be discussed in a future study to derive a more appropriate index for predicting heat stroke risk.

Table 1. The standardized partial regression coefficients of the thermal elements in each prefecture ("mean" daily mean, "max" daily maximum, "min" daily minimum).

Site    Element    Standardized partial regression coefficient