Development of Models for Predicting the Number of Patients with Heatstroke on the Next Day Considering Heat Acclimatization

Takashi IKEDA; Hiroyuki KUSAKA

doi:10.2151/jmsj.2021-067

Abstract

We developed 55 models for predicting the number of ambulance transports due to heatstroke (hereafter referred to as the number of patients with heatstroke) on the next day in Tokyo. The models use different combinations of 11 sets of explanatory variables and five methods (three statistical models and two machine learning methods) and cover ten years (2010–2019). The root mean square error (RMSE) of the predicted number of patients with heatstroke was smallest for the best model, which combined six explanatory variables (temperature, relative humidity, wind speed, solar radiation, number of days since June 1, and the number of patients with heatstroke on the previous day) with the generalized additive model. The best model reduced RMSE by 52.1 % compared to a widely used model, which primarily uses temperature as an explanatory variable and the generalized linear model as the method. Further analysis of the contributions of the explanatory variables and the prediction method showed that RMSE was reduced by 49.7 % by using the above six explanatory variables instead of temperature alone and by 14.6 % by using the generalized additive model instead of the generalized linear model.

1. Introduction

Heatstroke is a severe problem in Japan (Ohashi et al. 2014). In the summer of 2018, record-breaking high temperatures and numerous heatstroke incidents occurred. According to the Fire and Disaster Management Agency of the Ministry of Internal Affairs and Communications, 95,137 individuals were hospitalized, and 160 people died due to heatstroke from May to September 2018. These hospitalizations and deaths were about 1.8 times and 2.2 times higher than those in the previous three years, respectively. Most incidents occurred in July 2018, with 54,220 people hospitalized and 133 deaths due to heatstroke (Fire and Disaster Management Agency of the Ministry of Internal Affairs and Communications 2018). High numbers of patients with heatstroke are a heavy burden on emergency medical services, especially during early summer. Therefore, predicting the number of patients with heatstroke is essential for operating an appropriate emergency medical system.

Previous studies investigated deaths due to heatstroke using meteorological data (Anderson and Bell 2011; Lee et al. 2014; Fujibe et al. 2018a, b; Lee et al. 2018; Xing et al. 2020). Some of these studies showed the effect of heat acclimatization on heatstroke. For example, more deaths due to heatstroke occur during early summer than during midsummer, despite similar temperatures during these periods (Anderson et al. 2011; Lee et al. 2014; Fujibe et al. 2018a; Lee et al. 2018). Likewise, for the same temperature, more deaths due to heatstroke occur in areas with low mean summer temperatures than in areas with high mean summer temperatures (Lee et al. 2014; Fujibe et al. 2018b). In Japan, the number of deaths due to heatstroke in July was 40–50 % higher than that in August under the same mean temperature (Fujibe et al. 2018a).

Studies have suggested that some non-meteorological indicators are also related to heatstroke, for example, nitrogen dioxide (NO₂; Piver et al. 1999), economic indicators such as income and GDP (Kim and Joh 2006), and physiological responses such as body temperature and dehydration (Kodera et al. 2019). Note that Piver et al. (1999) focused on Tokyo in 1980–1995, when NO₂ concentrations were much higher than they are today. However, studies conducted in various countries indicated that the effect of air pollutants on heatstroke was smaller than that of temperature (e.g., Shumway et al. 1988; Smoyer et al. 2000; Rainham and Smoyer-Tomic 2003). The impact of economic indicators on heatstroke was also reportedly smaller than that of temperature (e.g., Wang et al. 2019; Park et al. 2020).

Various explanatory variables have been used for predicting the number of deaths due to heatstroke. Fouillet et al. (2007) used weather data and long-term trends with seasonal deaths as explanatory variables, and Wang et al. (2019) used socioeconomic factors and weather data. Various modeling methods have also been used: Barnett et al. (2010) and Kim et al. (2019) used Poisson regression models, whereas Dessai (2002) used a nonlinear regression model (NLR). Besides statistical models, machine learning has been used for prediction; for instance, Wang et al. (2019) used the random forest (RF) method.

The analyses and predictions above focused on the number of deaths due to heatstroke. However, the number of ambulance transports due to heatstroke (hereafter referred to as the number of patients with heatstroke) is also important for managing health systems.

Various studies have analyzed the relationship between weather data and the number of patients with heatstroke (Piver et al. 1999; Alessandrini et al. 2011; Miyatake et al. 2012; Ng et al. 2014; Akatsuka et al. 2016). Notably, Ng et al. (2014) found that the heatstroke risk in the Tokyo Metropolitan area is the highest during early summer. However, this study did not explicitly predict the number of patients with heatstroke.

Models for predicting the number of patients with heatstroke have also been developed. For example, Sato et al. (2020) developed a statistical prediction model using four thermal indicators (i.e., temperature, relative humidity, wind speed, and solar radiation) as explanatory variables. However, such models failed to predict cases during early summer because they neglected heat acclimatization. Recently, machine learning methods such as RF have been used to predict the number of patients with heatstroke (Park et al. 2020); however, these studies did not consider heat acclimatization either.

Kodera et al. (2019) developed a model for predicting the number of patients with heatstroke that considers the physiological response. This model first calculates human body temperature and dehydration by feeding the next-day temperature and relative humidity into a physiological-response sub-model, and then predicts the number of patients with heatstroke from the body temperature and dehydration. However, this model also underestimated the number of patients with heatstroke in early summer.

In this study, we developed various models for the day-ahead prediction of the number of patients with heatstroke. In addition to the above four thermal indicators, the models use the number of days since June 1 and the number of patients with heatstroke on the previous day as explanatory variables to account for heat acclimatization. The “number of days since June 1” is defined as a variable that equals 1 on June 1 and increases by 1 each day (e.g., 2 on June 2 and 122 on September 30). Furthermore, we aimed to determine the best combination of explanatory variables and the best modeling method for predicting the number of patients with heatstroke. Additionally, we compared the prediction performance of statistical models and machine learning methods.
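As a small illustration, the “number of days since June 1” can be computed directly from the calendar date. The Python sketch below is our own; the function name `days_since_june1` is hypothetical and not taken from the study.

```python
from datetime import date


def days_since_june1(d: date) -> int:
    """'Number of days since June 1': 1 on June 1, increasing by 1 per day."""
    return (d - date(d.year, 6, 1)).days + 1


# The prediction period runs from June 1 (value 1) to September 30 (value 122).
print(days_since_june1(date(2018, 6, 1)))   # 1
print(days_since_june1(date(2018, 6, 2)))   # 2
print(days_since_june1(date(2018, 9, 30)))  # 122
```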

2. Methods

We developed 55 models to predict the number of patients with heatstroke by combining 11 sets of explanatory variables and five modeling methods (Fig. 1). In this study, the number of ambulance transports due to heatstroke is referred to as the number of patients with heatstroke. To determine the best explanatory variables for prediction, we compared the prediction accuracies of models using different explanatory variables. To identify the best modeling method, we also compared the prediction accuracies of models based on different methods.

Fig. 1.

Description of 11 sets of explanatory variables and five modeling methods for predicting the number of patients with heatstroke. The table on the left presents the details of the sets. T_max, daily maximum temperature; WBGT_max, daily maximum wet-bulb globe temperature (WBGT); RH_min, daily minimum relative humidity; Wind_ave, daily average wind speed; SR_sum, daily sum of solar radiation; Day, number of days since June 1; Pat_t, number of patients with heatstroke t days before, per 100,000 people; WBGT₁, daily maximum WBGT on the previous day. The table on the right presents the modeling methods. GLM, generalized linear model; NLR, nonlinear regression model; GAM, generalized additive model; RF, random forest; SVM, support vector machine.

2.1 Data

We predict the daily number of patients with heatstroke on the next day in Tokyo Prefecture, the capital of Japan. The population of Tokyo was 13,515,271 (National Census in 2015), and the number of patients with heatstroke in Tokyo was 7843 in 2018 (Fire and Disaster Management Agency of the Ministry of Internal Affairs and Communications 2018).

The prediction period in this study covers ten years (2010–2019), with 122 days per year from June 1 to September 30 (a total of 1,220 days).

As outcome variables, we used daily heatstroke data provided by the Fire and Disaster Management Agency, Ministry of Internal Affairs and Communications, Japan. In this study, the number of patients with heatstroke was expressed as daily cases per 100,000 people.
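For reference, the conversion to cases per 100,000 people can be sketched as follows. For illustration only, we assume that the 2015 census population quoted above is used as the denominator in every year; the study may have used year-specific populations. The function and constant names are hypothetical.

```python
TOKYO_POPULATION = 13_515_271  # 2015 National Census (Section 2.1); assumed fixed for all years


def per_100k(daily_cases: float, population: int = TOKYO_POPULATION) -> float:
    """Convert a daily count of heatstroke ambulance transports to cases per 100,000 people."""
    return daily_cases / population * 100_000


def to_count(rate_per_100k: float, population: int = TOKYO_POPULATION) -> float:
    """Inverse conversion: cases per 100,000 people back to a daily count."""
    return rate_per_100k * population / 100_000


print(round(per_100k(100), 3))    # 0.74
print(round(to_count(0.112), 1))  # 15.1
```

Under this assumption, an error of 0.112 cases per 100,000 people corresponds to roughly 15 patients per day in Tokyo.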

As explanatory variables, we used meteorological data obtained from the automated meteorological data acquisition system (AMeDAS) of the Japan Meteorological Agency. We used hourly measurements of the Tokyo District Meteorological Observatory (35.69°N, 139.75°E) to calculate the daily values.
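The daily values used as explanatory variables (T_max, RH_min, Wind_ave, and SR_sum in Fig. 1) are simple aggregates of the hourly record. A minimal standard-library sketch follows; the record layout is hypothetical, and hourly timestamps are assumed to be already assigned to the correct local calendar day (JMA's convention of labelling the last hour of a day as 24:00 is not handled here).

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_aggregates(hourly):
    """Aggregate hourly records into the daily variables of Fig. 1.

    `hourly` is an iterable of (timestamp, temperature_C, rh_percent,
    wind_m_s, solar) tuples; this layout is hypothetical.
    """
    by_day = defaultdict(list)
    for ts, temp, rh, wind, solar in hourly:
        by_day[ts.date()].append((temp, rh, wind, solar))

    daily = {}
    for day in sorted(by_day):
        temps, rhs, winds, solars = zip(*by_day[day])
        daily[day] = {
            "T_max": max(temps),      # daily maximum temperature
            "RH_min": min(rhs),       # daily minimum relative humidity
            "Wind_ave": mean(winds),  # daily average wind speed
            "SR_sum": sum(solars),    # daily sum of solar radiation
        }
    return daily


# Synthetic example: 24 hourly records for one day.
hours = [(datetime(2018, 7, 23, h), 25 + 0.5 * h, 80 - h, 2.0, 0.1) for h in range(24)]
d = daily_aggregates(hours)[datetime(2018, 7, 23).date()]
print(d["T_max"], d["RH_min"])  # 36.5 57
```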

In this study, meteorological data from a single AMeDAS station (Otemachi) were used to represent meteorological conditions in Tokyo. The average temperature over all AMeDAS stations in Tokyo would probably represent the characteristics of Tokyo's temperatures better than the temperature at Otemachi alone. However, from a statistical perspective, on days when the temperature at Otemachi is high, the temperatures at other stations are also likely to be high. Thus, we expect that the results would not be significantly affected even if data from multiple AMeDAS stations were used.

2.2 Explanatory variables

To select the best explanatory variables for predicting the number of patients with heatstroke, we examined 11 sets of explanatory variables, as shown in the left table of Fig. 1. For example, when explanatory variable set V1 is used to develop a model, the daily maximum temperature is the only explanatory variable. When set V2 is used, the daily maximum wet-bulb globe temperature (WBGT) is the explanatory variable. For V3, four thermal indicators (daily maximum temperature, daily minimum relative humidity, daily average wind speed, and daily sum of solar radiation) are used. Sets V1–V3 contain only weather data. Sets V4 and V5 add the number of days since June 1 and the number of patients with heatstroke on the previous day, respectively, to the variables in set V3. Set V6 adds both the number of days since June 1 and the number of patients with heatstroke on the previous day to the variables in set V3. Sets V7–V11 replace the number of patients with heatstroke on the previous day in set V6 with an alternative variable: sets V7–V10 use the number of patients with heatstroke 2–5 days before, respectively, whereas set V11 uses the daily maximum WBGT on the previous day.
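Our reading of the 11 sets, written as a Python mapping with the variable names of Fig. 1, is sketched below. The composition of V7–V11 (the four thermal indicators plus Day plus one replacement for Pat_1) follows the description above and is an interpretation, not a copy of Fig. 1. Combining the sets with the five methods reproduces the count of 55 models.

```python
THERMAL = ["T_max", "RH_min", "Wind_ave", "SR_sum"]  # the four thermal indicators

VARIABLE_SETS = {
    "V1": ["T_max"],
    "V2": ["WBGT_max"],
    "V3": THERMAL,
    "V4": THERMAL + ["Day"],
    "V5": THERMAL + ["Pat_1"],
    "V6": THERMAL + ["Day", "Pat_1"],
    # V7-V10: Pat_1 in V6 replaced by Pat_2 ... Pat_5 (our reading of the text)
    **{f"V{5 + lag}": THERMAL + ["Day", f"Pat_{lag}"] for lag in range(2, 6)},
    # V11: Pat_1 in V6 replaced by the daily maximum WBGT on the previous day
    "V11": THERMAL + ["Day", "WBGT_1"],
}

METHODS = ["GLM", "NLR", "GAM", "RF", "SVM"]

# Model names follow the convention of Section 3, e.g., "V6_GAM".
MODELS = [f"{vset}_{method}" for vset in VARIABLE_SETS for method in METHODS]
print(len(MODELS))  # 55
```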

First, we compared the accuracies of models using sets V1–V3 to determine the optimal meteorological indicators for predicting the number of patients with heatstroke. The Japan Meteorological Agency and the Ministry of the Environment issue heatstroke warnings based on the daily maximum temperature or WBGT, both of which have been used as indicators of heatstroke risk in many studies in Japan (Kusaka et al. 2012; Ono 2013; Ohashi et al. 2014; Takaya et al. 2014; Suzuki-Parker et al. 2016; Akatsuka et al. 2016). Furthermore, Sato et al. (2020) showed that the above four thermal indicators are essential for predicting heatstroke trends. Nevertheless, it remains unclear which meteorological variables best predict the number of patients with heatstroke.

Second, we compared the accuracies of models using sets V3–V6 to reduce the underestimation of the prediction during early summer. For this comparison, we represented the effect of heat acclimatization with two explanatory variables: the number of days since June 1 and the number of patients with heatstroke on the previous day.

Finally, we compared the accuracies of models using sets V6–V11, in which sets V7–V11 provide alternatives to the number of patients with heatstroke on the previous day. The number of patients with heatstroke on the previous day may not be available at the time the prediction is made, whereas the number of patients with heatstroke 2–5 days before and WBGT on the previous day are easier to obtain. Hence, we evaluated how these variables affect prediction accuracy.

In this study, air pollutants and economic indicators were not included in our models as explanatory variables, for the following reasons. As discussed in Section 1, the impact of air pollutants such as NO₂ and of economic indicators on heatstroke in Japan is still unclear, at least for the daily variation in the number of patients with heatstroke, although these factors might affect decadal-scale variation. Rainham and Smoyer-Tomic (2003) reported that the impact of air pollutants on heatstroke in Toronto in 1980–1996 was small. During this period, the mean NO₂ concentration in Toronto was 0.0238 ppm (Rainham and Smoyer-Tomic 2003), whereas the NO₂ concentration in Tokyo in 2018 was 0.015 ppm, according to the Bureau of Environment, Tokyo Metropolitan Government. Thus, the current air pollution level in Tokyo is equivalent to or below that of Toronto during that period, and it can be assumed that air pollutants do not play a significant role in heatstroke in Japan.

The impact of economic indicators on heatstroke is also unclear in Japan. Wang et al. (2019) reported that the impact of economic indicators on heatstroke in China was smaller than that of temperature. Since Japan has a higher GDP per capita than China, it can be assumed that economic indicators have an even smaller effect on heatstroke in Japan than in China.

2.3 Modeling methods

We considered five modeling methods: the generalized linear model (GLM), NLR, the generalized additive model (GAM), RF, and the support vector machine (SVM). Although neural networks can perform regression, we did not consider them because the training data are limited (about 1,000 samples) and the number of explanatory variables is small (at most six). A brief explanation of each method is given below.

a. Generalized linear model

GLM was proposed by Nelder and Wedderburn (1972). In GLM, the outcome variable is expressed as a nonlinear transformation of a linear combination of the explanatory variables; with the exponential form used here, the model is given by Eq. (1):

$$ y_t = \exp\left( \beta_0 + \sum_{i} \beta_i \, x_{i,t} \right) \tag{1} $$

Here, i and t are the indices of the explanatory variables and the prediction day (t = 1–122), respectively. y_t is the predicted number of patients with heatstroke; x_{i,t} is the i-th explanatory variable on day t; β₀ and β_i are the intercept and regression coefficients, respectively, obtained by maximum likelihood estimation. Following Sato et al. (2020), we assumed a Poisson distribution because the outcome variable is non-negative.

In Eq. (1), we used the exponential function for two reasons. First, the number of patients with heatstroke increases exponentially with the daily maximum temperature or WBGT (Fig. 2), and such an exponential relationship is commonly used in Japan to predict the number of patients with heatstroke. Second, the exponential function ensures that the predicted number of patients with heatstroke is always greater than zero. Thus, we adopted GLM with the daily maximum temperature as the explanatory variable as the benchmark (widely used) model.
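A Poisson GLM with the exponential form of Eq. (1) is usually fitted by iteratively reweighted least squares (IRLS), the standard algorithm for maximum likelihood estimation in GLMs. The NumPy sketch below is illustrative only: the function names are hypothetical, and we do not know which software the study used. In practice, a library routine such as `statsmodels.api.GLM` with `family=sm.families.Poisson()` would normally be preferred.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit y ~ Poisson(exp(beta0 + X @ beta)) by iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    beta[0] = np.log(np.mean(y))                # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X1 @ beta                         # linear predictor
        mu = np.exp(eta)                        # fitted mean (log link)
        z = eta + (y - mu) / mu                 # working response
        W = mu                                  # Poisson/log-link working weights
        beta_new = np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def predict_glm(beta, X):
    """Evaluate Eq. (1) for new explanatory variables."""
    return np.exp(np.column_stack([np.ones(len(X)), X]) @ beta)


# Noise-free synthetic data: rate per 100,000 people vs. daily maximum temperature.
temps = np.linspace(25, 37, 200)
rates = np.exp(-8 + 0.2 * temps)
beta = fit_poisson_glm(temps, rates)
print(beta)  # recovers intercept -8 and slope 0.2
```

Because the outcome here is a rate per 100,000 people rather than an integer count, the fit is strictly a quasi-Poisson one; the IRLS updates are unchanged for non-integer responses.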

Fig. 2.

Relationships between the number of patients with heatstroke and (a) daily maximum temperature and (b) daily maximum WBGT.

b. Nonlinear regression model

NLR can also represent the nonlinear relationships between the outcome and explanatory variables (Bates and Watts 1988). The NLR model is represented by Eq. (2) as follows:

where y_{t−1} is the number of patients with heatstroke on the previous day and x_{i,t} is an explanatory variable other than y_{t−1}. We used the nonlinear least-squares method to estimate the parameters α₁, β₀, and β_i. Unlike in GLM, the number of patients with heatstroke on the previous day enters NLR through a linear function, which prevents the overestimation that can occur in GLM. For example, when both the daily maximum temperature and the number of patients with heatstroke on the previous day are high, the exponential function in GLM can yield excessively large values.
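Eq. (2) itself is not legible in this text. Purely for illustration, the sketch below assumes the form y_t = α₁ y_{t−1} + exp(β₀ + Σ_i β_i x_{i,t}), which is consistent with the description above (a linear term in y_{t−1} plus an exponential term in the other variables) but is our assumption, not the study's equation. The parameters are estimated by nonlinear least squares with a damped Gauss–Newton iteration; `scipy.optimize.curve_fit` would be a common off-the-shelf alternative. Function names are hypothetical.

```python
import numpy as np


def nlr_predict(theta, y_prev, X):
    """ASSUMED form of Eq. (2): alpha1 * y_prev + exp(beta0 + X @ beta)."""
    alpha1, beta0, beta = theta[0], theta[1], theta[2:]
    return alpha1 * y_prev + np.exp(beta0 + X @ beta)


def fit_nlr(y_prev, X, y, theta0, n_iter=200, tol=1e-14):
    """Nonlinear least squares via Gauss-Newton with step halving.

    X is a 2-D array (n_days, n_variables); theta = [alpha1, beta0, beta_1, ...].
    """
    theta = np.asarray(theta0, dtype=float)
    sse = np.sum((y - nlr_predict(theta, y_prev, X)) ** 2)
    for _ in range(n_iter):
        resid = y - nlr_predict(theta, y_prev, X)
        expo = np.exp(theta[1] + X @ theta[2:])
        # Jacobian of the prediction with respect to (alpha1, beta0, beta_1..beta_k)
        J = np.column_stack([y_prev, expo, expo[:, None] * X])
        step = np.linalg.lstsq(J, resid, rcond=None)[0]
        t = 1.0
        while t > 1e-10:                       # halve the step until the error drops
            candidate = theta + t * step
            cand_sse = np.sum((y - nlr_predict(candidate, y_prev, X)) ** 2)
            if cand_sse < sse:
                break
            t /= 2
        else:
            break                              # no improving step: treat as converged
        improvement = sse - cand_sse
        theta, sse = candidate, cand_sse
        if improvement < tol:
            break
    return theta


# Noise-free synthetic example with one meteorological variable.
rng = np.random.default_rng(0)
x = rng.uniform(25, 37, size=(300, 1))         # synthetic daily maximum temperature
y_prev = rng.uniform(0.0, 0.5, size=300)       # synthetic previous-day rate
y = 0.6 * y_prev + np.exp(-8 + 0.2 * x[:, 0])  # synthetic outcome
theta = fit_nlr(y_prev, x, y, theta0=[0.5, -7.5, 0.18])
print(theta)  # close to [0.6, -8.0, 0.2]
```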

c. Generalized additive model

GAM represents the relationships between the explanatory and outcome variables as nonlinear functions (Hastie and Tibshirani 1990). The GAM is represented by Eq. (3):

$$ y_t = \exp\left( \beta_0 + \sum_{i} f_i\!\left(x_{i,t}\right) \right) \tag{3} $$

where f_i(x_{i,t}) is a cubic (third-order) smoothing spline for each explanatory variable. As in GLM, the GAM parameters are obtained by maximum likelihood estimation assuming a Poisson distribution.

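Each smoothing spline f_i is the function that minimizes the penalized residual sum of squares (PRSS) in Eq. (4):

$$ \mathrm{PRSS}(f_i) = \sum_{j=1}^{n} \left\{ y_j - f_i\!\left(x_{i,j}\right) \right\}^2 + \lambda \int \left\{ f_i''(x) \right\}^2 \, dx \tag{4} $$

Here, y_j and x_{i,j} are the outcome and explanatory variables of the j-th training sample, n is the number of training samples, and λ is the smoothing parameter; in this study, λ = 0.8.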
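The trade-off controlled by λ in Eq. (4) can be illustrated with a discrete analogue for equally spaced points, in which the integral of the squared second derivative is replaced by the sum of squared second differences (a Whittaker-type smoother). This NumPy sketch is not the study's GAM: it smooths a single series rather than fitting several smooth terms jointly, and the scale of its λ is not comparable to the λ = 0.8 used in the study. The function name is hypothetical.

```python
import numpy as np


def penalized_smooth(y, lam):
    """Minimize sum((y - f)**2) + lam * sum((second differences of f)**2).

    Discrete analogue of Eq. (4) for equally spaced points; returns the fitted f.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n second-difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)


rng = np.random.default_rng(1)
t = np.arange(50, dtype=float)
noisy = 0.01 * (t - 25) ** 2 + rng.normal(0.0, 0.5, t.size)
for lam in (0.0, 10.0, 1e6):
    fit = penalized_smooth(noisy, lam)
    print(lam, np.abs(np.diff(fit, n=2)).max())  # roughness shrinks as lam grows
```

With λ = 0 the data are returned unchanged, and as λ grows the result approaches the least-squares straight line, because straight lines have zero second differences and are therefore not penalized.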

The relationships between the meteorological indicators or the number of patients with heatstroke on the previous day and the predicted number of patients with heatstroke are approximately linear (Figs. 3a–e), whereas the relationship between the number of days since June 1 and the number of patients with heatstroke is nonlinear (Fig. 3f). Since GAM can express such a nonlinear relationship, it may provide higher prediction accuracy than GLM and NLR.

Fig. 3.

Effect of explanatory variables on patients with heatstroke: (a) daily maximum temperature, (b) daily minimum relative humidity, (c) daily average wind speed, (d) daily sum of solar radiation, (e) number of patients with heatstroke on the previous day per 100,000 people, and (f) number of days since June 1. The smooth curves were obtained from GAM.

d. Random forest

RF is an ensemble of multiple decision trees (Breiman 2001). By aggregating many trees, RF can achieve high prediction performance. RF introduces randomness when sampling the training data and when determining the splitting rules of each tree, which helps prevent overfitting and yields high generalization performance. A schematic diagram of the RF used in this study is shown in Fig. 4. Before applying RF, the explanatory variables and the outcome variable for nine years were prepared as training data, and those for the remaining year were prepared as test data. RF involves two steps: building decision trees and making predictions. In the first step, the splitting conditions of each decision tree are determined automatically when the training data are input into the RF algorithm, and the decision trees are created at the same time. For example, in decision tree 1, the training data were divided based on whether the daily maximum temperature was higher than 32°C, and these subsets were further divided based on whether the relative humidity was higher than 70 %. A decision tree was constructed in this way, and different decision trees were created by changing the splitting conditions (Fig. 4). In the second step, the test data were input into the decision trees created in the first step, and each tree output a provisional value (Prediction 1, 2, …, n in Fig. 4). The RF prediction, i.e., the number of patients with heatstroke, was calculated as the mean of the provisional values. RF requires two parameters: ntree, the number of decision trees, and mtry, the number of explanatory variables randomly sampled as split candidates. In this study, ntree was set to 2,000. The value of mtry depended on the explanatory variable set: mtry = 1 for sets V1 and V2, mtry = 2 for V3–V5, and mtry = 4 for V6–V11.
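The two steps described above (building randomized trees, then averaging their provisional predictions) can be sketched in NumPy. To keep the example short, each tree is a single split (a depth-1 "stump"), so drawing mtry candidate variables per tree is the same as drawing them per split; real RF implementations grow much deeper trees. All names are hypothetical. The parameter names ntree and mtry follow the text; in scikit-learn, the corresponding arguments of `RandomForestRegressor` are `n_estimators` and `max_features`.

```python
import numpy as np


def fit_stump(X, y, features):
    """Best depth-1 regression tree over the candidate feature indices.

    Returns (feature, threshold, left_mean, right_mean).
    """
    best = None
    for j in features:
        for thr in np.unique(X[:, j])[:-1]:  # candidate split points
            left = X[:, j] <= thr
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, yl.mean(), yr.mean())
    if best is None:                         # no split possible: constant tree
        return (0, np.inf, y.mean(), y.mean())
    return best[1:]


def fit_forest(X, y, ntree=200, mtry=2, seed=0):
    """Bagged stumps: bootstrap rows and draw `mtry` candidate variables per tree."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    trees = []
    for _ in range(ntree):
        rows = rng.integers(0, n, size=n)                        # bootstrap sample
        feats = rng.choice(p, size=min(mtry, p), replace=False)  # random candidates
        trees.append(fit_stump(X[rows], y[rows], feats))
    return trees


def predict_forest(trees, X):
    """Average the provisional predictions of all trees."""
    preds = [np.where(X[:, j] <= thr, left, right) for j, thr, left, right in trees]
    return np.mean(preds, axis=0)


# Synthetic example: columns are daily maximum temperature and minimum RH.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(25, 37, 200), rng.uniform(40, 90, 200)])
y = np.exp(-8 + 0.2 * X[:, 0])
forest = fit_forest(X, y, ntree=50, mtry=1)
X_new = np.array([[26.0, 60.0], [36.0, 60.0], [45.0, 60.0]])
print(predict_forest(forest, X_new))
```

Because every prediction is an average of training-set means, it can never exceed the largest training outcome, even for the 45°C input that lies outside the training range.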

Fig. 4.

Schematic diagram of RF. RF consists of different decision trees created with the training data. RF outputs the predicted value, i.e., the number of patients with heatstroke, using test data.

e. Support vector machine

SVM is a nonlinear regression method (Vapnik and Lerner 1963). As one of the best-known machine learning methods, SVM can achieve high generalization performance through a nonlinear function called a kernel. A schematic of the SVM used in this study is shown in Fig. 5. To explain the concept of SVM, this figure uses two-dimensional planes spanned by two explanatory variables, together with a dividing line; the color density of each point indicates the number of patients with heatstroke. In the actual prediction with SVM, the space has as many dimensions as there are explanatory variables in the training data, and the dividing line becomes a dividing plane. Before applying SVM, the explanatory variables and the outcome variable for nine years were prepared as training data, and those for the remaining year were prepared as test data. SVM involves two steps: creating a dividing line and making predictions. In the first step, the dividing line is created automatically when the training data are input into the SVM algorithm; for example, a line that divides the number of patients with heatstroke into two groups is trained on a two-dimensional plane. In the second step, the test data are placed on the two-dimensional plane, which retains only the dividing line created in the first step. The SVM prediction, i.e., the number of patients with heatstroke, was estimated from the distance between the dividing line and each test data point.

Fig. 5.

Schematic diagram of SVM for two explanatory variables. The color density of each point indicates the number of patients with heatstroke. The black line is the dividing line. In the actual prediction with SVM, the number of dimensions equals the number of explanatory variables in the training data, and the dividing line becomes a dividing plane.

When performing regression with SVM, the parameter ε of the ε-insensitive loss function must be set. In this study, we set ε = 0.1.
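The ε-insensitive loss ignores errors smaller than ε and penalizes larger errors linearly. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Vapnik's epsilon-insensitive loss: zero inside the tube |error| <= eps."""
    error = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, error - eps)


print(eps_insensitive_loss([0.5, 0.5, 0.5], [0.55, 0.7, 0.2]))  # approx [0, 0.1, 0.2]
```

If ε = 0.1 applied to the unscaled outcome (cases per 100,000 people), training errors of up to about 13.5 patients per day in Tokyo would incur no loss. Many SVM implementations standardize the variables before fitting, however, in which case ε is in standardized units; we do not know which convention the study followed.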

2.4 Verification of prediction accuracy

To build and evaluate the predictive models, we used cross-validation. Data from nine of the ten years were used as training data to estimate the parameters and build the models, and data from the remaining year were used as test data for validation. An overview of the cross-validation used in this study is shown in Fig. 6. For example, to predict the number of patients with heatstroke in 2010, we used the data for the nine years from 2011 to 2019 as training data and the data for 2010 as test data. Similarly, predictions were made for each year from 2011 to 2019.
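The leave-one-year-out scheme can be written as a simple generator over years. A minimal Python sketch; the names are hypothetical, and daily rows are assumed to be dictionaries carrying a "year" key.

```python
YEARS = list(range(2010, 2020))


def leave_one_year_out(years=YEARS):
    """Yield (training_years, test_year): each year is predicted from the other nine."""
    for test_year in years:
        yield [yr for yr in years if yr != test_year], test_year


def split_rows(rows, test_year):
    """Split daily rows (dicts with a 'year' key) into training and test lists."""
    train = [r for r in rows if r["year"] != test_year]
    test = [r for r in rows if r["year"] == test_year]
    return train, test


for train_years, test_year in leave_one_year_out():
    print(test_year, "<-", train_years[0], "...", train_years[-1])
```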

Fig. 6.

Overview of the cross-validation. For prediction in 2010, data from 2011 to 2019 were used as training data, and data from 2010 were used as test data. Similarly, predictions were made from 2011 to 2019.

In this study, the root mean square error (RMSE) and mean absolute error (MAE) were used as indicators of prediction accuracy. RMSE and MAE are given by Eqs. (5) and (6), respectively:

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - o_t \right)^2 } \tag{5} $$

$$ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - o_t \right| \tag{6} $$

Here, y_t and o_t are the predicted and reported numbers of patients with heatstroke, respectively, and n is the number of predicted days (n = 122). We calculated RMSE and MAE for each of the ten years from the 122 daily predictions in that year.
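Eqs. (5) and (6), together with the ten-year averaging, can be sketched in plain Python as follows (names hypothetical):

```python
import math


def rmse(pred, obs):
    """Eq. (5): root mean square error over n days."""
    if len(pred) != len(obs):
        raise ValueError("pred and obs must have the same length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Eq. (6): mean absolute error over n days."""
    if len(pred) != len(obs):
        raise ValueError("pred and obs must have the same length")
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def ten_year_average(scores_by_year):
    """Average a per-year score (e.g., one RMSE per test year) over all years."""
    return sum(scores_by_year.values()) / len(scores_by_year)


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3), about 1.1547
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # 2/3, about 0.6667
```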

2.5 Calculation of WBGT

WBGT is defined by Eq. (7) (Yaglou and Minard 1957):

$$ \mathrm{WBGT} = 0.7\,T_w + 0.2\,T_g + 0.1\,T_a \tag{7} $$

where T_a is the dry-bulb temperature (°C), T_w is the wet-bulb temperature (°C), and T_g is the globe temperature (°C). Since T_w and T_g are not measured by AMeDAS, we estimated T_w and T_g using the methods described by Stull (2011) and Okada and Kusaka (2013), respectively. The estimation equations are given by Eqs. (8) and (9).

where RH is the relative humidity (%), S₀ is the global solar radiation (W m⁻²), and U is the wind speed (m s⁻¹). No additional assumptions were made in estimating T_w and T_g: T_w was estimated by numerically solving the theoretical equation, and T_g was estimated from the heat balance equation of the black globe, in which the unknown parameters were estimated from observations.
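Eq. (7) and a wet-bulb estimate can be sketched in Python. Because Eqs. (8) and (9) are not legible in this text, the sketch uses the empirical fit published by Stull (2011), stated there to be valid for air temperatures of −20 to 50 °C and relative humidity of 5–99 % at standard sea-level pressure; whether it matches the study's Eq. (8) exactly is an assumption. The globe temperature of Okada and Kusaka (2013) is not reproduced and is taken as an input. Function names are hypothetical.

```python
import math


def wet_bulb_stull(t_c, rh_pct):
    """Stull (2011) empirical wet-bulb temperature (deg C) from air temperature and RH (%)."""
    return (t_c * math.atan(0.151977 * math.sqrt(rh_pct + 8.313659))
            + math.atan(t_c + rh_pct)
            - math.atan(rh_pct - 1.676331)
            + 0.00391838 * rh_pct ** 1.5 * math.atan(0.023101 * rh_pct)
            - 4.686035)


def wbgt_outdoor(t_a, t_w, t_g):
    """Eq. (7): outdoor WBGT from dry-bulb, wet-bulb, and globe temperatures (deg C)."""
    return 0.7 * t_w + 0.2 * t_g + 0.1 * t_a


t_w = wet_bulb_stull(20.0, 50.0)
print(round(t_w, 1))                # 13.7 (Stull's worked example)
print(wbgt_outdoor(30.0, 25.0, 40.0))  # about 28.5
```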

3. Results

For brevity, each model is denoted by its explanatory variable set and modeling method; for example, the model that combines explanatory variable set V1 with GLM is denoted as V1_GLM.

3.1 Prediction in 2018

We predicted the number of patients with heatstroke for ten years (2010–2019). We show the results for 2018 as a representative example because that year had the highest number of patients with heatstroke and thus the largest prediction errors among the evaluated years. Figure 7 shows the observed and predicted numbers of heatstroke cases in 2018, and Table 1 presents the RMSE and MAE of the predictions in 2018 for each model using explanatory variable sets V1–V6. The predictions for the other years are shown in Appendix A.

Fig. 7.

Daily number of patients with heatstroke from June 1 to September 30, 2018. Gray vertical bars: observed number of patients with heatstroke. Green line: predicted number of patients with heatstroke by V1_GAM. Blue line: predicted number of patients with heatstroke by V3_GAM. Purple line: predicted number of patients with heatstroke by V4_GAM. Orange line: predicted number of patients with heatstroke by V5_GAM. Red line: predicted number of patients with heatstroke by V6_GAM.

V3_GAM underestimated the number of patients with heatstroke from July 17 to August 4 and overestimated it from August 13 to September 17. Compared with V3_GAM, V6_GAM reduced both the underestimation during early summer and the overestimation during late summer (Fig. 7a), reducing RMSE and MAE by 48.1 % and 43.9 %, respectively (Table 1). Compared with V1_GAM, V6_GAM markedly reduced the underestimation during early summer (Fig. 7a), reducing RMSE and MAE by 58.2 % and 52.7 %, respectively (Table 1).

Regarding the individual effects of the number of days since June 1 and the number of patients with heatstroke on the previous day, V4_GAM slightly reduced the overestimation during late summer (Fig. 7b), reducing RMSE and MAE by 13.3 % and 19.6 %, respectively, compared with V3_GAM (Table 1). Compared with V3_GAM, V5_GAM reduced the underestimation during early summer and slightly reduced the overestimation during late summer (Fig. 7b), thereby reducing RMSE and MAE by 40.4 % and 36.0 %, respectively (Table 1).

Overall, the number of days since June 1 improves prediction accuracy, particularly by reducing overestimation during late summer, because this variable reflects the long-term trend of heat acclimatization throughout the summer. In contrast, the number of patients with heatstroke on the previous day improves prediction accuracy during early summer because it reflects the short-term trend of heat acclimatization.

3.2 Ten-year prediction (2010–2019)

a. Comparison of prediction accuracies considering only meteorological indicators

Figures 8 and 9 show boxplots of RMSE and MAE for each model using sets V1–V6, and Table 2 presents the ten-year average RMSE and MAE for each model using sets V1–V11. Using the four thermal indicators as separate explanatory variables provided higher prediction performance than using WBGT. Compared with V1_GAM, the ten-year average RMSE of V2_GAM and V3_GAM decreased by 16.6 % and 23.5 %, respectively, and the ten-year average MAE decreased by 10.7 % and 19.6 %, respectively (Figs. 8c, 9c, Table 2).

Fig. 8.

Cross-validated root mean square error (RMSE) for ten years (2010–2019). The models combine a set of explanatory variables and a prediction method. (a) GLM, (b) NLR, (c) GAM, (d) RF, and (e) SVM. The gray cross within each box represents the ten-year average RMSE.

Fig. 9.

Cross-validated mean absolute error (MAE) for ten years (2010–2019). The models combine a set of explanatory variables and a prediction method. (a) GLM, (b) NLR, (c) GAM, (d) RF, and (e) SVM. The gray cross within each box represents the ten-year average MAE.

The same trend was observed for the other methods except SVM: RMSE and MAE were smallest with V3, followed by V2 and then V1. Compared with the models using set V1, the models using set V2 reduced the ten-year average RMSE by 2.7–16.6 % and MAE by 7.9–20.6 %, and the models using set V3 reduced the ten-year average RMSE by 13.7–23.8 % and MAE by 12.6–22.7 % (Figs. 8a–e, 9a–e, Table 2).

b. Effect of the number of days since June 1 and the number of patients with heatstroke on the previous day

Compared with V3_GAM, the ten-year average RMSE of V4_GAM, V5_GAM, and V6_GAM decreased by 7.5 %, 33.6 %, and 37.8 %, respectively, and the ten-year average MAE decreased by 10.1 %, 29.1 %, and 34.7 %, respectively (Figs. 8c, 9c, Table 2). V4_GAM had a higher ten-year average RMSE and MAE than V5_GAM. Furthermore, compared with V1_GAM, the ten-year average RMSE and MAE of V6_GAM decreased by 49.8 % and 46.8 %, respectively. This trend was also observed for the other methods (Figs. 8a–e, 9a–e, Table 2). The reasons for this result are discussed in Section 3.1.

c. Variables as an alternative to the number of patients with heatstroke on the previous day

Figure 10 shows boxplots of RMSE and MAE for GAM using sets V6–V11. The ten-year average RMSE increased from 0.112 for V6_GAM to 0.160 for V10_GAM, and the ten-year average MAE increased from 0.063 to 0.082. The ten-year average RMSE and MAE of V11_GAM were 0.140 and 0.076, respectively (Figs. 10a, b, Table 2). V6_GAM and V7_GAM had smaller ten-year average RMSE and MAE than V11_GAM, whereas V8_GAM–V10_GAM had larger values than V11_GAM. The other methods, except RF, showed the same pattern as GAM (Table 2).

Fig. 10.

Cross-validated error of models combining sets V6–V11 and GAM for ten years (2010–2019). The gray cross within each box represents the ten-year average. (a) RMSE and (b) MAE.

The models using the number of patients with heatstroke two days before performed better than those using WBGT on the previous day. Since the number of patients with heatstroke two days before is usually available at the time of prediction, it can be used instead of that on the previous day when the latter is unavailable. Moreover, when the number of patients with heatstroke two days before is also unavailable, WBGT on the previous day is a useful alternative for prediction.

d. Intermodel comparisons

Figure 11 shows boxplots of RMSE and MAE for each modeling method using set V6. The ten-year average RMSE values of GLM, NLR, GAM, RF, and SVM were 0.131, 0.123, 0.112, 0.132, and 0.122, respectively, and their ten-year average MAE values were 0.071, 0.069, 0.063, 0.072, and 0.072, respectively. Among the five methods, GAM exhibited the highest performance, making V6_GAM the best model. The RMSE and MAE of V6_GAM were 14.6 % and 11.8 % smaller than those of V6_GLM, respectively (Table 2). Changing the modeling method contributed less to prediction accuracy than changing the explanatory variables, which is also supported by the predictions for 2018 (Fig. 12). The predictions for the other years are shown in Appendix B.

Fig. 11.

Cross-validated error of models using set V6 and each modeling method for ten years (2010–2019). The gray cross within each box represents the ten-year average. (a) RMSE and (b) MAE.

Fig. 12.

Same as Fig. 7a, but for the predicted number of patients with heatstroke by V1_GAM (green line), V6_GLM (blue line), and V6_GAM (red line).

Finally, compared with the benchmark model (widely used model: V1_GLM), the ten-year average RMSE and MAE of the best model (V6_GAM) were reduced by 52.1 % and 47.9 %, respectively (Table 2).
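The percentage reductions quoted in this section are relative changes with respect to a reference model. A minimal sketch (the function name is hypothetical):

```python
def percent_reduction(reference, model):
    """Relative reduction (%) of an error score with respect to a reference model."""
    return 100.0 * (reference - model) / reference


# Ten-year average RMSE from the intermodel comparison (rounded): V6_GLM vs. V6_GAM
print(round(percent_reduction(0.131, 0.112), 1))  # 14.5
```

Recomputing from the rounded three-decimal errors can differ in the last digit from the percentages in the text (14.5 % here versus 14.6 %), presumably because the published errors are rounded.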

A nonlinear relationship was observed between the number of days since June 1 and the number of patients with heatstroke (Fig. 3f). GAM outperforms GLM and NLR because, unlike them, it can describe this nonlinear relationship. In contrast, the relationships between the four thermal indicators and the number of patients with heatstroke were almost linear (Figs. 3a–d), so even simple statistical models such as GLM and NLR can represent them and provide suitable predictions. NLR outperformed GLM because, for the number of patients with heatstroke on the previous day, the linear function in NLR was more appropriate than the exponential function in GLM (Fig. 3e). In Fig. 3e, the relationship between the number of patients with heatstroke and that on the previous day is described by a straight line for GLM but by a logarithmic curve for NLR, and the NLR curve represents the actual relationship better.
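The advantage of a smooth term over a single linear term can be illustrated with a small least-squares experiment. This is a sketch under stated assumptions, not the paper's GAM: a GAM typically uses penalized splines, whereas here an unpenalized piecewise-linear regression spline with three hypothetical knots stands in for the smooth term, and the exponentially decaying "effect" of days since June 1 is synthetic rather than taken from Fig. 3f. Only the Python standard library is used.

```python
import math


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_rmse(basis, ts, y):
    """Least-squares fit via the normal equations; returns the in-sample RMSE."""
    rows = [basis(t) for t in ts]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    w = solve(xtx, xty)
    preds = [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]
    return math.sqrt(sum((pr - yi) ** 2 for pr, yi in zip(preds, y)) / len(y))


def linear_basis(t):
    return [1.0, t]


def hinge_spline_basis(t, knots=(0.25, 0.5, 0.75)):
    """Continuous piecewise-linear spline: a simple stand-in for a GAM smooth."""
    return [1.0, t] + [max(0.0, t - k) for k in knots]


# Synthetic, purely illustrative effect of the number of days since June 1.
days = range(122)                     # June 1 (day 0) to September 30
ts = [d / 121 for d in days]          # rescale to [0, 1] for better conditioning
effect = [math.exp(-d / 30) for d in days]

linear_rmse = fit_rmse(linear_basis, ts, effect)
spline_rmse = fit_rmse(hinge_spline_basis, ts, effect)
print(f"linear: {linear_rmse:.4f}  spline: {spline_rmse:.4f}")
```

Because the spline basis contains the linear basis, its least-squares error can never exceed that of the linear fit; the size of the gap shows how much of a curved relationship a single linear term leaves unexplained.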

RF did not outperform GAM, particularly in predicting the number of patients with heatstroke during the abnormally hot summer of 2018 in Japan. Because each regression tree in RF predicts the mean of the training samples that fall into a leaf, RF cannot predict values of the outcome variable outside the range of the training data.
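The extrapolation limit can be demonstrated with a minimal bagged ensemble of regression trees written from scratch. This is a simplified stand-in for RF, not the implementation used in the study: it assumes a single explanatory variable (so the per-split feature subsampling of a real random forest is omitted), and the training data are synthetic, with a target that rises linearly with a temperature-like input between 25 and 35. Every leaf stores the mean of its training targets, so no query, however hot, can yield a prediction above the largest training value.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=4, min_leaf=3):
    """Grow a one-feature regression tree; each leaf stores a mean training target."""
    if depth == max_depth or len(xs) < 2 * min_leaf:
        return fmean(ys)
    pairs = sorted(zip(xs, ys))
    best = None  # (cost, threshold, split index)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical inputs
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return fmean(ys)
    _, threshold, i = best
    left_x, left_y = zip(*pairs[:i])
    right_x, right_y = zip(*pairs[i:])
    return (
        threshold,
        build_tree(left_x, left_y, depth + 1, max_depth, min_leaf),
        build_tree(right_x, right_y, depth + 1, max_depth, min_leaf),
    )


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (threshold, left, right)
        threshold, left, right = node
        node = left if x <= threshold else right
    return node  # leaf value: mean of training targets


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    return fmean(predict_tree(tree, x) for tree in forest)


# Synthetic training data: the target rises linearly over the training range.
train_x = [25 + 0.2 * i for i in range(51)]  # 25.0 ... 35.0
train_y = [10 * (x - 24) for x in train_x]   # 10 ... 110
forest = fit_forest(train_x, train_y)

for query in (30.0, 38.0, 40.0):
    print(query, round(predict_forest(forest, query), 1))
print("training maximum:", round(max(train_y), 1))
```

Queries at 38 and 40 follow the same path through every tree as the hottest training inputs, so the ensemble returns one flat value for all of them, whereas the synthetic linear trend keeps increasing.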

4. Conclusions

We developed 55 models for predicting the number of patients with heatstroke on the next day using different combinations of 11 sets of explanatory variables and five methods (three statistical models and two machine learning methods). A comparison of the ten-year prediction accuracies of these models led to the following conclusions:

・Among the models using only weather data, the model with the four thermal indicators as explanatory variables provides the highest prediction accuracy. Its ten-year average RMSE is 23.8 % smaller than that of the single-variable model using temperature.
・The underestimation during early summer is reduced by considering the number of days since June 1 and the number of patients with heatstroke on the previous day. Using these two variables together with the four thermal indicators reduces the ten-year average RMSE by 49.8 % compared with the model using temperature only. This is because the number of days since June 1 and the number of patients with heatstroke on the previous day represent the long-term and short-term trends of heat acclimatization, respectively.
・The best model in this study is based on GAM with the four thermal indicators, the number of days since June 1, and the number of patients with heatstroke on the previous day. Its ten-year average RMSE is 14.6 % smaller than that of the GLM-based model with the same explanatory variables. Because this reduction is much smaller than that achieved by changing the explanatory variables, the modeling method contributes less to prediction accuracy than the choice of explanatory variables.
・The ten-year average RMSE of the best model is 52.1 % smaller than that of the conventional GLM with temperature as an explanatory variable.
・Unexpectedly, GAM achieves higher prediction accuracy than the machine learning methods in this study. Tuning the parameters of the machine learning methods may further improve their prediction accuracy; we will explore such improvements in future work.
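The two acclimatization-related predictors in the conclusions above could be derived from a daily record of patient counts roughly as follows. This is a sketch with hypothetical names: it assumes June 1 is counted as day 0 (the exact convention is not stated in this section) and builds only these two variables, leaving the four thermal indicators aside.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def days_since_june1(day: date) -> int:
    """Seasonal (long-term acclimatization) proxy; assumes June 1 is day 0."""
    return (day - date(day.year, 6, 1)).days


def acclimatization_features(
    daily_counts: Dict[date, int], target_day: date
) -> Optional[Dict[str, float]]:
    """Build the two acclimatization-related predictors for target_day.

    Hypothetical helper; returns None when the previous day's count is missing.
    """
    previous_day = target_day - timedelta(days=1)
    if previous_day not in daily_counts:
        return None
    return {
        "days_since_june1": float(days_since_june1(target_day)),  # long-term trend
        "patients_prev_day": float(daily_counts[previous_day]),   # short-term trend
    }
```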

The number of patients with heatstroke on the next day can be accurately predicted using these prediction models. Such predictions can help operate an efficient emergency medical system, including the allocation of ambulances. Additionally, heatstroke alerts based on the predicted number of patients with heatstroke would allow citizens to protect themselves from heatstroke.

This study had some limitations. We developed and verified the models only for Tokyo. To increase reliability, the models must be validated and their parameters tuned for other regions. Another limitation is that the models do not consider the effects of climate change. For example, if summer starts earlier in the future, the relationship between the number of days since June 1 and the number of patients with heatstroke will change. Additionally, citizens' tolerance to heat may change in the future. Thus, the models in their current form are unsuitable for future climatic conditions. For future projections, it is necessary to retune the effect of the number of days since June 1 and to add the long-term effect of heat acclimatization.

Acknowledgment

This research was supported by the Environment Research and Technology Development Fund (JPMEERF20192005) of the Environmental Restoration and Conservation Agency of Japan.

Appendix A