Regional probabilistic climate projection for Japan with a regression model using multi-model ensemble experiments

We have developed a statistical downscaling method for generating probabilistic climate projections using multiple general circulation models (GCMs). A regression model was established so that the combination of coefficients of the GCMs reflects the characteristics of the variation of observations at each grid point. We adopted the elastic net penalty to estimate the regression model, considering model projection similarities. Using an observation system with a high spatial resolution, we produced statistically downscaled probabilistic climate projections with 20-km horizontal grid spacing. Mean precipitation is generally projected to increase in association with higher temperatures and the consequent increase in atmospheric moisture content. Weakening of the winter monsoon may cause precipitation decreases in some areas. There is a high probability of a temperature increase in excess of 4 K in most areas of Japan by the end of the 21st century under the CMIP5 RCP8.5 scenario. The estimated probability of monthly precipitation exceeding 300 mm increases along the Pacific coast of Japan during the summer season and along the coast of the Japan Sea during the winter season. Our probabilistic climate projection using statistical methods is expected to provide useful information to stakeholders involved in impact studies and risk assessments.


INTRODUCTION
The global average surface temperature in 2015 broke all previous records (WMO, 2016). Global warming has caused worldwide public concern and various measures to mitigate the warming trend have been proposed, while at the same time, adaptation of people's lives to a warmer world has become an important issue. However, in order to take appropriate measures against global warming, more reliable projections of future global warming scenarios are required.
As a result of the widespread availability of various climate scenarios from a number of countries and institutes, some studies have provided probabilistic climate projections (e.g., Murphy et al., 2009; Wang et al., 2014), which may provide useful information for risk assessments and decision-making (e.g., Harris et al., 2010; Pittock et al., 2001; New et al., 2007). For example, probabilistic projections can be used to estimate the cost to build or improve infrastructure, to evaluate the suitability of crops, and to examine how water and energy resources can be acquired efficiently. Various risk assessments of the consequences of global warming in the U.K., including the future of water resources, changes in sediment discharge to particular watersheds, and the thermal environment of buildings, have been conducted using 2009 UK Climate Projections (UKCP09) products (e.g., Christensen et al., 2012; Coulthard et al., 2012; Tian and Wilde, 2011).
In Asia, several impact studies on agriculture have led to a stochastic estimation of crop yields (Xu et al., 2010; Iizumi et al., 2012; Tao and Zhang, 2013). For example, Iizumi et al. (2011) probabilistically evaluated the regional impact of climate change on rice yields in Japan based on Bayesian statistics with consideration of uncertainties in parameters of the crop model and emission scenarios of the climate simulations. Tao and Zhang (2013) applied Bayesian probability inversion and a Markov Chain Monte Carlo technique to a crop model to project the probabilistic change of rice productivity and water use in eastern China. Nevertheless, there have been few studies of probabilistic climate projections that are applicable to other impact studies, which can contribute to adaptation strategies against global warming.
Probabilistic climate scenarios are generally constructed using a multi-model ensemble (MME) or perturbed physics ensemble (PPE). In the MME approach, the variability of ensemble members comes from structural and parametric uncertainties of climate models that were developed by different modeling groups. On the other hand, a PPE consists of a single climate model with perturbed physics parameters. Both ensemble methods generate a large number of climate projections, but how to synthesize and convert these projections into policy-relevant information has become a major challenge in recent years (Wang et al., 2014). The simplest way is to use the average of the ensemble with equal weights (Räisänen and Palmer, 2001). In spite of difficulties with choosing the appropriate setting for prior probabilities and computational complexity, the Bayesian method has become popular due to advancements in computational efficiency and development of applications for statistical analysis (e.g., Murphy et al., 2009; Wang et al., 2014; Kerkhoff et al., 2015). Studies have shown that results achieved with the Bayesian method perform better than the simple ensemble mean (e.g., Robertson et al., 2004; Krishnamurti et al., 2000). However, model parameterizations or physical schemes are commonly shared among models developed at different modeling centers (Masson and Knutti, 2011), which may lead to a similarity in the projections. Steinschneider et al. (2015) pointed out that the variance of climate change is underestimated if intermodel correlations of ensembles of projections are ignored, which may result in a misquantification of future risks.
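One way to see why correlations between models matter is a textbook identity for the variance of an equal-weight mean. The sketch below is an idealized illustration, not a calculation from the studies cited above: it assumes every model has the same variance `sigma2` and every pair of models shares the same correlation `rho` (both names are hypothetical).

```python
def variance_of_ensemble_mean(n_models: int, sigma2: float, rho: float) -> float:
    """Variance of the equal-weight mean of n_models projections.

    Idealized assumption: all models share the variance sigma2 and every
    pair of models has the same correlation rho.
    """
    return sigma2 * (1.0 / n_models + (1.0 - 1.0 / n_models) * rho)


# With 40 independent models the variance of the mean shrinks to sigma2 / 40.
independent = variance_of_ensemble_mean(40, 1.0, 0.0)
# With a pairwise correlation of 0.5 it stays above half of sigma2.
correlated = variance_of_ensemble_mean(40, 1.0, 0.5)
print(independent, correlated)
```

Under these assumptions, treating correlated models as independent would understate the spread of the ensemble mean by a large factor, which is the kind of misquantification the paragraph above warns about.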
Based on the recent challenges highlighted by other studies, we attempted to establish a regression model based on an MME to produce probabilistic climate projections with high horizontal resolution, choosing a method that properly deals with the similarity of model results. Ensemble projections from the Coupled Model Intercomparison Project Phase 5 (CMIP5) were utilized to examine regional climate projections for the area around Japan for the present (the end of the 20th century) and the future (the end of the 21st century).

DATA AND METHOD
We analyzed monthly surface air temperature and precipitation to examine the regional climate change over Japan associated with global warming. We utilized Automated Meteorological Data Acquisition System (AMeDAS) meshed observation data with a horizontal grid interval of 20 km, derived from a 1-km meshed AMeDAS data set (Seino, 1993). We used all CMIP5 general circulation models (GCMs) that provide the Historical, RCP4.5, and RCP8.5 runs. Forty GCMs were available (ACCESS1-0, ACCESS1-3, bcc-csm1-1, bcc-…). The model values were regridded to a 1.0° horizontal grid. The present (future) climate represents the mean state for the period from 1981 to 2000 (2081 to 2100). For GCMs that had a shortage of data during the target period, we used 20 successive years after shifting the initial year for the target period.
For the surface air temperature at each grid point, we applied a regression model:

$$y_t = \beta_0 + \sum_{n=1}^{N} \beta_n x_t^{(n)} + w_t, \qquad (1)$$

where $y_t$ is the observation, $x_t^{(n)}$ is the output of the n-th GCM in the t-th month, $\beta_0$ is a constant, $\beta_n$ is the regression coefficient of the n-th GCM, and $N$ is the number of GCMs. We assumed that the residual $w_t$ is normally distributed with zero mean and constant variance:

$$w_t \sim N(0, \sigma^2). \qquad (2)$$

We estimated the regression model using a regularized least-squares method. Specifically, we minimized a cost function $J$ defined as

$$J = \sum_{t=1}^{T} \Bigl( y_t - \beta_0 - \sum_{n=1}^{N} \beta_n x_t^{(n)} \Bigr)^2 + \gamma \sum_{n=1}^{N} \bigl\{ \alpha |\beta_n| + (1 - \alpha) \beta_n^2 \bigr\}, \qquad (3)$$

where $T$ is the number of months in the target period. In Equation (3), the second term, called the elastic net penalty (Zou and Hastie, 2005), is a compromise between the L2 penalty ($\alpha = 0$) and the L1 penalty ($\alpha = 1$). The L2 penalty shrinks the coefficients of correlated outputs of GCMs, while the L1 penalty tends to pick one GCM and ignore the rest. Tuning parameters $\gamma$ and $\alpha$ were estimated by cross-validation. Specifically, $\alpha$ was varied from 0 to 1 in steps of 0.01, and the combination of $\gamma$ and $\alpha$ that minimized the cross-validation error was selected. The linear sum of the GCMs is compared with observations. The optimal regression coefficients are determined by evaluating the ability of the linear sum of the GCMs to represent the observations throughout the target period.
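The elastic net fit can be sketched with a small coordinate-descent routine. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes the cost is the residual sum of squares plus gamma * sum(alpha * |beta_n| + (1 - alpha) * beta_n^2), leaves the intercept unpenalized, takes gamma and alpha as given (the cross-validation search is omitted), and uses made-up toy data. The function names are hypothetical.

```python
def soft_threshold(z, thresh):
    """Shrink z toward zero by thresh (the L1 proximal step)."""
    if z > thresh:
        return z - thresh
    if z < -thresh:
        return z + thresh
    return 0.0


def fit_elastic_net(X, y, gamma, alpha, n_iter=500):
    """Coordinate descent for the assumed cost
    sum_t (y_t - b0 - sum_n b_n x_tn)^2 + gamma * sum_n (alpha |b_n| + (1 - alpha) b_n^2).

    X: list of T rows, each holding the N GCM values for one month.
    y: list of T observations. Returns (b0, [b_1..b_N], residual variance).
    """
    T, N = len(X), len(X[0])
    x_mean = [sum(row[n] for row in X) / T for n in range(N)]
    y_mean = sum(y) / T
    # Centering removes the (unpenalized) intercept from the inner problem.
    Xc = [[row[n] - x_mean[n] for n in range(N)] for row in X]
    resid = [v - y_mean for v in y]
    col_sq = [sum(row[n] ** 2 for row in Xc) for n in range(N)]
    beta = [0.0] * N
    for _ in range(n_iter):
        for n in range(N):
            # Inner product of column n with the residual that excludes column n.
            rho = sum(Xc[t][n] * resid[t] for t in range(T)) + col_sq[n] * beta[n]
            denom = col_sq[n] + gamma * (1.0 - alpha)
            new = soft_threshold(rho, 0.5 * gamma * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[n]
            if delta:
                for t in range(T):
                    resid[t] -= delta * Xc[t][n]
                beta[n] = new
    beta0 = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    sigma2 = sum(r * r for r in resid) / T
    return beta0, beta, sigma2


# Toy data (made up): two identical "GCMs" and observations y = 1 + 2 x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
X_dup = [[v, v] for v in x]
y_obs = [1.0 + 2.0 * v for v in x]

_, beta_l2, _ = fit_elastic_net(X_dup, y_obs, gamma=1.0, alpha=0.0)  # pure L2
_, beta_l1, _ = fit_elastic_net(X_dup, y_obs, gamma=1.0, alpha=1.0)  # pure L1
print(beta_l2, beta_l1)
```

In this toy case the pure L2 penalty splits the weight evenly between the two identical predictors, while the pure L1 penalty keeps only the first one, which mirrors the behavior described above for correlated GCM outputs. Intermediate values of alpha trade off between the two.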
Once the regression coefficients are fixed, we obtain an estimate of σ² from the residuals between the observations and the estimates. This identifies a normal distribution of the surface temperature for each month. Using the estimated distributions, we can derive probabilistic projections such as the exceedance probability of a given temperature increase and percentile values.
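Given a fitted mean and residual standard deviation, exceedance probabilities and percentiles follow directly from the normal distribution. The sketch below is only illustrative: it treats the present-climate mean as a fixed reference (an assumption), and all numbers and names are hypothetical.

```python
from statistics import NormalDist


def warming_summary(mu_future, mu_present, sigma, thresholds=(2.0, 4.0)):
    """Exceedance probabilities and 90th percentile of the warming.

    Assumes the future monthly temperature is N(mu_future, sigma^2) and
    measures warming against a fixed present-climate mean mu_present.
    """
    warming = NormalDist(mu_future - mu_present, sigma)
    exceed = {thr: 1.0 - warming.cdf(thr) for thr in thresholds}
    p90 = warming.inv_cdf(0.90)
    return exceed, p90


# Hypothetical grid point: 15.0 degC now, 19.5 degC projected, sigma = 1.0 K.
exceed, p90 = warming_summary(19.5, 15.0, 1.0)
print(exceed, p90)
```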
Monthly precipitation data do not follow a normal distribution, so we applied the Yeo-Johnson power transformation (Yeo and Johnson, 2000) to them and then estimated the regression model. The Yeo-Johnson power transformation is similar to the Box-Cox power transformation (Box and Cox, 1964), except that it can be applied to zero or negative values.
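For reference, the Yeo-Johnson transformation and its inverse can be written in a few lines. The sketch takes the power parameter `lam` as given; how it was chosen in this study is not stated here, so treat the value in the example as a placeholder.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transformation of a single value."""
    if y >= 0:
        return math.log1p(y) if lam == 0 else ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def yeo_johnson_inverse(z, lam):
    """Inverse transformation (z has the same sign as the original value)."""
    if z >= 0:
        return math.expm1(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam) - 1.0
    if lam == 2:
        return -math.expm1(-z)
    return 1.0 - (1.0 - (2.0 - lam) * z) ** (1.0 / (2.0 - lam))


# Zero precipitation is handled without any offset, unlike Box-Cox.
monthly_precip_mm = [0.0, 12.5, 80.0, 310.0]
transformed = [yeo_johnson(p, 0.5) for p in monthly_precip_mm]
recovered = [yeo_johnson_inverse(z, 0.5) for z in transformed]
print(transformed, recovered)
```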

RESULTS
First, we evaluated the biases of estimation for the present climate in order to examine the performance of the regression method using a multi-model ensemble (RMME). Because we estimate a warmer climate from a model trained on a cooler climate, the present climate period from 1981 to 2000 was divided into warm and cool periods using the mean temperature over Japan, separately for the observational data and for each GCM. We used the cool period as the training period for the regression and bias correction methods, and applied them to the warm period. Figure 1 compares the biases of the simple ensemble mean of the GCMs (ENSM), the simple ensemble mean after the cumulative distribution function-based downscaling method (CDFDM; Iizumi et al., 2010) was applied, and the RMME. The GCM data were regridded to a 1-degree horizontal grid, and each observation was compared with the value at the nearest GCM grid point. The ENSM shows a large warm bias in the inland area, which is characterized by high mountains, and a cold bias in the plains during the summer season, indicating that the ENSM bias is mostly due to the representation of the topography in the GCMs. While the ENSM biases exceed 5 K, the ensemble mean of the CDFDM successfully reduces the large bias associated with the topography. Since the relationship between the cumulative distribution function (CDF) and bias was trained during the cool period, the CDFDM shows a cold bias compared to the observation. Because bias correction was applied to annual data for each GCM, the representation of seasonal change by the GCM affects the bias of the CDFDM. On the other hand, regression coefficients were determined so that the combination of GCMs represents the characteristics of the observed values, including seasonal change. Relatively poor performance of GCMs in representing the difference of the seasonal change between cool and warm years may result in a larger bias in the CDFDM.
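The cool/warm split used for this evaluation can be sketched as a simple ranking of years. The exact ranking variable is an assumption here (annual mean temperature over Japan, split into equal halves), and the temperatures below are made up for illustration.

```python
def split_cool_warm(mean_temp_by_year):
    """Rank years by mean temperature; the cooler half is used for training
    and the warmer half for evaluation (equal halves are an assumption)."""
    ranked = sorted(mean_temp_by_year, key=mean_temp_by_year.get)
    half = len(ranked) // 2
    return sorted(ranked[:half]), sorted(ranked[half:])


# Made-up annual mean temperatures (degC) for 1981-2000.
temps = {1981 + i: 14.0 + 0.1 * ((7 * i) % 20) for i in range(20)}
cool_years, warm_years = split_cool_warm(temps)
print(cool_years)
print(warm_years)
```

The same split would be applied independently to the observations and to each GCM, so the set of cool years can differ between data sets.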
The ENSM cannot represent the regional precipitation distribution due to its coarse horizontal resolution as shown in Figure 1f. Both CDFDM and RMME generally reduce the bias of the GCMs. During the warm period, precipitation during the Asian summer monsoon season is lower and typhoon-related precipitation is higher in the early autumn compared to the cool period (figure not shown). Many GCMs, however, could neither reproduce locally heavy precipitation nor the difference of the seasonal march between these two periods. Intense precipitation during the summer to early autumn season was corrected by the CDFDM to some degree. On the other hand, the regression model was constructed after variable transformation, which led to a relatively large bias even in the training period. These issues are considered major reasons for a larger bias of precipitation in the RMME compared with the CDFDM. Nonetheless, the RMME successfully reduces biases in the GCMs and shows a performance comparable to the bias correction. These results indicate that the RMME can estimate regional climate through statistical downscaling with a relatively small bias.
In a second step, the probabilistic climate scenarios estimated by RMME were examined. Figure 2 shows the warming range of the monthly mean and 90th-percentile surface air temperature and the probabilities of exceeding a warming range of 2 K and 4 K by the end of the 21st century, estimated by RMME using 40 GCMs with the CMIP5 RCP8.5 scenario. The figure shows that the monthly mean temperature would increase by 3-4 K in the Kyushu region and by more than 4 K in other areas. Regardless of the season, the probability of exceeding a warming range of 2 K is higher than 80% in most areas of Japan. A warming range exceeding 4 K is expected with a probability of more than 60% by the end of the 21st century under the RCP8.5 scenario. Summer precipitation is projected to increase, especially along the Pacific coast of Honshu, Kyushu, and Shikoku, where a large amount of mean summer precipitation is observed (Figure 3). The precipitation increase is also remarkable in the western part of Hokkaido. During the winter season, precipitation will increase in northern Japan, including in the Hokkaido and Hokuriku regions, while areas south of 35°N have a high probability of precipitation decrease. The probability of monthly precipitation exceeding 300 mm is projected to increase by 20% along the Pacific coast of Honshu, Kyushu, and Shikoku during the summer season and along the coast of the Japan Sea during the winter season. The results with RMME based on the RCP4.5 scenario are essentially the same except for the amplitude of the signal.
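Because the Yeo-Johnson transformation is monotonically increasing, the probability that monthly precipitation exceeds 300 mm can be evaluated in the transformed space, where the normal assumption is applied. The following sketch uses hypothetical values for the transformed mean `mu_z`, standard deviation `sigma_z`, and power parameter `lam`; none of them are results of this study.

```python
import math
from statistics import NormalDist


def yeo_johnson_nonneg(y, lam):
    """Yeo-Johnson transformation for non-negative values such as precipitation."""
    return math.log1p(y) if lam == 0 else ((y + 1.0) ** lam - 1.0) / lam


def prob_precip_exceeds(threshold_mm, mu_z, sigma_z, lam):
    """P(precipitation > threshold) when the transformed value is N(mu_z, sigma_z^2)."""
    z_threshold = yeo_johnson_nonneg(threshold_mm, lam)
    return 1.0 - NormalDist(mu_z, sigma_z).cdf(z_threshold)


# Hypothetical present and future fits at one grid point (transformed units).
p_present = prob_precip_exceeds(300.0, mu_z=28.0, sigma_z=4.0, lam=0.5)
p_future = prob_precip_exceeds(300.0, mu_z=30.0, sigma_z=4.0, lam=0.5)
print(p_present, p_future, p_future - p_present)
```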

DISCUSSION
In this section, the probability projections estimated by RMME are compared with the multi-GCM ensemble mean to validate the results and to assess the risk of regional climate change in Japan. Figure 4 shows the multi-GCM ensemble mean of surface air temperature, precipitation, and sea level pressure (SLP). During the summer season, the ensemble mean of the warming range of surface air temperature is very similar to the results of the RMME. Summer precipitation increases significantly over the subtropical ocean and the southern part of the Japanese Archipelago, where the Baiu front is located. A high probability of projected precipitation increase along the Pacific coast is considered to be related to Baiu rainfalls during the summer season. In other words, the reliability of the summer precipitation projection depends largely on the location, seasonal march, and strength of the Baiu front reproduced by the GCMs, because the probabilistic estimation using the multi-model ensemble is based on the GCM projection. The SLP pattern shows a small change around Japan. However, under constant relative humidity, the moisture content of the atmosphere must increase with the temperature increase, as explained by the Clausius-Clapeyron relationship. Figure 4 clearly shows a larger temperature increase in the winter season than in the summer season, and a larger temperature increase in the northern region than in the southern region. A positive SLP anomaly and decreased precipitation are found around 30°N over the North Pacific Ocean, associated with a shift of the Aleutian low to the north in the winter season. This indicates that a modulation of the storm track and a weakened winter monsoon cause precipitation changes in the Japan region.

Figure 1. (a) Surface air temperature (Ts) and (e) precipitation (Pr), and biases for (b, f) the simple ensemble mean of the GCMs (ENSM), (c, g) the cumulative distribution function-based downscaling method (CDFDM), and (d, h) the regression method using a multi-model ensemble (RMME) during the warm period in the historical data
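The Clausius-Clapeyron scaling invoked in the discussion of Figure 4 can be illustrated numerically. The sketch uses Bolton's (1980) empirical approximation for saturation vapor pressure, which is not part of this study's method, and a hypothetical baseline temperature of 15 °C.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over water (hPa), Bolton (1980) approximation."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def fractional_increase(temp_c, warming_k):
    """Relative increase in saturation vapor pressure for a given warming."""
    base = saturation_vapor_pressure_hpa(temp_c)
    return saturation_vapor_pressure_hpa(temp_c + warming_k) / base - 1.0


per_kelvin = fractional_increase(15.0, 1.0)   # roughly 6 to 7 % per K
for_4k = fractional_increase(15.0, 4.0)       # compounding over a 4 K warming
print(per_kelvin, for_4k)
```

At constant relative humidity the actual vapor pressure scales with the saturation value, so a warming of several kelvin implies a substantial increase in available moisture even where the circulation changes little.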
Regression models established at each horizontal grid point show that mean future changes of surface temperature and precipitation using the RMME are similar to those of the simple ensemble mean of the GCMs. Furthermore, our results are generally consistent with other studies based on CMIP5 GCM ensembles (Ogata et al., 2014; Seo et al., 2013; Qu et al., 2014), which demonstrates the validity of our estimation. In conjunction with the ensemble mean of the GCMs, we can interpret the projection results in terms of synoptic-scale circulation changes. In addition to the mean state, RMME provides regional probability information after statistical downscaling and bias correction, which is essential for risk assessment.
Using this probability information, we examined how the future climate would differ from the current climate using the Extreme Forecast Index (EFI; Zsótér, 2006; Harada and Takaya, 2012; Matsueda and Takaya, 2013). The EFI is a measure of the difference of the probability distribution between projected and base states of the current climate, ranging from -1 (all projection members are below the 0th percentile of the current climate states) to +1 (all projection members are above the 100th percentile). Figure 5 shows the EFI of seasonal mean surface air temperature and seasonal precipitation for 2081-2100 under the CMIP5 RCP8.5 scenario. Most areas of Japan are covered with black dots, representing areas with a temperature EFI exceeding +0.95. This indicates that many areas would experience severe temperature increases, and that the coldest seasonal temperatures in the future would be similar to or higher than the hottest temperatures in the same season of the current climate. As shown in the probability map of precipitation, a notable precipitation increase is projected for regions where the mean precipitation is large, while a decrease in precipitation is projected for the Pacific coast during the winter season. Although the EFI of precipitation is small, it reflects short-term precipitation changes. Furthermore, the probability projections imply that Japan would face growing risks associated with global warming. For example, enhanced precipitation around the western part of Hokkaido and Hokuriku may lead to a new risk of winter floods due to a change of the precipitation type from snow to rain. In-depth analyses using shorter-term precipitation data and other variables would be needed for this type of prospective risk assessment.
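A common definition of the EFI integrates the difference between a climatological probability p and the projected probability F(p) of falling below the climatological p-quantile, weighted by 1/sqrt(p(1 - p)). The sketch below assumes that definition and normal distributions for both climates; whether the computation in this study followed exactly this form is not stated, and the example numbers are hypothetical. The substitution p = sin²(θ) removes the endpoint singularities of the weight.

```python
import math
from statistics import NormalDist


def extreme_forecast_index(climate, projection, n_steps=2000):
    """EFI = (2/pi) * integral_0^1 (p - F(p)) / sqrt(p (1 - p)) dp,
    where F(p) is the projected probability of lying below the p-quantile
    of the reference climate. With p = sin(theta)^2 this becomes
    (4/pi) * integral_0^{pi/2} (p - F(p)) d(theta), evaluated by the midpoint rule.
    """
    d_theta = (math.pi / 2.0) / n_steps
    total = 0.0
    for k in range(n_steps):
        theta = (k + 0.5) * d_theta
        p = math.sin(theta) ** 2
        f_p = projection.cdf(climate.inv_cdf(p))
        total += (p - f_p) * d_theta
    return 4.0 / math.pi * total


# Hypothetical seasonal mean temperatures (degC) at one grid point.
present = NormalDist(15.0, 1.0)
efi_same = extreme_forecast_index(present, NormalDist(15.0, 1.0))
efi_warm = extreme_forecast_index(present, NormalDist(19.5, 1.0))
efi_cold = extreme_forecast_index(present, NormalDist(10.5, 1.0))
print(efi_same, efi_warm, efi_cold)
```

An unchanged distribution gives an index near zero, while a shift of several standard deviations pushes the index close to +1 or -1, consistent with the range described above.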
We developed a new method to synthesize multi-GCM ensemble experiments without using the Bayesian method. This technique provides likely climate projections based on a large number of state-of-the-art GCMs and observations. The reproducibility under the current climate by the probabilistic model or empirical statistical downscaling (ESD) methods does not necessarily ensure prediction ability for the future climate. Separate analysis of a trend component from other components (seasonal and annual), comparison of the performance of the probabilistic model to ESD methods, and the combination of these two methods should be meaningful future research subjects. Our approach is to estimate probabilistic regional climate projections, while considering climate model similarities. The results depend on the time interval and quality of the observations. Consideration of observation error should be included in any detailed analysis, which is beyond the scope of this study.

CONCLUSIONS
We have developed a regression model to produce a probabilistic climate projection by combining multi-model ensemble experiments. In this method, the GCMs were synthesized so that the combination of coefficients of the GCMs reflects the characteristics of the variation of the observations. Application of the elastic net penalty shrinks the regression coefficients of correlated outputs of the GCMs and, at the same time, selects the GCMs necessary for the regression model. We used observations and GCM data of 10 cooler years as the training period to construct the model at each grid point and evaluated the results using the 10 warmer years of the study period. As a result, the RMME method notably reduces the bias of the original GCMs and shows performance comparable to the CDFDM. However, estimation of monthly precipitation has a relatively large bias compared with the CDFDM, which is caused by estimation errors associated with the variable transformation.

Figure 4. Change in the simple ensemble mean of the GCMs for surface air temperature (upper panels) and precipitation (lower panels) during JJA (left panels) and DJF (right panels). ΔT and ΔP indicate the future changes for surface air temperature and precipitation. Mean (black contours) and anomaly (colored contours) of sea level pressure (SLP) are overlaid

Figure 5. Extreme Forecast Indexes (EFIs) for seasonal mean surface air temperature and seasonal precipitation by the end of the 21st century estimated using the RMME based on the CMIP5 RCP8.5 multi-model ensemble. Colors show the precipitation EFI and black dots indicate the location where the EFI of the temperature exceeds +0.95
A temperature increase in excess of 4 K with a probability of more than 60% is expected in most areas of Japan by the end of the 21st century under the CMIP5 RCP8.5 scenario. The estimated probability of monthly precipitation exceeding 300 mm increases along the Pacific coast during summer and the coast of the Japan Sea during winter. Projected changes in mean temperature and precipitation are similar to those of the simple ensemble mean of the GCMs. Mean precipitation is generally projected to increase associated with increased temperature and consequently increased moisture content of the air, as explained by the Clausius-Clapeyron relationship. Atmospheric circulation changes such as a weakening of the winter monsoon may cause a decrease in precipitation along the Pacific coast during the winter season.
This study used as many ensemble members of state-of-the-art GCMs as possible and synthesized them into a regression model, while considering the similarity of the GCMs. The method successfully produces a probabilistic climate projection with a high horizontal resolution and it is applicable to other areas of the world where enough observations are available to conduct a statistical analysis. Our probabilistic climate projection is expected to provide stakeholders with useful information for impact studies and risk assessments.