Statistical Downscaling of AGCM60km Precipitation based on Spatial Correlation of AGCM20km Output

Abstract: A statistical downscaling method based on regressing precipitation data is introduced and applied to 60-km resolution Atmospheric General Circulation Model (AGCM60km) output for daily precipitation. The method utilizes a regression domain of 3×3 60-km grids, and the downscaling target is the 3×3 20-km grids in the center of the regression domain. By shifting the regression domain one 60-km grid at a time, the same form of regression model, but with different regression coefficients for each 20-km grid, can be applied to all the downscaling target areas. In application tests for the Asian Monsoon region, the statistical downscaling algorithm shows highly effective results, with a certain pattern of regression error. The monthly downscaled results from the AGCM60km output show a rather good match to the monthly mean precipitation amount of AGCM20km. The downscaled results also plausibly mimic the AGCM20km output in the frequency of daily precipitation amounts; however, the results show noticeable limitations in simulating low rainfall amounts (e.g., less than 5 mm d⁻¹), especially on land.

However, the diversity of ensemble outputs from AGCM20km is still limited compared to the number of ensemble outputs from AGCM60km because of its high computational burden. If the number of AGCM20km ensemble outputs were as large as the number of AGCM60km outputs, such as the more than 100 ensemble outputs from the database for Policy Decision making for Future climate change (d4PDF) experiments (Mizuta et al., 2016), the applicability of the model outputs in various impact assessment studies would increase drastically.
Considering the specific characteristics of AGCM20km and AGCM60km, this research investigates the possibility of statistical downscaling (SDS) of the AGCM60km precipitation output to 20 km horizontal resolution. Kim et al. (2014) developed and tested an efficient SDS technique that considers only the spatial correlation of precipitation data and is based on the so-called formatted regression frame (FRF) algorithm. The main purpose of this article is to apply the FRF algorithm to the AGCM60km precipitation output and to investigate the effectiveness and limitations of the algorithm by comparing the SDS output to the original AGCM20km output.
SDS has a long history of development and application in the field of hydrology (Fowler et al., 2007), and several types of SDS techniques are known to be useful; for example, regression with canonical correlation (Schmidli et al., 2007), the Markov chain weather generator (Wilks and Wilby, 1999), and clustered weather typing (Fowler et al., 2005). However, in many cases, statistical relationships between model variables and observations are not strong enough to build a stable SDS model (Wilby and Wigley, 1997). Furthermore, and most critically, it is not certain whether a statistical relationship developed with present climate data can reproduce the statistical relationship of the future climate. On the other hand, the dynamic downscaling (DDS) method with a regional climate model (RCM) provides stable and reasonable output on a physical basis; however, it requires considerable computing resources and additional model setup (Wilby et al., 2000; Fowler et al., 2007).
The FRF algorithm of Kim et al. (2014) is an SDS method that avoids the aforementioned critical issues of conventional SDS methods and takes advantage of DDS, because it statistically links AGCM outputs at two horizontal resolutions (20 km and 60 km) for the present and future climates independently. The FRF algorithm demonstrated robustness in experiments with observed precipitation data, and it is tested in this paper to evaluate its effectiveness and limitations with the AGCM60km precipitation output. The next section briefly introduces the FRF algorithm and the experiment design, followed by the application results and analysis.
Correspondence to: Sunmin Kim, Department of Civil and Earth Resources Eng., Graduate School of Engineering, Kyoto University, C1-183, Kyotodaigaku, Katsura, Nishikyoku, Kyoto 615-8540, Japan. E-mail: kim.sunmin.6x@kyoto-u.ac.jp

METHODOLOGY
The FRF algorithm utilizes precipitation data only, and it considers the spatial correlation of precipitation within a regression domain of 3×3 60-km grids. The algorithm builds a regression relationship between the precipitation of a 20-km grid in the center of the regression domain and that of the surrounding 60-km grids. Because one 60-km grid can be divided into 3×3 20-km grids, there are nine different regressions for the nine 20-km grids in the center of each regression domain. By shifting the regression domain one 60-km grid at a time, the same form of regression model, but with different regression coefficients for each 20-km grid, can be applied to the entire downscaling target area, except for grids lying on the boundary.
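The indexing implied by the regression domain can be sketched as follows. This is a minimal illustration, not the implementation of Kim et al. (2014): the function names `regression_domain` and `fine_cells`, the row-major ordering of the nine predictors, and the nested-list grid layout are all assumptions made here for clarity.

```python
def regression_domain(coarse, row, col):
    """Return the nine 60-km values of the 3x3 domain centred on (row, col).

    `coarse` is a 2-D list of 60-km precipitation values for one day.
    The row-major ordering of the nine predictors is an assumption.
    """
    n_rows, n_cols = len(coarse), len(coarse[0])
    if not (1 <= row < n_rows - 1 and 1 <= col < n_cols - 1):
        # Boundary grids have no complete 3x3 domain and are skipped.
        raise ValueError("boundary grid: no complete 3x3 regression domain")
    return [coarse[r][c]
            for r in range(row - 1, row + 2)
            for c in range(col - 1, col + 2)]


def fine_cells(row, col):
    """Indices of the nine 20-km cells nested inside 60-km cell (row, col)."""
    return [(3 * row + dr, 3 * col + dc)
            for dr in range(3) for dc in range(3)]
```

Shifting `(row, col)` over all interior 60-km grids covers the whole target area; each of the nine 20-km cells returned by `fine_cells` receives its own set of nine coefficients.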
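In this study, the SDS target is the daily precipitation amount, and the regression model follows Kim et al. (2014):

$$ r_{k,i} = C_{k,1} R_{1,i} + C_{k,2} R_{2,i} + \cdots + C_{k,9} R_{9,i} + \varepsilon_{k,i} \qquad (1) $$

where $r_{k,i}$ is the daily precipitation amount of day $i$ on 20-km resolution grid $k$, $C_{k,j}$ is the regression coefficient of $R_{j,i}$, which is the daily precipitation of day $i$ on 60-km resolution grid $j$ ($j = 1, \ldots, 9$) in the regression domain, and $\varepsilon_{k,i}$ is the regression residual.
For $n$ days of 20-km and 60-km precipitation data, Equation 1 can be rewritten in matrix form as Equation 2 or, more compactly, Equation 3:

$$ \begin{bmatrix} r_{k,1} \\ r_{k,2} \\ \vdots \\ r_{k,n} \end{bmatrix} = \begin{bmatrix} R_{1,1} & R_{2,1} & \cdots & R_{9,1} \\ R_{1,2} & R_{2,2} & \cdots & R_{9,2} \\ \vdots & \vdots & & \vdots \\ R_{1,n} & R_{2,n} & \cdots & R_{9,n} \end{bmatrix} \begin{bmatrix} C_{k,1} \\ C_{k,2} \\ \vdots \\ C_{k,9} \end{bmatrix} + \begin{bmatrix} \varepsilon_{k,1} \\ \varepsilon_{k,2} \\ \vdots \\ \varepsilon_{k,n} \end{bmatrix} \qquad (2) $$

$$ Z = A x + v \qquad (3) $$

where $Z$ is the vector of $n$ days of precipitation at 20-km resolution, $A$ is the $n \times 9$ matrix of $n$ days of precipitation in the surrounding 60-km grids, $x$ is the vector of regression parameters, and $v$ is the residual vector. The parameters can be estimated by minimizing the squared residuals, as shown in Equation 4:

$$ \min_{x} \; v^{T} v = (Z - A x)^{T} (Z - A x) \qquad (4) $$

Many numerical schemes can be used to find the optimal regression parameters, and the Householder transformation is quite helpful in decreasing the computational burden when the number of daily precipitation values $n$ is large (e.g., $n = 900$ for 30 days of 30 years of data).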
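As an illustration of the least-squares estimation, the following sketch solves min ||Z - Ax||² with Householder reflections in plain Python. It is a minimal sketch under stated assumptions, not the authors' code: the names `householder_lstsq` and `downscale_day` are hypothetical, the design matrix is assumed to have full column rank, and no treatment of negative predicted precipitation is included because the text does not describe one.

```python
import math


def householder_lstsq(A, z):
    """Least-squares solution of A x ~ z via Householder QR (dense, pure Python).

    A: list of n rows, each with m predictors (m = 9 in the FRF setting).
    z: list of n target values. Assumes n >= m and full column rank.
    """
    n, m = len(A), len(A[0])
    R = [list(row) for row in A]   # working copy, reduced to upper triangular
    b = list(z)                    # right-hand side, transformed alongside R
    for k in range(m):
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, n)))
        if norm == 0.0:
            raise ValueError("rank-deficient design matrix")
        # Choose the sign that avoids cancellation in v[0].
        alpha = -norm if R[k][k] >= 0 else norm
        v = [R[i][k] for i in range(k, n)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns and to b.
        for j in range(k, m):
            s = 2.0 * sum(v[i] * R[k + i][j] for i in range(n - k)) / vtv
            for i in range(n - k):
                R[k + i][j] -= s * v[i]
        s = 2.0 * sum(v[i] * b[k + i] for i in range(n - k)) / vtv
        for i in range(n - k):
            b[k + i] -= s * v[i]
    # Back substitution on the upper-triangular m x m block.
    x = [0.0] * m
    for k in range(m - 1, -1, -1):
        x[k] = (b[k] - sum(R[k][j] * x[j] for j in range(k + 1, m))) / R[k][k]
    return x


def downscale_day(coeffs, predictors):
    """Evaluate the regression of Equation 1 without the residual term."""
    return sum(c * r for c, r in zip(coeffs, predictors))
```

In the FRF setting, each row of `A` would hold the nine 60-km values of one day and `z` the corresponding 20-km values of one target grid, so the solve is repeated for every 20-km grid and every month.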
The parameters (or regression coefficients) are estimated on a monthly basis to account for the seasonal variation of precipitation patterns. Because the parameters are optimized over the whole present (or future) climate period for each month, they are time invariant within each month, and each combination of 60-km precipitation values determines the 20-km downscaled precipitation amount through the parameters. Figure 1 illustrates the SDS procedure for an AGCM60km precipitation output based on the FRF algorithm. First, the AGCM20km precipitation output is scaled up to 60-km resolution by averaging 3×3 20-km grids, and the FRF regression parameters are estimated from the relationship between the AGCM20km output and the scaled-up 60-km data. The estimated parameters are then used to downscale the AGCM60km output. To evaluate the downscaled precipitation output, two major characteristics of precipitation were investigated: 1) monthly mean precipitation amounts, to assess overall volume errors, and 2) statistical tests on daily precipitation amounts, to decide whether the two data sets are statistically identical. In this research, a two-sample Kolmogorov-Smirnov test, which quantifies the distance between the empirical cumulative distribution functions of two samples, is used for the statistical test.
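The scaling-up step can be sketched as a simple block average. The sketch below assumes equally weighted 20-km cells and a grid whose dimensions are multiples of three; the function name `upscale_3x3` is hypothetical, and any area weighting used in the original experiments is not reproduced here.

```python
def upscale_3x3(fine):
    """Average non-overlapping 3x3 blocks of a 20-km field into a 60-km field.

    `fine` is a 2-D list for one day; equal cell weights are an assumption.
    """
    n_rows, n_cols = len(fine), len(fine[0])
    if n_rows % 3 or n_cols % 3:
        raise ValueError("grid dimensions must be multiples of 3")
    coarse = []
    for r in range(0, n_rows, 3):
        row = []
        for c in range(0, n_cols, 3):
            block = [fine[r + dr][c + dc]
                     for dr in range(3) for dc in range(3)]
            row.append(sum(block) / 9.0)
        coarse.append(row)
    return coarse
```

The scaled-up field then plays the role of the 60-km predictors during parameter estimation, while the AGCM60km output replaces it during the actual downscaling.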
Bias correction can be applied to an AGCM60km output before it is downscaled to 20-km resolution, because the downscaled data will retain the characteristics of the AGCM60km output, such as the spatial and temporal distribution of precipitation amounts. If the AGCM60km precipitation output differs too much from the scaled-up AGCM20km output, the downscaled data cannot verify well, even if the downscaling procedure itself is sound. In this research, we verified the downscaled data both without and with bias correction.
Note that only those outputs for present climate (25 years of data from 1979-2003) were utilized in this downscaling and verification procedure due to the limited length of the paper. However, all procedures are exactly the same when handling the output for the future climate scenario (another 25 years of data from 2075-2099). The parameters can be estimated with the AGCM20km output for the future climate scenario, and those parameters are applied to the downscaling of the AGCM60km output for the future climate scenario.

RESULTS AND ANALYSIS
Based on the downscaling procedure illustrated in the previous section, the FRF algorithm was tested in the Asian Monsoon region (longitude: 60°E-150°E and latitude: 15°S-50°N). First, the regression parameters were estimated from the relationship between the AGCM20km output and the scaled-up AGCM20km data. Using the estimated parameters, statistical downscaling of the AGCM60km output was carried out. Basic statistics from this procedure, namely the correlation coefficient and root mean squared error (RMSE) of the monthly precipitation amount, are given in Table I. Both statistics are computed for two cases: between the AGCM20km output and the downscaled results obtained during parameter estimation, and between the AGCM20km output and the results downscaled from the AGCM60km output using the estimated parameters.
Figure 1. Downscaling and verification procedures of the AGCM60km precipitation output based on the precipitation pattern of the AGCM20km output. Bias correction on AGCM60km can be considered when necessary.
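For reference, the two summary statistics can be computed as in the sketch below. The function names `rmse` and `pearson_r` are hypothetical, and the text does not state over which dimension (grids, years, or both) the statistics in Table I are pooled, so the inputs here are simply two equal-length sequences of monthly precipitation amounts.

```python
import math


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```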
In Table I, the correlation coefficient and the RMSE show no significant changes across seasons and months, although the summer months show a somewhat increased RMSE in the output downscaled from the AGCM60km compared to the other months (see June, July, and August in the downscaled results). This is attributed to the large amounts of monthly precipitation during those months. To show the pattern of monthly precipitation, Figure 2 presents the monthly precipitation amounts (for February and August as representative months) from the AGCM20km output and from the downscaled results using the AGCM60km output.
Note that although the overall pattern is rather similar, there are slight differences between the original 20-km output and the downscaled results in Figure 2, not only in August but also in February. To look into the differences more closely, Figure 3 shows only the region around Japan, together with the AGCM60km output to illustrate the downscaling effect. Figure 3 shows that the differences in the monthly mean precipitation amounts of the downscaled results originate from the AGCM60km output; however, the SDS procedure provides a similar spatial contrast of precipitation amounts at 20-km resolution.
To validate the downscaled results in terms of the frequency of daily precipitation amounts, a two-sample Kolmogorov-Smirnov test was carried out with the AGCM20km output and the downscaled result from the AGCM60km output. The test is a non-parametric goodness-of-fit test whose p value is used to decide whether the two samples come from the same population. Figure 4 shows the test results for February and August. Here, a low p value indicates that the two samples are unlikely to come from the same population; the blue areas in Figure 4 mark grids where the frequency of the downscaled daily precipitation amount differs from that of the original AGCM20km output at the 10% significance level (α = 0.1). Regrettably, most of the downscaled results show frequencies different from those of the AGCM20km output. However, considering that the frequency of low rainfall amounts is more difficult to simulate in GCMs and that the frequency of high rainfall amounts is more important to most GCM output users, the two-sample Kolmogorov-Smirnov test was repeated with several threshold values, such as samples of daily rainfall amounts larger than 1 mm or 2 mm.
Interestingly, p values from the two-sample Kolmogorov-Smirnov test show drastic improvements even with low threshold values, as shown in Figure 5, where the p values are from precipitation samples larger than or equal to 2 mm d⁻¹. Even with a 0 mm d⁻¹ threshold, which means the sample excludes non-precipitation days, p values from the test show noticeable improvement in many regions, and the p values continue to improve up to the 5 mm d⁻¹ threshold. With thresholds larger than 5 mm d⁻¹, the improvement of p values was not significant in many regions.
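The thresholded test can be sketched in plain Python as follows. This is an illustrative sketch rather than the procedure actually used in the experiments: the function names are hypothetical, the p value uses the standard asymptotic Kolmogorov series with a common small-sample correction factor, and the filter keeps days at or above the threshold (a strictly-greater filter would be needed to reproduce the 0 mm d⁻¹ case that excludes dry days).

```python
import math
from bisect import bisect_right


def wet_days(sample, threshold):
    """Keep daily amounts at or above the threshold (assumed convention)."""
    return [v for v in sample if v >= threshold]


def ks_statistic(x, y):
    """Maximum distance between the two empirical CDFs (ties handled)."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic two-sided p value for the two-sample KS statistic."""
    n_eff = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the limit is 1
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def ks_test_with_threshold(x, y, threshold):
    """Return (statistic, p value) for samples filtered by the threshold."""
    xf, yf = wet_days(x, threshold), wet_days(y, threshold)
    d = ks_statistic(xf, yf)
    return d, ks_pvalue(d, len(xf), len(yf))
```

A grid would be flagged as different at the 10% level when the returned p value falls below 0.1; raising the threshold removes the low-intensity days on which the two distributions disagree most.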
It should be noted that the downscaling and validation procedures described above were carried out with the AGCM60km output as it is. Because the AGCM60km output inherently differs from the AGCM20km output in both precipitation amounts and precipitation frequencies, another attempt at downscaling and validation was carried out with a bias-corrected AGCM60km output. Here, the bias of the AGCM60km output means the differences between the AGCM60km and the AGCM20km outputs. Bias correction of the AGCM60km output was done with the scaled-up AGCM20km output as the reference. By comparing the daily precipitation amounts of each month between the two data sets, daily scaling by a ratio was applied to the AGCM60km output. The bias-corrected AGCM60km provided an excellent data source for the downscaling procedure. The monthly precipitation amounts of the downscaled data based on the bias-corrected AGCM60km show a close match to the AGCM20km output, which is a rather reasonable result.
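The exact form of the daily scaling is not spelled out in the text. One common reading, shown here purely as an assumption, is linear scaling: for each calendar month and grid, the daily AGCM60km amounts are multiplied by the ratio of the reference mean to the model mean. The function names are hypothetical, and the handling of an all-dry model series is a choice made for this sketch.

```python
def monthly_scaling_factor(model_days, reference_days):
    """Ratio of reference mean to model mean for one month at one grid."""
    model_mean = sum(model_days) / len(model_days)
    reference_mean = sum(reference_days) / len(reference_days)
    if model_mean == 0.0:
        return 1.0  # assumption: leave an all-dry model series unchanged
    return reference_mean / model_mean


def bias_correct(model_days, reference_days):
    """Scale every daily model value by the monthly ratio."""
    factor = monthly_scaling_factor(model_days, reference_days)
    return [factor * v for v in model_days]
```

By construction, this kind of scaling matches the monthly mean of the reference but leaves the number of wet days unchanged, so it corrects amounts more directly than frequencies.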
The main question regarding the newly downscaled results is how closely their daily precipitation frequency matches that of the AGCM20km output. Figure 6 shows p values from the two-sample Kolmogorov-Smirnov test between the AGCM20km output and the downscaled AGCM60km output with bias correction. Even without any threshold, the p values increase, indicating closer agreement in the frequency of daily precipitation amounts, especially in the oceanic regions around the equator and in low-latitude areas (see the upper panel of Figure 6 and compare it with Figure 4).
When daily precipitation amounts larger than or equal to 2 mm were compared, the two-sample Kolmogorov-Smirnov test revealed a high possibility that the two samples are from the same population. However, in most land regions, the p values remain quite low, even with a high threshold value, such as over 5 mm d⁻¹. To determine the reason, we checked histograms for many locations and seasons and found that low rainfall intensity (e.g., less than 5 mm d⁻¹) does not show a simple tendency and is difficult to model with only the correlation of surrounding rainfall amounts. High-intensity rainfall at 20 km or 60 km resolution is mostly caused by large-scale rainbands or typhoons and can be easily captured by our SDS model. In addition, complex geographic interaction with other atmospheric variables, such as wind, is believed to make the spatial correlation of precipitation unstable in terrestrial areas, resulting in lower accuracy there than in oceanic areas.

CONCLUSIONS
A regression model using precipitation data only within a formatted frame, which is 3×3 60-km grids, was tested with the MRI-AGCM3.2H output. The downscaling target was to convert 60 km resolution daily precipitation data to a 20 km resolution grid for the Asian Monsoon region. To evaluate the downscaling results, the monthly mean precipitation amounts and daily precipitation frequency were compared to those of the original AGCM20km precipitation output.
[Figure caption, truncated in the source] ... Figure 4, with daily rainfall amounts larger or equal to 2 mm
Because the downscaled results inherently contain characteristics of the downscaling source, which is the AGCM60km output, the results showed slight differences in the monthly mean precipitation amount and significant differences in the frequency of daily precipitation. However, when the AGCM60km output is corrected to match the AGCM20km output using a daily scaling method, the downscaling results based on the bias-corrected AGCM60km output provide almost identical monthly mean precipitation patterns and also greatly improve daily precipitation frequency.
The proposed SDS method is based only on the spatial correlation of precipitation and simplifies the regression model within a formatted regression frame; thus, it is easy to apply to any region with light computing resources. Even though there is a noticeable discrepancy in the frequency of daily precipitation amounts, especially in land areas, considering the efficiency of the proposed SDS method, the 20 km resolution downscaled results can be a good alternative to ensemble output until numerous simulation outputs are available from the AGCM20km directly.
Figure 6. The Kolmogorov-Smirnov test results for February (left) and August (right) as in Figure 4, with all daily rainfall amounts (top) and with daily rainfall amounts larger or equal to 2 mm (bottom). The downscaled results in this test are from the bias-corrected AGCM60km output.