Statistical modeling of global mean wave height considering principal component analysis of sea level pressures and its application to future wave height projection

Future wave climate projection is important for climate impact assessment of the coastal hazards and environment. In this study, monthly averaged wave heights are estimated by a linear multi-regression model using atmospheric data as explanatory variables. The present statistical model considers local atmospheric information (wind speed at 10 m height, sea level pressure) and large scale atmospheric information obtained from principal component analysis (PCA) of the global sea level pressure field. The representation of swell in the lower latitude is greatly improved by introducing the large scale atmospheric information from the PCA. The present statistical model was applied to the results of the Japan Meteorological Research Institute’s Atmospheric General/Global Circulation Model (MRI-AGCM) climate change projection. The future change of wave heights shows an increase in the northern North Pacific Ocean and a decrease in the North Atlantic Ocean, middle latitude and tropics of the Pacific Ocean.


INTRODUCTION
A warmer future climate is highly likely to have a significant impact on our society, and assessing the resulting climate change impacts is important for adaptation strategy. Future climate change impacts on the coastal environment and hazards have been assessed for sea level rise (e.g., Hallegatte et al., 2013). The rate of global sea level rise (SLR) in the 21st century will very likely exceed the rate observed in the late 20th century, and a rapid increase of SLR is projected until the end of the 21st century (Fifth Assessment Report of the Intergovernmental Panel on Climate Change, IPCC-AR5; IPCC, 2013). In addition to SLR, ocean surface waves are another component to be assessed for the future coastal environment and hazards (e.g., Mori et al., 2016). Barnard et al. (2015) found that coastal erosion in the Pacific is significantly associated with global climate variability patterns through wave climate variability, and thus suggested that wave climate variability and change can be used to predict future coastal hazards and environment. Although extreme wave conditions are important for coastal hazard problems, the mean wave climate is also important for beach stability, port operations and coastal environments.
Future projections of global wave climate under global warming scenarios have been carried out, and the projected change in wave height by the end of this century is a decrease or an increase depending on the ocean (e.g., Hemer et al., 2013). Global Circulation Models (GCMs) generally do not consider ocean waves; therefore, an independent wave climate projection based on the GCM results, using a dynamical or statistical wave model, is required for coastal impact assessment. Several research groups have deployed dynamical approaches for global wave climate projection (e.g., Mori et al., 2010; Shimura et al., 2016). The dynamical approach represents wave physics explicitly but is not suitable for a large number of ensemble projections because of its heavy computational load. On the other hand, the statistical method is computationally light and is easily applied to multi-model and multi-scenario climate projections (Wang et al., 2014). Projections of future wave climate have large uncertainties due to the low level of agreement between models and the limited number of available projections (IPCC, 2013). Climate change impacts should be investigated using a large number of climate projections from single-model or multi-model ensembles (Kawase et al., 2016) because of the large internal variability of climate (Kay et al., 2015). In order to estimate the uncertainty of future wave climate projections, it is important to increase the number of projections forced by different scenarios and climate models. Thus, demand for statistical wave prediction models has been increasing greatly since IPCC-AR5 (IPCC, 2013).
Sea surface winds generate ocean waves (wind sea), and these waves propagate far beyond the generating area as swell. Swells can propagate more than 10,000 km; for example, waves generated in the Southern Ocean can reach Alaska (Snodgrass et al., 1966). The global ocean is dominated by swell, especially in lower latitudes (Semedo et al., 2011). In this study, we develop a statistical model of global wave heights that uses principal component analysis (PCA) to represent remotely generated swells. As both local and large-scale atmospheric information are required to predict wave heights correctly, the explanatory variables in the statistical wave model comprise local atmospheric variables for wind sea and large-scale atmospheric patterns from PCA for swell.
Firstly, this manuscript describes the methodology of the statistical modeling. Secondly, the developed method is validated against historical wave analysis data. Finally, the developed statistical wave model is applied to future projection of global wave climate based on atmospheric data from the GCM, and the future changes in wave heights are shown under four different greenhouse gas (GHG) concentration scenarios.

DEVELOPMENT OF STATISTICAL WAVE MODEL
A statistical model of global wave height was developed based on the atmospheric reanalysis by the Japan Meteorological Agency, known as JRA-55 (Kobayashi et al., 2015), and a wave hindcast driven by it. JRA-55 itself does not contain wave data; Mori et al. (2015) conducted a 55-year wave climate hindcast using WAVEWATCH III forced by the sea surface winds of JRA-55 (denoted JRA-55 wave). The JRA-55 atmospheric data (sea surface winds and sea level pressure) and the wave hindcast data are used as the training data for the statistical wave model. The spatial resolution of JRA-55 and of the wave hindcast is about 60 km, and the analysis period is 1958-2012. The data from 1958 to 1987 were used for model development and the remaining period, 1988-2012, was used for model validation. The JRA-55 wave values are regarded as true values. The accuracy of JRA-55 and JRA-55 wave was validated by Kobayashi et al. (2015) and Mori et al. (2015).
The monthly mean significant wave height ($H_S$) is the target of the statistical modeling and is explained by a multivariate linear regression model. The model is represented as
$$H = a + \sum_{i=1}^{k} b_i x_i$$
where $H$ is the predicted $H_S$, $x_i$ are the explanatory variables, $a$ and $b_i$ are coefficients determined by the least-squares method, and $k$ is the total number of explanatory variables. The coefficients are calibrated with the training data at each grid point. The candidate explanatory variables $x_i$ are the squared monthly mean wind speed at 10 m height ($U_{10}^2$), sea level pressure (SLP), the horizontal gradient of SLP (ΔSLP) and the PC modes of the SLP field. The local atmospheric information ($U_{10}^2$, SLP and ΔSLP) at the same grid point is used with equal weighting for the prediction of $H_S$ and represents locally generated wind sea.
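The least-squares calibration at a single grid point can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the synthetic noise-free data and the coefficient values are all assumptions made for the example.

```python
import numpy as np

def fit_wave_regression(X, h_s):
    """Least-squares fit of h_s ~ a + X @ b at a single grid point.

    X   : (n_months, k) array of explanatory variables
    h_s : (n_months,) array of monthly mean significant wave height
    """
    # A leading column of ones lets the intercept a be estimated with b.
    design = np.column_stack([np.ones(len(h_s)), X])
    coef, *_ = np.linalg.lstsq(design, h_s, rcond=None)
    return coef[0], coef[1:]

def predict_wave_height(a, b, X):
    """Evaluate H = a + sum_i b_i x_i for every month in X."""
    return a + X @ b

# Synthetic, noise-free stand-in data (illustrative values, not JRA-55).
rng = np.random.default_rng(0)
n_months = 360
u10_sq = rng.uniform(10.0, 100.0, n_months)   # squared 10 m wind speed
slp = rng.normal(1013.0, 8.0, n_months)       # sea level pressure
dslp = rng.normal(0.0, 1.0, n_months)         # horizontal SLP gradient
X = np.column_stack([u10_sq, slp, dslp])
true_a, true_b = 0.5, np.array([0.02, -0.001, 0.1])
h_s = true_a + X @ true_b

a, b = fit_wave_regression(X, h_s)
h_pred = predict_wave_height(a, b, X)
```

In a global application the same fit would be repeated independently for every ocean grid point, which is one reason the statistical approach is computationally cheap.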
The principal component (PC) modes of the SLP field were used to introduce swell contributions at each grid point by accounting for large-scale atmospheric patterns. In addition to the local atmospheric information, the time-dependent coefficients of the PC modes of the SLP field (denoted as PC modes hereafter) are included as explanatory variables, following a previous study (Wang et al., 2014). The procedure for calculating the PC modes is summarized as follows. Firstly, the covariance matrix of SLP ($V$) is calculated for each region. Secondly, eigenvalue analysis of the covariance matrix is performed. The eigenvalue equation for $V$ can be expressed as $V z_j = \lambda_j z_j$, with the $j$th largest eigenvalue $\lambda_j$ and the associated $j$th eigenvector $z_j$. Finally, the $j$th PC mode at a given time $t$ ($PC_{j,t}$) is obtained by projecting the SLP vector at that time ($\mathrm{SLP}_t$) onto the eigenvector, as $PC_{j,t} = \mathrm{SLP}_t \cdot z_j$. As a result, the linear multivariate regression model combining local atmospheric information and PC modes can be written as
$$H = a + b_1 U_{10}^2 + b_2\,\mathrm{SLP} + b_3\,\Delta\mathrm{SLP} + \sum_{j=1}^{n} b'_j\,PC_{j,t} \qquad (1)$$
where $b'_j$ is the amplitude of the $j$th PC mode and $n$ is the total number of PC modes. The first four terms on the right-hand side of Equation (1) refer to local atmospheric information, and the last term is the large-scale atmospheric information described by PCA. The differences in prediction accuracy with the number of PC modes included in the model are described in the next section. The computational cost of the statistical wave model is 1/100-1/300 of that of dynamical wave modeling.
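The three PCA steps described above (covariance matrix, eigenvalue analysis, projection) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name and the synthetic SLP maps are hypothetical, and the code projects SLP anomalies rather than the raw field. That choice only shifts each PC series by a constant, which the regression intercept absorbs.

```python
import numpy as np

def slp_pc_modes(slp, n_modes):
    """Leading PC modes of an SLP field.

    slp     : (n_times, n_points) array, one flattened SLP map per month
    n_modes : number of leading modes to keep
    Returns (eigenvalues, eigenvectors, pcs) with shapes
    (n_modes,), (n_points, n_modes), (n_times, n_modes).
    """
    # Step 1: covariance matrix V of the SLP field across grid points.
    anomalies = slp - slp.mean(axis=0)
    V = np.cov(anomalies, rowvar=False)
    # Step 2: V is symmetric, so eigh applies; eigenvalues come out ascending.
    eigvals, eigvecs = np.linalg.eigh(V)
    order = np.argsort(eigvals)[::-1][:n_modes]
    lam, Z = eigvals[order], eigvecs[:, order]
    # Step 3: project the map at each time t onto each eigenvector z_j.
    pcs = anomalies @ Z
    return lam, Z, pcs

# Synthetic stand-in: 120 monthly maps on 25 grid points (not real SLP).
rng = np.random.default_rng(1)
slp_maps = 1013.0 + rng.normal(size=(120, 25)) * np.linspace(5.0, 1.0, 25)
lam, Z, pcs = slp_pc_modes(slp_maps, n_modes=4)
```

The columns of `pcs` would then be appended to the local variables as extra regressors. For a real global grid the covariance matrix may be too large to form explicitly; a singular value decomposition of the anomaly matrix yields the same modes.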
Here, two types of statistical models were developed according to the region over which the PC modes are calculated. In the first, the PC modes were calculated from the SLP field over the global domain. In the second, the PC modes were calculated from the SLP field in each of eleven ocean regions (Figure 1a), defined in the same way as in Wang et al. (2014). The two models are denoted as the global PC model and the regional PC model, respectively. The global PC model can be computed systematically from PC modes over the globe, whereas the regional PC model depends on an arbitrary classification of regions. On the other hand, the number of PC modes required by the regional PC model is expected to be smaller than that required by the global PC model. Therefore, the accuracy of the two models is compared and validated against the JRA-55 wave hindcast.
In the regional PC model, the spatial distribution of the 4th mode in the North Atlantic is similar to that of the NAO (North Atlantic Oscillation), and the 4th mode in the North Pacific resembles that of the PDO (Pacific Decadal Oscillation). Therefore, adding the 4th PC mode to the explanatory variables allows the models to consider the large-scale atmospheric information of the NAO and PDO for the North Atlantic and the North Pacific. The relationships between the other PC modes and climate indices depend on the region, but these modes are related to seasonal through decadal changes of wave climate. Through the use of PC modes, seasonal to decadal changes of remotely generated swells can be incorporated into the grid-based statistical model. However, the number of PC modes that can be used is limited, because including too many components leads to overfitting. The validation and optimization of the statistical model are described in the next section.

VALIDATION OF STATISTICAL WAVE MODEL
The developed statistical models were validated using JRA-55 and JRA-55 wave. Monthly mean wave heights were used for estimation and validation. The validation was carried out by comparing JRA-55 wave with the output of the statistical model. Table I shows the validation results for different numbers of PC modes: the correlation coefficients and root-mean-square errors (RMSE) of wave heights between the regional PC model predictions and JRA-55 wave are given as a function of the number of PC modes. Although the statistical models have an arbitrary choice of number of PC
Figure 1. Classification of oceans (a), and normalized regional mean of root-mean-square errors between wave hindcasts and calculated mean wave heights by the statistical models for each region (b-l; unit: %)
Table I. Global mean correlation coefficients, the value of root-mean-square errors and the percentage of them between wave hindcasts and calculated mean wave heights by the regional statistical models
The validation of both the regional and global PC models was conducted over the eleven regions. Figure 1 shows the definition of the classified regions and the percentages of annual mean RMSE for the eleven regions. The black line with asterisks and the red line with circles indicate the percentage of RMSE as a function of the number of modes used for modeling. The results for 0 modes refer to a model without PCA (i.e., local grid information only). The use of PC modes significantly improves the statistical model, especially at lower latitudes near the equator (Figure 1d-g). This result indicates that PC modes can describe the effects of remotely generated swell when they are calibrated against the historical wave analysis. This improvement is reasonable because monthly ocean waves are dominated by swells in lower latitudes (Semedo et al., 2011). The correlation and RMSE improve as the number of PC modes increases.
The regional PC model shows its best performance with 30 PC modes, and its accuracy decreases if more than 30 modes are used. The accuracy is significantly improved (smaller RMSE) in the southern hemisphere. Because the influence of swell grows with fetch, and fetch in the northern hemisphere is limited by the continents, adding PC modes is more effective in the southern hemisphere than in the northern hemisphere. In contrast to the regional PC model, the errors of the global PC model decrease monotonically as the number of modes increases. Although the number of PC modes required by the global PC model is larger than that required by the regional PC model, the global PC model with 50 PC modes shows its best performance for most oceans and has accuracy similar to that of the regional PC model with 10 modes.
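The dependence of out-of-sample skill on the number of PC modes can be explored with a loop like the one below. This is a hedged sketch on synthetic data, not the authors' procedure: the function names are hypothetical, and expressing RMSE as a percentage of the mean hindcast wave height is an assumption, since the normalization is not specified in the text.

```python
import numpy as np

def skill_scores(pred, truth):
    """Correlation, RMSE, and RMSE as a percentage of the mean of truth."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    corr = np.corrcoef(pred, truth)[0, 1]
    rmse = np.sqrt(np.mean((pred - truth) ** 2))
    return corr, rmse, 100.0 * rmse / truth.mean()

def scores_by_mode_count(local, pcs, h, n_cal, mode_counts):
    """Calibrate on the first n_cal months, score on the remaining months."""
    results = {}
    for n in mode_counts:
        # Intercept, local variables, and the first n PC modes.
        X = np.column_stack([np.ones(len(h)), local, pcs[:, :n]])
        coef, *_ = np.linalg.lstsq(X[:n_cal], h[:n_cal], rcond=None)
        results[n] = skill_scores(X[n_cal:] @ coef, h[n_cal:])
    return results

# Synthetic stand-in: 55 years of months, 30 years used for calibration.
rng = np.random.default_rng(2)
n_months, n_cal = 660, 360
local = rng.normal(size=(n_months, 3))
pcs = rng.normal(size=(n_months, 10))
# Wave height that depends on the local variables and the first two modes.
h = 2.0 + local @ np.array([0.3, 0.1, 0.05]) + pcs[:, :2] @ np.array([0.4, 0.2])

results = scores_by_mode_count(local, pcs, h, n_cal, mode_counts=[0, 2, 5])
```

With real, noisy data the validation error would typically stop improving and then grow once the mode count becomes large relative to the calibration sample, which is the behaviour reported for more than 30 modes.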
Although the classification of regions is arbitrary, the regional PC model performs better than the global PC model. Therefore, the results of the regional PC model are discussed hereafter and are used for the future wave climate projection. Figure 2 shows the spatial distributions of the RMSE between the annual mean wave heights from the regional PC model and JRA-55 wave. The improvement obtained by changing the number of modes can be seen clearly in the extra-tropics (20-30 degrees latitude) and in the eastern parts of the oceans, where swell is more prevalent. The estimate of annual mean wave height is improved by introducing the PC modes of the SLP field into the model, compared with using the local atmospheric information only (0 mode, Figure 2a). The errors are less than 6% over a wide area of the globe if more than 20 PC modes are used, as shown in Figure 2 and Table I. The errors are 6-10% in the extra-tropics and the eastern parts of the oceans. On the other hand, no matter how many modes are used, the errors in the low-latitude western North Pacific and at very high latitudes near the poles are not significantly improved. This is because there are many islands in the low-latitude western North Pacific and the local effects differ from those in other regions. The effect of sea ice is not considered in this study; therefore, the accuracy of the statistical model is insufficient in the polar regions. Except in these special regions, the developed statistical model is accurate enough for the future projection of wave climate change.

FUTURE PROJECTION OF GLOBAL WAVE CLIMATE CHANGE
The future change of wave climate is projected using the regional PC model forced by surface winds and sea level pressure from a future climate projected by a GCM. The wave climate change is projected for different future greenhouse gas emission (GHE) scenarios using projections by the Japan Meteorological Research Institute's Atmospheric GCM (MRI-AGCM; Mizuta et al., 2012). The climate projections by MRI-AGCM are time-slice experiments forced by the Representative Concentration Pathways (RCP) 2.6, 4.5, 6.0 and 8.5 scenarios, targeting the period 1979-2009 for the present climate and 2075-2100 for the future climate.
The statistical wave model was driven by MRI-AGCM output at each grid point for the present climate and the four future climate conditions, and was first validated for the present climate. Figure 3 shows the results of this validation. The percentages in the figure are the differences between the monthly mean wave heights estimated from MRI-AGCM forcing and the JRA-55 wave hindcast for the period 1979-2009. The bias of the statistical model is non-uniformly distributed and does not exceed 6% over most of the globe. Large biases are located near coasts and narrow channels because the wind forcing from the present-climate run of MRI-AGCM is less accurate there. The magnitude of the bias is similar to that of the projected changes under the RCP6.0 run, which are discussed later. Figure 4 shows the projected changes in annual mean wave height from the present to the future climate condition. The regional statistical model with 30 PC modes in each region was used for the future climate projection, based on the discussion in the validation section. The stippling in Figure 4 denotes areas where the correlation coefficients were greater than 0.85. The future changes of annual mean wave height are positive (increase) or negative (decrease) depending on the region. The percentage of area with increasing wave height is 42.7% under the RCP2.6 scenario, and 44.9%, 43.7% and 43.4% under the RCP4.5, RCP6.0 and RCP8.5 scenarios, respectively. Although the area of wave height increase excluding the Antarctic Ocean is largest under the RCP2.6 scenario, the area of wave height increase over the whole globe is largest under the RCP4.5 scenario. However, the contrast between positive and negative changes in the future climate becomes more pronounced as the GHE scenario becomes more severe (from RCP2.6 to RCP8.5).
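One way to compute the percentage of area with increasing wave height on a regular latitude-longitude grid is sketched below. The text does not state how the percentages were obtained, so the cosine-latitude area weighting, the function name and the toy grid are assumptions made for illustration.

```python
import numpy as np

def increase_area_percent(change, lat_deg, ocean_mask):
    """Area-weighted percentage of ocean cells with positive change.

    change     : (n_lat, n_lon) future-minus-present change in wave height
    lat_deg    : (n_lat,) cell-centre latitudes in degrees
    ocean_mask : (n_lat, n_lon) boolean, True over ocean
    """
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    # Multiplying by the boolean mask gives land cells zero weight.
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * ocean_mask
    return 100.0 * weights[change > 0].sum() / weights.sum()

# Toy 3 x 2 grid: wave height increases only in the equatorial row.
lat = np.array([-60.0, 0.0, 60.0])
change = np.array([[-0.1, -0.2],
                   [0.3, 0.1],
                   [-0.1, -0.3]])
ocean = np.ones_like(change, dtype=bool)
pct = increase_area_percent(change, lat, ocean)
```

Without the area weighting, high-latitude cells would be over-counted relative to their true surface area, which matters here because the polar regions behave differently from the rest of the globe.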
The future wave height changes under the RCP8.5 scenario show a 2-5% decrease in the North Atlantic, a 5-6% decrease in the central North Pacific, a 0-4% decrease in the middle latitudes, and a 5-6% decrease in the northern equatorial region. The more severe the GHE scenario, the larger the mean wave height changes in these regions. In the Antarctic, the projected future change of wave heights shows a slight decrease under the RCP2.6 scenario and an increase under the RCP4.5, 6.0 and 8.5 scenarios, whereas the projected future changes of wave heights in the Arctic increase monotonically as the GHE scenario becomes more severe. These characteristics of future change depend on the changes of fetch due to changes in ice extent in the polar regions. Thus, the causes of the future change signal at high latitudes differ from those in the other regions. These characteristic changes of wave height projected by the regional PC model are smaller than those projected by the model with zero PC modes (without swells).
The percentage of area with positive wave height change in the future climate is 3.5% smaller than in the statistical projection by Wang et al. (2014), although the forcing is different. In particular, the wave heights at low latitudes (20°S-20°N) in the eastern Pacific increase in the projection of Wang et al. (2014) but are expected to decrease in this study. However, the decrease of wave heights in the middle latitudes is consistent with their results, except in the central North Atlantic. In addition, the global distribution of wave height changes in this study is quite similar to the dynamical projection by Hemer et al. (2013) in the Pacific. The magnitude of wave height changes in the Pacific under the RCP8.5 scenario in Figure 4d is similar to that in Hemer et al. (2013). The increasing wave heights in the western Indian Ocean and at low latitudes in the Atlantic are also similar to Hemer et al. (2013). The projected mean wave height changes in future climates show an increase in the northern North Pacific under all scenarios. The increase in the northern North Pacific is 0-5% regardless of the scenario. This change is not seen in either Wang et al. (2014) or Hemer et al. (2013). It arises because the increase in wave height in winter outweighs the decrease in summer.
As the future change signals form patterns aligned in the longitudinal direction, it is important to discuss the latitudinal distributions of averaged wave height. Figure 5 shows the longitudinally averaged projected changes of annual mean wave height under the four GHE scenarios. Figures 5a-d show the global mean, the Atlantic Ocean (90°W to 20°E), the Indian Ocean (20°E to 120°E), and the Pacific Ocean (120°E to 90°W), respectively. Figure 5 clearly shows that larger wave height changes occur under the more severe GHE scenarios. There are few differences among scenarios in the wave height change at low to middle latitudes, but differences are seen at high latitudes. The projected changes in the Antarctic Ocean are positive under the RCP4.5, 6.0 and 8.5 scenarios and negative under the RCP2.6 scenario in all oceans. The effects of Antarctic and Arctic ice on wave climate change over the globe are uncertain, but these areas are hot spots where significant changes in the future climate are expected.
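The longitudinal averaging by ocean basin can be sketched as follows, using the basin longitude bounds quoted above. The function name, the 0-360 longitude convention and the toy field are assumptions for the example; land cells are assumed to be marked with NaN.

```python
import numpy as np

def basin_zonal_mean(change, lon_deg, lon_west, lon_east):
    """Average change over longitudes from lon_west eastward to lon_east.

    change  : (n_lat, n_lon) field, NaN over land
    lon_deg : (n_lon,) longitudes in degrees (any convention)
    Bounds may be given as negative degrees for west longitudes.
    For the full-globe mean, use np.nanmean(change, axis=1) directly.
    """
    lon = np.mod(lon_deg, 360.0)
    west, east = lon_west % 360.0, lon_east % 360.0
    if west <= east:
        in_basin = (lon >= west) & (lon < east)
    else:
        # The basin crosses the 0 degree meridian in 0-360 coordinates.
        in_basin = (lon >= west) | (lon < east)
    return np.nanmean(change[:, in_basin], axis=1)

# Toy field whose value encodes the basin: 1 Indian, 2 Pacific, 3 Atlantic.
lon = np.arange(0.0, 360.0, 10.0)
basin_id = np.where((lon >= 20) & (lon < 120), 1.0,
                    np.where((lon >= 120) & (lon < 270), 2.0, 3.0))
change = np.tile(basin_id, (4, 1))  # four latitude rows

atlantic = basin_zonal_mean(change, lon, -90.0, 20.0)   # 90W to 20E
indian = basin_zonal_mean(change, lon, 20.0, 120.0)     # 20E to 120E
pacific = basin_zonal_mean(change, lon, 120.0, -90.0)   # 120E to 90W
```

Each returned array holds one value per latitude row, which corresponds to one curve in a latitude profile plot.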

CONCLUSIONS
Future wave climate projection is important for climate impact assessment of coastal environment and hazards. Demand for statistical wave models, whose computational load is small, is increasing greatly because a large number of ensemble members is required for uncertainty estimation of future projections. In this study, statistical wave height projection models were developed based on the JRA-55 reanalysis data. The monthly averaged wave heights were estimated by a linear multi-regression model. The model considers local atmospheric information ($U_{10}^2$, SLP and ΔSLP) and large-scale atmospheric information obtained by applying PCA to the SLP field. The representation of swell in the lower latitudes is greatly improved by introducing the large-scale atmospheric information. The definition of regions and the required number of PC modes in the statistical model were optimized by comparing the predicted wave heights with the JRA-55 wave hindcast data. The global mean correlation coefficient and RMSE of the regional PC model were improved from 0.84 to 0.94 and from 8.68% to 5.72%, respectively, by using PC modes. The developed statistical model was applied to the results of the MRI-AGCM3.2H climate change projection. The future changes in mean wave heights from the end of the 20th century to the end of the 21st century were projected under four different scenarios (RCP2.6, 4.5, 6.0 and 8.5). The future changes show an increase in the northern North Pacific Ocean and a decrease in the North Atlantic Ocean and in the middle latitudes and tropics of the Pacific Ocean.