Multi-Year Analysis Using the NICAM-LETKF Data Assimilation System

This study investigates the long-term stability of the global atmospheric data assimilation system, incorporating the Local Ensemble Transform Kalman Filter (LETKF) with the Nonhydrostatic ICosahedral Atmospheric Model (NICAM). The NICAMLETKF system assimilates conventional observations, advanced microwave sounding unit–A (AMSU-A) radiances, and global satellite mapping of precipitation (GSMaP) data. The longterm stability of the data assimilation system can be investigated only by running an expensive long-term experiment. This study successfully performed a data assimilation experiment with more than 2 years of data, using the relaxation to prior spread (RTPS) method for covariance inflation. Analysis fields indicate a stable physical performance compared with the ERA-interim data for the entire experimental period. (Citation: Terasaki, K., S. Kotsuki, and T. Miyoshi, 2019: Multi-year analysis using the NICAM-LETKF data assimilation system. SOLA, 15, 41−46, doi:10.2151/sola.2019-009.)


Introduction
Data assimilation plays an essential role in improving the accuracy of numerical weather prediction (NWP) by providing optimal initial conditions based on available observation data. Data assimilation also provides invaluable reanalysis datasets that are widely used in broader geophysical studies. The ensemble Kalman filter (EnKF; Evensen 1994;Houtekamer and Zhang 2016) is an advanced data assimilation method based on the Monte Carlo approach and has become a commonly used tool in data assimilation research. The local ensemble transform Kalman filter (LETKF; Hunt et al. 2007) is an efficient data assimilation method that is closely related to the local ensemble Kalman filter (LEKF; Ott et al. 2004). The computation of the LETKF is completely independent for each model grid point and is thus suitable for massively parallel computation.
Most operational global NWP systems have been developed based on the spectral discretization method in which transformations between grid space and spectral space require the sharing of memory space for parallel computation. Here, the grid-point model is advantageous because it can localize the memory space and greatly reduce global communications between different computational nodes (Yashiro et al. 2016). The non-hydrostatic icosahedral atmospheric model (NICAM; Tomita and Satoh 2004;Satoh et al. 2008Satoh et al. , 2014 has been designed and developed to utilize cutting-edge high-performance computing (HPC) systems efficiently. We have been developing an LETKF system using NICAM, or simply the NICAM-LETKF system, for use in future HPC systems. The NICAM-LETKF system was first developed by Terasaki et al. (2015) and has been continuously extended to assimilate Advanced Microwave Sounding Unit (AMSU)-A radiances (Terasaki and Miyoshi 2017), Global Satellite Mapping of Precipitation (GSMaP; Kubota et al. 2007;Kotsuki et al. 2017a) data, and model parameter estimations (Kotsuki et al. 2018).
The NICAM-LETKF has the following unique characteristics compared with other global atmospheric data assimilation systems: online estimation of the scan bias of AMSU-A radiances, assimilation of GSMaP data, and an accelerated implementation designed for next-generation HPC platforms (Yashiro et al. 2016). Precipitation analysis is also an important product of the NICAM-LETKF.
Long-term stability is an essential property of NWP systems. It is well known that the Kalman filter with nonlinear models sometimes becomes unstable suddenly after a few months of stable performance (Lewis et al. 2006;Snyder 2015;Houtekamer and Mitchell 2005;Kelly et al. 2015). Miyoshi et al. (2007a, b) applied the LETKF to the atmospheric general circulation model for the earth simulator, produced a reanalysis dataset called ALERA assimilating conventional observations, and demonstrated the stability of the LETKF over 18 months. Moreover, Enomoto et al. (2013) successfully performed a long-term experiment with the second generation of ALERA over 6 years. However, Terasaki et al. (2015), Terasaki and Miyoshi (2017) and Kotsuki et al. (2017a) performed NICAM-LETKF experiments for only a few months; no long-term experiment has been performed with the NICAM-LETKF system.
This study aims to assess the analysis accuracy of the current NICAM-LETKF system and to verify its long-term stability for more than 2 years. In addition, the characteristics of the analyzed precipitation data, a unique product of the NICAM-LETKF system made possible by assimilating GSMaP data, are also investigated.
The remainder of this paper is organized as follows. In Section 2, we describe the NICAM-LETKF system. Results of the verification are described in Section 3. Finally, conclusions are presented in Section 4.

NICAM-LETKF system and experimental settings
The NICAM is a global atmospheric model with an icosahedral grid, and this study uses the grid division level 6 (112km horizontal resolution) with 38 vertical levels. The model top is chosen to be 40 km, or about 3 hPa. Numerical diffusion is applied at upper levels at 20 km and higher to avoid numerical instability. The NICAM-LETKF system assimilates AMSU-A radiances, GSMaP data, and a set of conventional observations known as the NCEP (National Centers for Environmental Prediction) PREPBBUFR data, which include radiosondes, wind profilers, aircraft reports, surface pressure, atmospheric motion vectors (AMVs) and surface winds derived from satellite observations (Terasaki et al. 2015;Kotsuki et al. 2017a;Terasaki and Miyoshi 2017). The ensemble size is set to 128. The 128 initial conditions are chosen from the NCEP FNL (Final) operational global analysis data at 0000 UTC in May and June from 2008 to 2012. The data assimilation cycle experiment is performed from 0000 UTC 1 June 2014 to 1800 UTC 31 July 2016.
Covariance inflation plays an important role in EnKF-based data assimilation systems. There are several covariance inflation methods: multiplicative inflation, additive inflation, and relaxationto-prior. The multiplicative method inflates the prior ensemble were obtained from the research data archive (RDA) managed by the Data Support Section of the Computational and Information Systems Laboratory at the National Center for Atmospheric Research (NCAR).
The near-real-time (NRT) product of the GSMaP version 6 precipitation data was assimilated. The horizontal resolution of the original GSMaP data is 0.1° × 0.1° from 60°S to 60°N. The original GSMaP data were aggregated into the NICAM model grid, and thinning of every 5-by-5 grid points was applied to avoid degradation of the analyses related to the observation error correlation (Kotsuki et al. 2017a). To assimilate GSMaP precipitation data, Kotsuki et al. (2017a) applied the forward Gaussian transformation that transforms non-Gaussian probability density function (PDF) of precipitation into the Gaussian PDF.
Covariance localization was applied using Gaussian functions whose standard deviations were chosen to be horizontally 400 km and vertically 0.4 natural logarithm of pressure. The localization functions were replaced by zero beyond 2 10 3 / times the standard deviation. Terasaki and Miyoshi (2017) used the same localization setting with the NICAM-LETKF system at the same resolution and showed a beneficial impact by assimilating the AMSU-A radiances. Tuning horizontal and vertical localization settings may improve the DA system further but is beyond the scope of this paper. Figure 1 shows the root mean squared deviations (RMSD) of the temperature (K) and geopotential height (m) analyses relative to the ERA-interim data (Dee et al. 2011), and the analysis ensem-spread by multiplying it by a value larger than 1. The additive method inflates a prior or a posterior ensemble spread by adding a noise to the ensemble perturbations. The additive noise can be a random draw from the background error covariance (Mitchell and Houtekamer 2000) or from 6-h differences of a long-term reanalysis dataset (Whitaker et al. 2008), or a flow-dependent perturbation from the ensemble-based singular vectors (Yang et al. 2015). The relaxation-to-prior methods relax reductions of the analysis ensemble spread during data assimilation. Whitaker and Hamill (2012) showed that relaxation to prior perturbations (RTPP) outperformed fixed multiplicative inflation when both methods use optimal parameter values but caused large analysis errors when the relaxation parameter varied from the optimal value. However, the physical performance and stability of data assimilation using the relaxation-to-prior-spread (RTPS; Whitaker and Hamill 2012) method was less sensitive to the value of the relaxation parameter than that using the RTPP. Kotsuki et al. (2017b) applied the RTPP and RTPS to the NICAM-LETKF system and showed that both methods improved the analysis accuracy compared with the adaptive multiplicative inflation method of Miyoshi (2011). The present study focused on the long-term stability of the NICAM-LETKF system; therefore, we choose the RTPS with a fixed relaxation parameter (α = 0.95) that was manually tuned by Kotsuki et al. (2017b):

Results and discussions
where σ b and σ a indicate the prior and posterior ensemble spreads, and σ a new is the analysis ensemble spread relaxed back to the prior ensemble spread.
Conventional observations, AMSU-A radiances, and surface precipitation data are assimilated in this study. For the conventional observations and the AMSU-A radiances, we followed the experimental settings of Terasaki and Miyoshi (2017). The thinning distance of the AMSU-A radiance data was set to 250 km. The AMSU-A channels assimilated for each satellite in this study are listed in Table 1. Considering the vertical resolution of the NICAM and the numerical diffusion applied at 20 km and higher, channels from 6 to 8 were assimilated. The scan bias and airmass bias of the AMSU-A data were estimated and corrected adaptively at each data assimilation step (Terasaki and Miyoshi 2017). Conventional observations and AMSU-A radiance data  ble spread for temperature at 500 hPa averaged over the globe, the Northern Hemisphere (20°N−90°N), the tropics (20°S−20°N), and the Southern Hemisphere (90°S−20°S). The time-series of ensemble spread shows small variations throughout the experimental period of more than 2 years. The RMSD shows seasonal cycles, as larger RMSDs are observed in winter hemispheres (Figs. 1b and 1d). By contrast, the ensemble spread shows almost no seasonal variation. This result is consistent with Miyoshi et al. (2007b) who used a different global model. To include the seasonal variation in the ensemble spread, we may need to apply the adaptive algorithm for covariance inflation. Figure 2 shows the time-series of estimated scan biases based on observation-minus-background (o-b) departures before the bias correction at nadir for the assimilated AMSU-A radiances of four satellites . Some channels show a decreasing trend of the scan bias estimation in 2014, suggesting that it take more than one year for the spin-up. This may arise from a small sample size because the scan bias is independently estimated for every field-of-view (FOV). The scan biases show smooth variations in time after January 2015. By monitoring the variation of the estimated scan bias, we could identify possible degradations of instruments, as indicated by periods of rapid change. For example, channel 8 of METOP-2 shows a rapid change at the end of September in 2015; this channel has not provided data since 30 September 2015. Channels 7 and 8 of NOAA-15 also have shown jaggy fluctuations since 2015. Figure 3 shows the annual-mean biases and standard deviations of the bias-corrected o-b departures from AMSU-A radiances in 2015. Here, the bias correction for both the scan and airmass biases are applied. The absolute values of the biases of channels 6 and 8 for all satellites are smaller than 0.05 K, suggesting that the biases were removed effectively. In particular, the biases of channels 6 and 8 for METOP-2 are very close to zero (−0.007 and −0.005 K). However, the bias of channel 7 of NOAA-18 is larger (−0.078 K) than the others. The standard deviations of the o-b departures tend to increase with channel number. The channel 8 of the AMSU-A is sensitive to the levels between the upper troposphere and the lower stratosphere. Numerical diffusion is applied to 20 km and above in the current NICAM and this may contribute to the larger standard deviations in channel 8. Overall, results show that the bias of AMSU-A radiances is accurately removed by the bias correction procedure introduced by Terasaki and Miyoshi (2017). Figure 4 shows the time-mean RMSDs and biases of temperature, zonal and meridional winds, and specific humidity of the analysis and 6-h forecast verified with radiosonde observations averaged over the whole experimental period. The analysis RMSDs of temperature, and zonal and meridional winds are smaller than the 6-h forecast RMSDs at all vertical levels. For humidity, the analysis RMSDs are generally smaller than the 6-h forecast RMSDs over the globe at all vertical levels, except in the lower troposphere in the tropics. The RMSD of the temperature analysis is ~1 K from the middle to upper troposphere, but larger RMSDs are seen in the lower troposphere, particularly in the tropics. Figure 4e shows that NICAM has a cold bias in the lower troposphere, particularly in the tropics. Wind biases are almost Fig. 2. Time-series of online estimated scan bias (K) at nadir for AMSU-A radiances. Red, blue, green, and black colors indicate the NOAA-15, -18, -19, and METOP-2 satellites, respectively. Solid, dotted, and dashed lines indicate channels 6, 7, and 8, respectively. The period in 2014 is considered as a spin-up and is grayed out. Channel 8 of METOP-2 has not provided data since 30 September 2015.  and 6-h forecast (solid lines) relative to radiosonde observations of (a, e) temperature (K), (b, f) zonal wind (m/s), (c, g) meridional wind (m/s), and (d, h) specific humidity (g/kg). Black, red, blue, and green colors indicate averages over the globe, Northern Hemisphere, tropics, and Southern Hemisphere, respectively. zero in the troposphere, except for zonal winds at the 1000-hPa level in the tropics (Figs. 4f and 4g; 30,627 and 30,662 samples for the 6-h forecast and analysis, respectively). Improving the parameterization of the planetary boundary layer may mitigate the wind biases near the surface. Specific humidity biases show that NICAM has a dry bias in the troposphere, particularly in the tropics (Fig. 4h). Assimilating the GSMaP precipitation data intensifies the dry bias in the tropics and the Northern Hemisphere in contrast to our expectation. If we assimilated all GSMaP observation, the o-b departures of the Gaussian transformed precipitation would be statistically zero. However, we cannot assimilate GSMaP precipitation when none of the ensemble members forecasts precipitation (i.e., zero ensemble spread). In contrast, the NICAM-LETKF assimilates GSMaP precipitation in clear sky when some ensemble members forecast precipitation. Because of this unavoidable limitation, the mean o-b departure is slightly negative (c.f., Fig. 4b of Kotsuki et al. 2017a). Consequently, the assimilation of GSMaP precipitation would intensify the dry bias in the lower troposphere. The RMSD of the specific humidity is large in the middle troposphere in the tropics because of a large dry bias. Applying the estimation of a model parameter related to large scale condensation (Kotsuki et al. 2018) may help to mitigate the dry bias. Kotsuki et al. (2017a) developed forward and inverse Gaussian transformation methods based on the cumulative distribution function (CDF) of precipitation data for the past 30 days. The Gaussian transformations enable us to obtain two analysis precipitation fields based on the CDFs of NICAM and GSMaP precipitation. Figures 5a, 5b, 5c, 5d, 5e, 5f, and 5g shows the spatial distributions of annual mean precipitation. The precipitation analysis fields are able to capture the spatial distribution and precipitation intensity of the GSMaP. However, we found remarkably less precipitation in the superobbed GSMaP and NICAM-LETKF precipitation analysis in both Hemispheres (40°N to 60°N and 40°S to 60°S). This decrease in precipitation in the high latitudes is also seen in the original GSMaP NRT data before the superobservation (Fig. 5f). Figure 6 compares the zonally averaged precipitation analysis with JRA-55 (6-h forecast; Kobayashi . The precipitation analysis can be obtained between 60°S and 60°N where the GSMaP data are available. It is also evident that the analysis precipitation transformed using the observation-based CDF produced weaker precipitation at high latitudes than ERA Interim, GPCP, and the original NICAM. The weaker precipitation of GSMaP NRT at high latitudes may arise because the current algorithm of GSMaP NRT (version 6) does not estimate solid precipitation (JAXA 2017). Recently, JAXA has started to estimate solid precipitation based on passive microwave imager data, which have been implemented into the GSMaP algorithm (Kubota T., pers. comm.). Further improvements in GSMaP will help to mitigate biased precipitation analyses by NICAM-LETKF.

Summary
This study aimed to assess the long-term stability of the NICAM-LETKF system. A NICAM-LETKF experiment was performed for more than 2 years from 0000 UTC 1 June 2014 to 1800 UTC 31 July 2016. Conventional observation, AMSU-A, and precipitation (GSMaP) data were assimilated. The RTPS with a fixed relaxation parameter (α = 0.95) was applied as covariance inflation. Refining the choice of the covariance inflation method may help to improve the analysis, e.g., a combination of RTPS and additive inflation, but this is a subject of our future research. The difference between the analysis ensemble mean of the NICAM-LETKF and ERA-interim indicates stable performance of the system. The time-series of RMSD of temperature and geopotential height show large seasonal variations, particularly in the Northern Hemisphere. By contrast, the time-series of the ensemble spread indicates small variations throughout the experimental period. The ensemble spread of NICAM-LETKF may fail to accurately capture the error structure.
In this study, precipitation analysis fields were produced by assimilating GSMaP data. The precipitation analysis captured the GSMaP horizontal distribution and intensity. However, we found a remarkably weak precipitation bias in both Hemispheres in the precipitation analysis fields, which may be attributable to biased precipitation in the GSMaP NRT data at high latitudes. Therefore, it may be beneficial not to assimilate GSMaP data at high latitudes. The horizontal resolution of NICAM was 112 km, much coarser than that of the GSMaP data (i.e., every 0.1°). Increasing the spatial resolution of NICAM would result in more realistic precipitation fields. The current NICAM-LETKF system assimilates GSMaP data every 5 × 5 model grid points to avoid problems with assimilating error-correlated GSMaP data. Future work should focus on the development of data assimilation systems that account for horizontal observation error correlations to effectively draw more information from error-correlated observations.