2021 Volume 99 Issue 4 Pages 1045-1070
The assimilation of cloudy and rainy microwave observations is under investigation at Météo-France with a method called “1D-Bay+3D/4D-Var”. This method comprises two steps: (i) a Bayesian inversion of microwave observations and (ii) the assimilation of the retrieved relative humidity profiles in a 3D/4D-Var framework. In this paper, two estimators for the Bayesian inversion are used: either a weighted average (WA) or the maximum likelihood (ML) of a kernel density function. Sensitivity studies over the first step of the method are conducted for different degrees of freedom: the observation error, the channel selection and the scattering properties of frozen hydrometeors in the observation operator. Observations over a 2 month period of the Global Precipitation Measurement (GPM) Microwave Imager (GMI) onboard the Global Precipitation Measurement Core Observatory satellite and forecasts of the convective scale model Application of Research to Operations at Mesoscale (AROME) have been chosen to conduct these studies. Two different meteorological situations are analyzed: those predicted cloudy in AROME but clear in the observations and, those predicted clear in AROME but cloudy in the observations. The main conclusions are as follows: First, low observational errors tend to be associated with the profiles with the highest consistency with the observations. Second, the validity of the retrieved profiles varies vertically with the set of channels used. Third, the radiative properties used in the radiative transfer simulations have a strong influence on the retrieved atmospheric profiles. Finally, the ML estimator has the advantage of being independent of the observation error but is less constrained than the WA estimator when few frequencies are considered. Although the presented sensitivities have been conducted to incorporate the scheme in a data assimilation system, the results may be generalized for geophysical retrieval purposes.
In the mid-1990s, satellite observations began to be assimilated into Numerical Weather Prediction (NWP) models using radiative transfer codes. However, only clear-sky observations could be processed because of the nonlinearity of the required observation operators, the nonnormality and large variances of representativeness retrievals and other issues listed in Errico et al. (2007). The next step has been to assimilate information from cloudy and rainy microwave radiances. Early studies focused on the assimilation of satellite-derived rainfall rates and showed significant improvements in weather forecasting (Treadon et al. 2002; Marécal and Mahfouf 2002, 2003; Hou et al. 2004; Mahfouf et al. 2005; Hou and Zhang 2007). These methods required different algorithms to convert satellite radiances in rain amounts for each instrument and were therefore abandoned in favor of a gradual transition toward the use of raw brightness temperatures. Then, in 2005, the European Centre for Medium-Range Weather Forecasts (ECMWF) started to assimilate both cloudy and rainy microwave radiances of the Special Sensor Microwave/Imager instrument thanks to a 1D+4D-Var method (Bauer et al. 2006a, b; Geer et al. 2010, 2017). Such an approach required the development of fast radiative transfer modeling with scattering processes, together with their linearized versions, for variational data assimilation. In 2009 this methodology was replaced by a direct assimilation method accounting for model and representativeness errors and also led to better results in their 4D-Var system (Bauer et al. 2010; Geer and Bauer 2010, 2011).
Bayesian inversions have a long history of being used for extracting information on atmospheric hydrometeors from microwave observations (e.g., Kummerow et al. 1996). Building on this heritage, a method named “1D-Bay+3D-Var” to assimilate cloudy and rainy microwave observations, was first used at Météo-France with ground-based weather radar reflectivities (Caumont et al. 2010; Wattrelot et al. 2014). This method seeks to retrieve an optimal relative humidity profile, from an observation and short-term forecast profiles in its vicinity, thanks to a Bayesian inversion. Then, the retrieved profiles of relative humidity are assimilated into a 3D-Var system. This variable was preferred over the assimilation of the specific humidity and temperature profiles to directly impact the saturation of the atmosphere and modulate cloudiness in the forecast, as well as to facilitate the specification of observation errors. Among the advantages of this method with respect to direct assimilation of cloudy and rainy radiances, which is now more commonly used in the NWP context (Geer et al. 2017), one can mention that (i) it does not rely on tangent linear and adjoint versions of the observation operator, and (ii) it does not suffer from the zero-gradient problem and can generate a cloud where no cloud is present in the first guess by saturating the atmosphere. A potential drawback of the method is that the retrieval to be assimilated can be correlated with the first guess but to a smaller extent than with a 1D-Var+4D-Var (Geer et al. 2008), thanks to a mitigation strategy further explained in Section 3. The Japan Meteorological Agency (JMA) successfully implemented a similar method, with a 4D-Var system, to operationally assimilate observations from spaceborne radars (Ikuta and Honda 2011; Ikuta 2016). More recently, studies have been undertaken at Météo-France with the use of the 1D-Bay+4D-Var method with satellite radiances from the Sondeur Atmosphérique du Profil d'Humidité Intertropicale par Radiométrie (SAPHIR) instrument onboard the Megha-Tropiques satellite (Roca et al. 2020). This water vapor sounder has six frequencies all centered around the 183 GHz and is currently assimilated in the Action de Recherche Petite Echelle Grande Echelle (ARPEGE) global operational NWP model at Météo-France (Courtier et al. 1991) only in clear-sky areas (Chambon et al. 2015). Preliminary results show that cloudy and rainy SAPHIR observations improve wind, temperature, and relative humidity forecasts in the ARPEGE model (Duruisseau et al. 2019). These improvements result from the assimilation of retrieved relative humidities at four different pressure levels. These encouraging studies lead to a desire of extending the method to all microwave instruments. Since the 183 GHz frequency only probes solid hydrometeors, the use of a wider range of microwave channels will allow us to observe liquid hydrometeors as well. The present paper therefore focuses on generalizing the results of Duruisseau et al. (2019) to a larger set of frequencies, as well as investigating various sensitivities of the Bayesian inversion.
In February 2014, the Global Precipitation Measurement Core (GPM-Core) Observatory satellite mission was launched to provide a new standard for nearly global measurements of liquid and solid precipitation from the Tropics to the Midlatitudes (Hou et al. 2014). To that end, it was equipped with two instruments: the Dual-frequency Precipitation Radar (DPR) with two frequencies (13.6 GHz and 35.55 GHz) and the GMI radiometer. The latter is a microwave radiometer with a wide range of frequencies sounding water vapor as well as solid and liquid hydrometeors. The ECMWF started to directly assimilate GMI clear and cloudy radiances into the 4D-Var system in 2017 (Lean et al. 2017; Geer et al. 2017). The Global Modeling and Assimilation Office of the National Aeronautics and Space Administration started the all-sky assimilation as well in July 2018 (Kim et al. 2020). Météo-France assimilates GMI radiances as well but in clear-sky conditions only.
Because of its frequency diversity, the GMI instrument is well suited to study the extension of the “1D-Bay+3D/4D-Var” method to all microwave instruments and complement the preliminary studies conducted with SAPHIR. The present paper describes sensitivity studies regarding the first step of this method, the Bayesian inversion, with the GMI instrument. This inversion involves several parameters for which the impact of their prescribed values must be evaluated to improve the method.
The paper is organized as follows: Section 2 describes the NWP data and the GMI observations selected to perform the sensitivity studies, and Section 3 presents the Bayesian inversion method and the metrics used to evaluate its retrieved results. Section 4 shows the sensitivity results of the inversion on three degrees of freedom: (i) the specification of the observation error, (ii) the channel selection, and (iii) the specification of radiative transfer scattering properties within the observation operator. Finally, Section 5 provides several conclusions from the sensitivity studies with recommendations on future activities.
Level 1B products of the GMI instrument (GPM GMI L1B ATBD 2016) have been chosen for conducting the sensitivity studies to be described hereafter. This radiometer is onboard the Low Earth Orbiting satellite GPM-Core Observatory, which is orbiting around the Earth at an altitude of 407 km on an inclined orbit of 65° with respect to the Equator. The GMI instrument is characterized by a comprehensive set of channels, summarized in Table 1, from 10.65 to 183.31 ± 7 GHz. The dataset of interest spans a 2 month period from September 1st to October 31st, 2017. From this dataset, only a geographical domain over the North Atlantic Ocean is selected, corresponding to the domain of the NWP limited area model described in the next section.
The channels have different footprint sizes that have been remapped on a common grid to simplify the inversion. Hence, the raw Level 1B data have been superobbed onto a regular latitude/longitude grid at 0.1° resolution, compatible with the effective resolution of the NWP model used in the present study.
2.2 Numerical Weather Prediction (NWP) modelAt Météo-France, the convective-scale NWP model Application of Research to Operations at Mesoscale (AROME) is used operationally over numerous geographical domains with forecasts ranging up to 48 h (Seity et al. 2011). This model has a nonhydrostatic spectral dynamical core, explicitly resolving deep moist convection; and is characterized by a finite difference representation on the vertical with discretization on 90 levels from the surface to 10 hPa. In the midlatitudes, Météo-France operates AROME over Western Europe at 1.3 km horizontal resolution; this version is initialized with a 3D-Var data assimilation system (Brousseau et al. 2016) and forced on its lateral boundaries with forecasts from the Météo-France global model ARPEGE. In the Tropics, Météo-France operates AROME at 2.5 km horizontal resolution over five geographical domains covering several French Overseas territories. This tropical version, noted afterward AROME-OM, does not have a dedicated data assimilation system in operations yet. Instead, the AROME-OM versions are initialized with ECMWF analyses every 6 h. The full description of the AROME-OM system can be found in Faure et al. (2020).
In this paper, the AROME-Antilles version, one of the AROME-OM models has been chosen, it covers the geographical domain [10.4°N to 22.45°N; 67.8°W to 52.2°W]. Although the horizontal resolution of computation is on a 2.5 km grid, numerical models typically have an effective resolution 3–4 times larger than the original grid (Ricard et al. 2013). This leads to a typical effective resolution of 10 km, which roughly corresponds to the chosen 0.1° superobbing resolution for GMI observations. The selected GMI superobbed observations are collocated with a 3 h range AROME forecast, using a ±90 min time tolerance, to build up a 3 h 3D-Var for AROME in a future study. Note that the results presented below are robust of a change of the first guess forecast range (not shown).
2.3 Observation operatorThe observation operator used in the NWP model AROME to simulate brightness temperatures is the Radiative Transfer for TIROS Operational Vertical sounder code (RTTOV, Saunders et al. 2018). Within RTTOV, a module named RTTOV-SCATT, considering the scattering effect via hydrometeors on the basis of the Delta-Eddington approximation allows the simulations of observations from microwave sensors within clouds (Bauer et al. 2006c; Geer and Baordo 2014). This study considers version 12 of RTTOV-SCATT to simulate GMI's Brightness Temperatures. Table 2 lists the radiative properties necessary to simulate scattering effects from hydrometeors.
Regarding the radiative properties of snowfall, Geer and Baordo (2014) showed that within the context of the ECMWF model, the best compromise across the different microwave frequencies affected by scattering was to choose the particle size distribution proposed by Field et al. (2007) together with the single-scattering properties of a sector snowflake. This configuration is selected to be the baseline for the present study. The other particle shapes from the scattering database of Liu (2008) are also considered in the sensitivity studies described below.
The first step of the “1D-Bay+4D-Var”, based on the ‘Bayes’ theorem, performs a one-dimensional retrieval of the atmospheric state, based on observations and prior information from a short-term forecast named First-Guess (FG). The retrieved state includes hydrometeors as well as humidity and temperature profiles.
Considering a set of observations yo (GMI radiances), one wants to compute the probability to estimate the true state of the atmosphere: P (x = xtrue | y = yo). The xtrue vector is the “real” atmospheric profile. Thanks to the Bayes theorem, this probability can be rewritten as follows:
![]() |
![]() |
To estimate xtrue, one must define a database of atmospheric profiles for which P (x = xtrue | y = yo) values can be computed. The database used in the Bayesian inversion is made of FG profiles within the vicinity of a given observation. To limit error correlations between the retrieved profile and the first guess, which could be problematic for assimilation (Errico et al. 2007; Geer at al. 2008), the model profile at the observation location is removed from the inversion database. It is assumed that these profiles have high P (x = xtrue) values because they are consistent in terms of weather regime with the meteorological situation of interest. Additionally, the occurrence and intensity of the cloud and precipitation profiles are assumed to be similar in the FG database and in nature. Therefore, the probability P (x = xtrue) can be represented by the relative number of occurrences of profiles xi in the database, i.e., by 1/n, where n is the size of the selected database. Several mathematical estimators of xtrue can be considered. In this study, we focus on two of them: either by selecting the x atmospheric state, which is associated with the highest value of P (x = xtrue | y = yo). Such an estimator is used at JMA (Ikuta and Honda 2011) and was used as well at Météo-France with airborne cloud radar observations and provided good results (Borderies et al. 2018). This estimator is known as the maximum likelihood estimator and is noted hereafter as ML.
Another estimator is the expected value (mean) using the full database as a source of information (Caumont et al. 2010), leading to the following expression: xret = ∫ xP (x = xtrue | y = yo)dx, where xret is the retrieved atmospheric profile from which we then extract the relative humidity profile. In that case, the mean estimator is a weighted average (WA) of the FG profiles available in the database:
![]() |
![]() |
The mathematical expressions of the probabilities in the Bayesian inversion include several tunable parameters: (i) the specification of observation errors, (ii) the channel selection, (iii) the specification of radiative properties within the observation operator and (iv) the definition of the database. Additionally, for each parameter, the choice of the estimator (WA or ML) may lead to a different result as well.
The database for each inversion consists of 625 atmospheric columns within a 250 × 250 km2 square surrounding each observation (25 × 25 profiles); the atmospheric columns are taken from 3 h forecasts of the AROME-Antilles model. Enlarging the database or changing the forecast range does not significantly modify the retrieval results (not shown); thus, the sensitivity experiments hereafter focus only on the first three parameters.
Table 3 summarizes the range of values for the three different degrees of freedom of interest. The experiments have been designed to change a single parameter at a time from a reference configuration and examine the impact of each of them. The set up of the reference configuration has been chosen to be close to the GMI noise equivalent delta temperature (NeΔT) and as well as to be consistent with previous studies for the observation error specification of the Bayesian inversion (Guerbette et al. 2016; Duruisseau et al. 2019). A single value of observation error was selected for all channels for simplicity, and to limit the numbers of sensitivity experiments. However, optimal observation errors would likely vary from one channel to another and could be derived with dedicated diagnostics (e.g., Desroziers et al. 2005). Desroziers et al. 2005 first diagnostics performed (not shown) indicate that observation errors explored for the sensitivity study range between 1 K and 20 K (see Table 3). The setup is also based on the willingness to use the full capability of the GMI sensor for the channel selection, and on the literature for the specification of the radiative transfer model (Geer and Baordo 2014).
Note that the 10 GHz channels have been excluded from the study because the 0.1° superobbing resolution is not consistent with its coarser resolution (20 × 30 km). Furthermore, radiative transfer modeling of this particular frequency is known to suffer from deficiencies (Lean et al. 2017).
3.3 Metrics used for evaluating the inversionsThe Bayesian inversion described before is designed to assimilate microwave radiances in clouds and precipitating systems. To identify these regions, a scattering index based on the normalized difference of brightness temperatures (Bt) between the two polarizations of the 37 GHz frequency is chosen (Petty and Katsaros 1990; Petty 1994; Geer and Bauer 2010). Its expression is given by the following:
![]() |
The results are compared in two different spaces: the model space and the observation space. Hence, two different metrics are defined: The first one compares the brightness temperatures simulated with the retrieved profiles against GMI observations. The differences in brightness temperatures are examined in terms of the correlation coefficient, bias, and standard deviation of their distributions. Note that when the same channels are used within the inversion and in the comparisons, the derived statistics should not be seen as an independent validation. This evaluation can be considered as a sanity check similar to the examination of analysis departures in a variational context that should be reduced with respect to background departures.
The second one compares the retrieved relative humidity profiles to the FG relative humidity profiles. The differences in relative humidities are examined in terms of the mean and standard deviation of their distributions.
To correctly interpret the results of the Bayesian inversion, the diagnostics between the simulated FG brightness temperatures and the GMI observations are calculated over the 2 month period for the FGcloud Oclear scenes [Fig. 1(i)] and the FGclear Ocloud scenes [Fig. 1(ii)]. Figure 1a shows the correlation coefficients between the simulated FG brightness temperatures and the GMI observations. These correlations range from 0 to 0.4 for the FGcloud Oclear scenes. The 23.8 V and 183.31 ± 7 GHz channels have the highest values, whereas the 36.64 V and 89H GHz channels have the lowest coefficients. For FGclear Ocloud scenes, the coefficients never exceed 0.6. The 23.8 V GHz channel has the highest value, and the 89H GHz channel has the lowest one. This may be explained by the sensitivity of the 23.8 GHz and 89 GHz frequencies to water vapour besides cloud, precipitation, and the surface.
(a) Correlation coefficients, (b) bias, and (c) standard deviation distributions of GMI brightness temperature differences between the observations and the FG, uncorrected of the clear-sky bias. Column (i) represents the results for experiment FGcloud Oclear and column (ii), those for FGclear Ocloud . Stars indicate results that are not significant at 95 % level. Statistics computed over the 2 month period from September to October 2017.
The bias of the distribution for the scenes FGcloud Oclear is positive for the low frequencies and negative for the high frequencies and ranges from −18 K to 50 K. For scenes FGclear Ocloud, the bias is negative for the high frequencies and positive for the low frequencies and ranges from −45 K to 23 K. The sign of the bias changing between low and high frequencies is the result of the scattering effects, which becomes stronger than the absorption/emission effects as the frequencies increase and are therefore in agreement with the scenes studied. For both scenes, the biases of frequencies with horizontal polarization are degraded with respect to the vertical polarization. This problem is likely linked to the difficulty to simulate surface emissivity and the impossibility to simulate oriented particle emissivities for both polarizations within the RTTOV radiative transfer code.
The standard deviation for scenes FGcloud Oclear ranges from 7 K to 30 K and is generally higher for horizontally polarized channels. For scenes FGclear Ocloudy the same behavior is observed with a standard deviation ranging between 5 K and 30 K.
The following section examines the results of sensitivity studies on observation errors, channel selection, and radiative properties used within the observation operator.
4.2 Sensitivity to observation error a. Case studyThe GMI instrument observed Hurricane Maria in the Atlantic Ocean on September 18th, 2017. Figure 2 shows the weights of several Bayesian inversions for a cloudy observation located in the core of this Hurricane and represented by the cross in Fig. 2f. Because of the errors in the prediction of its track and its structure by AROME, the model state at the same location corresponds to an atmosphere with much less clouds and precipitation. The weights of a neighborhood defined by 625 profiles in a 250 × 250 km2 box have been calculated and represented by black boxes for different values of the observation error σ: 1, 2, 5, 10 and 20 K, respectively, in Figs. 2a–e. In this figure, the black boxes are filled according to the magnitude of their normalized weight. It appears that a large weight is given to a single profile for the 1, 2 and 5 K observation errors (Fig. 2a to Fig. 2c), but when larger errors are considered, other profiles have a significant weight (Fig. 2d to Fig. 2e).
(a), (b), (c), (d), and (e): FG brightness temperature simulations of the Maria hurricane reaching the West Indies on September 18th, 2017 for the 18 H GHz frequency. Inversion weights are represented as the filling fraction of the black boxes for an observation error that is equal to 1, 2, 5, 10 and 20 K, respectively. (f): GMI observations of the Hurricane with the 18.7 H GHz frequency. The selected observations are represented by the black cross. In this figure, the black boxes are filled according to the magnitude of their normalized weight.
For this particular case, the differences between the two estimators WA and ML are illustrated by the cumulative probability distribution, plotted as a function of relative humidity (Fig. 3). The distributions are shown for a single level at 713 hPa (corresponding to AROME level 51). The impact of specifying a larger observation error can be noticed: the larger the observation error is, the more skewed the distribution is, toward the dryer values of relative humidity. The change in this distribution does not affect the ML estimator, which always leads to a saturated atmosphere. But this affects the WA estimator, which decreases toward less saturated amounts because of the long-tailed distribution of probabilities.
(a), (b), (c), (d), and (e): Cumulated histograms of the weights from the Bayesian inversion for the case study by bins of relative humidity of 5 % at 713 hPa for an observation error of respectively 1, 2, 5, 10 and 20 K. The gray, blue and red arrows respectively indicate the values of the FG relative humidity and the WA and the ML retrieved relative humidity. (a–c) the blue and red arrows are identical.
Figures 4 and 5 show the retrieved atmospheric profiles with the WA and ML estimators, respectively, for 1 K and 20 K errors. In both methods and for both errors, a low precipitating cloud layer is added to the first guess profile and the upper layer cloud is thickened with increased falling snow. Consequently, the brightness temperatures simulated with the retrieved profiles are much closer to those observed in both cases. Similar to relative humidity, the other atmospheric profiles are almost identical for the 1 K error, but strongly differ for the 20 K error. Particularly, the retrieved cloud fraction for 20 K slowly decreases from 1 to 0 from 100 hPa to 1000 hPa. This retrieved cloud fraction profile is not representative of AROME cloud fraction climatology, which simulates frequently sharp binary profiles.
(a), (b), (c), (d), (e), and (f): Case study atmospheric profiles of the relative humidity, cloud fraction, rain, snow, cloud liquid water and cloud ice water for the FG in gray; the WA retrieval in blue; and the ML retrieval in dotted red with an observation error that is equal to 1 K. (g): GMI brightness temperatures are represented by a black square for the observations, by a gray circle for the FG, by a blue triangle for the WA retrievals, and by a red triangle for the ML retrievals with the same observation error.
Same as in Fig. 4, but with an observation error of 20 K.
In this section, five experiments with the observation errors σ set to 1, 2, 5, 10 and 20 K are intercompared over the 2-month period. Statistics between retrieved and observed brightness temperatures for both FGcloud Oclear and FGclear Ocloud scenes are shown Fig. 6.
(a): Correlation coefficient, (b) bias, and (c) standard deviation distributions of GMI brightness temperature differences between the observations and the WA experiments, uncorrected of the clear-sky bias, for observation errors that are equal to 1, 2, 5, 10, and 20 K. Column (i) represents the results for experiment FGcloud Oclear and column (ii), those for FGclear Ocloud. Statistics are computed over the 2 month period from September to October 2017.
Regarding the correlation coefficients between retrieved and observed brightness temperatures, the FGcloud Oclear scenes lead to correlation values always higher than 0.7 [Fig. 6a(i)]. They are also higher for the smallest observation errors. In the case of 1 K and 2 K, the correlations are above 0.9 for all channels. For the FGclear Ocloud scenes, one can notice that the correlation coefficients are characterized by larger variations across channels compared with the FGcloud Oclear scenes. This makes sense because in the FGcloud Oclear case, no hydrometeor content is observed; thus, in this case, the window channels are directly sensitive to the surface. In the case of FGclear Ocloud, the expected retrieval is cloudy and precipitating and the correlations obtained with the different channels reflect the quality of the retrieval either for the liquid or the ice phase. Overall, the correlations are always smaller for the low frequencies than for the higher ones. This indicates that the retrievals are of lower quality for liquid precipitation than for solid precipitation. This may be a result of the physics of the model, which may have difficulty in propagating realistic information about the liquid hydrometeor content when model profiles contain solid hydrometeors. Regarding the variations of the correlation with the observation error specification, the conclusions are similar to the FGcloud Oclear scenes: Larger values are found with smaller observation errors. Overall, the correlation coefficients are significantly improved compared with the correlation coefficients of the FG (Fig. 1).
Regarding the bias of retrieved brightness temperatures, the values are rather small for the FGcloud Oclear scenes [Fig. 6b(i)], typically below 2 K. This means that the retrieval technique is well constrained for those scenes and does not create large biases. By contrast, for the FGclear Ocloud biases, Fig. 6b(ii) shows that the bias can reach −40 K for σ = 20 K. For smaller observation errors, like 1 K or 2 K, the biases are limited to values ranging between 5 K and −10 K. If these results are compared with the initial innovations of Fig. 1, it can be seen that the biases are reduced regardless of the observation error used. The improvements are quite large for FGcloud Oclear scenes since there is almost no bias for each observation error. For FGclear Ocloud scenes, the improvements are less important but still significant.
Regarding the standard deviation of the brightness temperature differences for FGcloud Oclear scenes shown in Fig. 6c(i), it appears that the standard deviation increases with the observation error. The standard deviation never exceeds 5 K with an observation error of 1 K but can exceed 10 K for an observation error of 20 K. For the FGclear Ocloud scenes, the standard deviations have very similar values, except for the high frequencies and the 20 K observation error case.
The differences in results for observation errors can be further explained. Larger observation errors allow the retrievals to remain close to the FG values (i.e., far from the observed ones) and thus to produce a WA of profiles with either too low or too much hydrometeor contents. The vicinity of observations being mostly composed of clear atmospheric profiles (see Table 4), the profile retrieved from the WA will be further influenced by clear profiles. The result will be to retrieve an atmospheric profile with a simulated brightness temperature that is clearer than the observed one. The example in Fig. 5 illustrates such behavior. By contrast, a low observation error will consider atmospheric profiles that are rather close to the observations and the retrieved profile will be less influenced by the difference in number between clear and cloudy scenes in its vicinity.
The retrieved profiles of relative humidity are compared with the FG profiles (analysis increments) in terms of bias and standard deviation statistics and displayed in Fig. 7. The mean values for FGcloud Oclear scenes, Fig. 7a(i), are negative for all observation errors, which means that the assimilation of retrieved profiles would tend to dry the model FG. The mean differences can reach values of up-to −15 % between 600 hPa and 800 hPa for the largest observation errors. For the FGclear Ocloud scenes, the mean values are always positive which means that the assimilation of the retrieved profiles will tend to moisten the atmosphere. The smaller the observation error is, the larger the moistening will be: The 1 K and 2 K curves can reach values of up to 12 % of mean relative humidity FG departure. By contrast, the mean values for the 20 K observation error are close to 0, which means that the assimilation of these retrievals will have very little effect on the analysis. Overall, these results indicate that a small observation error will enhance the FG moistening in case of cloudy observations.
(a) and (b): Mean and standard deviation distributions of the relative humidity differences between the WA experiments and the FG for observation errors that are equal to 1, 2, 5, 10 and 20 K. Column (i) represents the results for the experience FGcloud Oclear and column (ii), those for FGclear Ocloud . Statistics are computed over the 2 month period from September to October 2017.
When examining the standard deviations corresponding to the FGcloud Oclear scenes [Fig. 7b(i)], the curves show almost no dependency with observation error and reach values of up to 22 % at 500 hPa. By contrast, for the FGclear Ocloud scenes, there is a clear decrease in the standard deviations when larger observation errors are prescribed. Indeed, with an observation error of 20 K, the maximum value is 15 %, whereas with a 1 K error, the standard deviation can reach the value of 23 %.
From the above results, several conclusions can be drawn: small observation errors lead to the best fits between Bayesian retrievals and observations. Compared with the FG relative humidity profiles, retrievals with a small observation error either dry (FGcloud Oclear) or moisten (FGclear Ocloud) the atmosphere with a similar magnitude in an average of ±10 % to 15 % between 500 hPa and 900 hPa.
By contrast, larger observation errors tend to smooth the retrievals, with an overall degraded fit to observations.
Differences in statistics between the WA and the ML estimators are not shown because (i) the ML estimator results do not vary with the observation error and (ii) the ML estimator results are almost identical to the WA result with a 1 K error.
4.3 Sensitivity to the channel selection a. Case studyIn this section, sensitivity studies to the channel selection have been undertaken. When considering the same case study as in Section 4.1, the FG simulation (Fig. 8a) is very different from the observations (Fig. 8b). The hurricane structure of the FG appears more scattered with smaller scale structures than in the observations and the spiral bands are not well located. Three different Bayesian inversions with either one GMI frequency (18.7 V or 183.31 ± 7 GHz) or all GMI frequencies are performed, the corresponding simulated brightness temperatures over Hurricane Maria are respectively shown in Figs. 8c–e.
GMI brightness temperatures of Hurricane Maria on September 18th, 2017 (a) FG simulations, (b) observations, (c) WA-retrieved simulations with the use of only the 18.7 V GHz frequency in the inversion, (d) WA-retrieved brightness temperature simulations with the use of only the 183.31 ± 7 GHz frequency in the inversion and (e): WA-retrieved simulations with the use of all GMI frequencies in the inversion, for channel 18.7 V GHz column (i) and for channel 183.31 ± 7 GHz column (ii).
In the first case (Fig. 8c), the Hurricane structure retrieved with the 18.7 V GHz GMI frequency is in good agreement with the observed one. Hence, the retrievals are well constrained for the low-frequency 18.7 V [Fig. 8c(i)], but it is not the case for the 183.31 ± 7 GHz frequency [Fig. 8c(ii)] exhibiting a rather different brightness temperature, structure. The spiral bands are characterized by too warm brightness temperatures whereas a strong underestimation is noticeable at the South of the core of the system.
Regarding the experiment using the 183.31 ± 7 GHz GMI frequency in the inversion, one finds the opposite results (Fig. 8d). The retrieved brightness temperatures have a structure in agreement with the observed ones at 183.31 ± 7 GHz. By contrast, when examining the retrieved structure at 18.7 V GHz, the strong observed contrast between warm brightness temperatures in the cloudy regions and colder ones in clear-sky regions is extremely blurred in the retrieval.
Finally, using all frequencies in the inversion leads to good results when compared with both frequencies (Fig. 8e).
From this case study, the retrievals appear to be well constrained in terms of brightness temperatures for the frequencies used within the Bayesian inversion. However, they lead to rather poor simulations of other frequencies. This means that this retrieval process can hardly infer information over the whole atmospheric profile. Nevertheless, it has been noticed that when using all channels in the inversion the retrieved profiles are better constrained. In the following, we examine if these conclusions are also valid over a larger sample.
b. General resultsA set of 11 experiments, which consists of introducing progressively the high to low frequencies in the Bayesian inversion were performed over the 2 month period.
Statistics between retrieved and observed brightness temperatures for all experiments are shown in Fig. 9. The correlations for the FGclear Ocloud scenes [Fig. 9(ii)] are, like in the previous sections, worse than those for the FGcloud Oclear scenes. Futhermore, they are low for the channels not used in the inversion and high for the other ones. For example, when using only high frequencies, the correlation coefficients are highly degraded for low frequencies with values down to 0.6 for the FGcloud Oclear scenes and down to 0.35 FGclear Ocloud scenes. When adding the low frequencies within the inversion the correlations increase progressively for both scenes. Only the 23.8 GHz frequency coefficients are relatively high for all experiments on Fig. 9a(i), but it was already the case for the initial innovations (Fig. 1a).
(a): Correlation coefficient, (b) bias, and (c) standard deviation distributions of GMI brightness temperature differences between the WA experiments and the observations, uncorrected of the clear-sky bias, adding channels from 183.31 ± 7 GHz to 18.7 V GHz. Column (i) represents the results for experiment FGcloud Oclear and column (ii), those for FGclear Ocloud. Each color corresponds to an experiment with adding channels, and the curves are also thicker with the increase in the number of channels. Statistics are computed over the 2 month period from September to October 2017.
The biases of retrieved brightness temperatures are well constrained for the FGcloud Oclear scenes [Fig. 9b(i)]: below 4 K. For the opposite weather scenes [Fig. 9b(ii)], the biases can reach −35 K for low frequencies when only high frequencies are used in the inversion. Finally, by using all frequencies, all biases are between −5 K and 5 K.
The standard deviations of retrieved brightness temperatures (Fig. 9c) for the FGcloud Oclear scenes display low values with all frequencies and increased ones for low-frequency channels when only high frequencies are selected. For the FGclear Ocloud scenes, the results are similar with larger standard deviation values. The lowest value is 6 K, whereas for the FGcloud Oclear it is only 2 K. Furthermore, the standard deviation of the 18.7 H GHz frequency channel reaches 31 K when only high frequencies are used.
In summary, the channels can be split into two sets having contrasted behaviors: The frequencies below 37 GHz and those above. The best results for the high frequencies are obtained when using only high frequencies in the inversion. When adding low frequencies, correlations for these channels increase at the expense of slightly reduced correlations for high-frequency channels.
If one now considers the statistics of RH departures in terms of bias and standard deviations shown in Fig. 10, results for the FGcloud Oclear scenes show almost no dependency with the number of channels considered in the inversion. The mean values are negative (leading to a model drying) and the standard deviation values can reach up to 22 % between 300 hPa and 600 hPa.
(a) and (b): Mean and standard deviation distributions of the relative humidity differences between the WA experiments and the FG adding channels from 183.31 ± 7 GHz to 18.7 V GHz. Column (i) represents the results for experiment FGcloud Oclear and column (ii), those for FGclear Ocloud . As in Fig. 9, curves are drawn thicker with the number of channels increasing. Statistics are computed over the 2 month period from September to October 2017.
For the FGclear Ocloud scenes the results are significantly different [Fig. 10(ii)]. The mean values are positive, so the retrieved profiles will tend to moisten the atmosphere. However, the maximum value of the mean gradually decreases with height as more channels are added to the inversion. Hence, with only the 183.31 ± 3 GHz and 183.31 ± 7 GHz frequencies, the maximum values are located at approximately 400hPa and are respectively 9 % and 6 %. However, the experiment using all channels reaches a maximum of 12 % at 600 hPa. Moreover, when adding channels in the inversion, the mean relative humidity FG departure becomes progressively higher at several pressure levels. Thus, the addition of low-frequency channels in the inversion allows relative humidity to be more modified closer to the surface.
Figure 11 shows, with the ML estimator, the correlation coefficients between retrieved and observed brightness temperatures for the experiment set adding channels from 183.31 ± 7 GHz to 18.7 GHz (Fig. 11). The variations of the correlation coefficients shown on Fig. 11 are similar to the variations with the WA estimator (Fig. 9a). However, it can be noticed that the coefficients for the low-frequency channel of experiments using only frequencies ranging from 183.31 ± 7 GHz to 89 V GHz have smaller correlations with the ML estimator than with the WA estimator, for both categories of scenes. The other statistics (biases and standard deviations) are very close to the WA estimator (not shown). Thus, the Bayesian inversion using the ML estimator seems to be less well constrained with a limited number of channels used than using the WA estimator.
Correlation coefficient between the observations (GMI brightness temperatures) and the ML experiments adding channels from 183.31 ± 7 GHz to 18.7 V GHz. Column (i) represents the results for experiment FGcloud Oclear and column (ii) those for FGclear Ocloud . As in Fig. 9, curves are drawn thicker with the number of channels increasing. Statistics are computed over the 2 month period from September to October 2017.
Several conclusions can be drawn from the above results. A large number of channels within the inversion allows to better constrain the retrievals on the vertical unlike the experiments with a reduced set [as done by Duruisseau et al. (2019) only with 183 GHz channels]. Indeed, the various channel help to provide information at pressure levels where they are sensitive to either solid or liquid hydrometeors. Furthermore, the FG will moisten more homogeneously on the vertical with a full set of channels for the FGclear Ocloud scenes. For the other scenes, the FG will be dried out in the same way: The fact that no hydrometeors are present in the observed scenes leads to rather similar retrieved atmospheric profiles. Finally, the ML estimator has degraded performances with a limited set of channels.
4.4 Sensitivity to scattering properties of hydrometeorsOver the last decade, the radiative properties of frozen hydrometeors are a subject of research in the microwave assimilation community. Indeed, the scattering properties in the microwave depend strongly upon particle shape, density, and size, which are extremely variable for solid hydrometeors within clouds. For the time being, the Météo-France data assimilation systems used with the RTTOV-SCATT model can only handle one particle shape and one particle size distribution per hydrometeor type, for all weather situations. Knowing the actual solid hydrometeor diversity, this represents a huge simplification of the atmosphere behavior (e.g., Schmitt et al. 2016). Hence, several studies have been undertaken, and nowadays, a number of databases provide radiative properties for a wide range of snow particles with various shapes and densities (e.g., Eriksson et al. 2018; Kneifel et al. 2020). Additionally, the retrieval community is studying several methods to obtain information on hydrometeor diversity in the atmosphere from microwave observations and the assimilation community is just beginning to consider this diversity in numerical models (e.g., Haddad et al. 2015). In this section, we examine the impact of the selection of radiative properties for hydrometeors on the Bayesian inversion results. The Liu database (Liu 2008) has been chosen because of its availability in the RTTOV-SCATT model and wide use in the community.
The FG and observed brightness temperature distributions over the 2 month period are displayed in Fig. 12 for each particle of the Liu database. In this figure, one can note that the longest tails of distributions toward cold brightness temperatures are associated with particle shapes having the strongest scattering efficiency. For frequencies 18.7 V GHz and 23.8 V GHz (Figs. 12a, b), the distributions have a mode at approximately 210 K and 250 K, respectively, corresponding to clear-sky scenes and are skewed toward warmer temperatures of up to 280 K, corresponding to the microwave emission of liquid hydrometeors. The occurences of simulated brightness temperatures between 210 K and 255 K for the 18.7 V frequency and between 255 K and 280 K for the 23.8 V frequency are smaller than the observed occurences. These channels being sensitive to rain, thus seem to indicate a lack of liquid precipitation in the AROME model forecasts, which was already highlighted in Faure et al. (2020). Additionally, the distributions for each particle remain the same, which is consistent with the fact that low frequencies are not sensitive to snow particles.
(a–f): Distributions of GMI FG brightness temperatures for each particle at frequency 18.7 V GHz (a), 23.8 V GHz (b), 36.64 V GHz (c), 89 V GHz (d), 166 V GHz (e), and 183.31 ± 7 GHz (f). Statistics computed over the 2 month period from September to October 2017.
For the 36.64 GHz frequency (Fig. 12c), the distributions have a mode at approximately 220 K with a significant positive skewness as noticed for the lower frequencies. Conversely, for several particles (the most efficient scatterers) the distributions are skewed toward colder temperatures down to 170 K. This behavior is the signature of scattering by solid particles which seems to overcome the emission signature in these cases.
For higher frequencies (89, 166, and 183.31 GHz) the distributions shown in Figs. 12d, e, and f have a mode at warm temperatures (clear-sky scenes) and a negative skewness associated with the scattering processes by frozen particles. Additionally, the observation distributions displayed in Figs. 12d, e, and f appear to reasonably match the sector snowflake distributions as well as the bullet rosettes distributions. These results are consistent with the use of the sector snowflake particle as the reference in RTTOV-SCATT as suggested by Geer and Baordo (2014).
Furthermore, depending upon the choice of particle, the coldest simulated temperatures vary between 70 K and 150 K. We then decide to rank particles according to their scattering efficiency to aid the interpretation of the inversion results presented afterward.
The Bayesian inversion has been run over the 2 month period with the 11 particle shapes. It was found that the retrieved brightness temperatures are characterized by highly similar statistics (not shown). The retrieval algorithm always finds a combination of FG profiles that does match the observations in the brightness temperature space. Nevertheless, the underlying retrieved atmospheric profiles are rather different. Figure 13 displays the mean distributions of retrieved hydrometeor profiles for the FGclear Ocloud scenes. The less efficient scattering particles tend to retrieve atmospheric profiles with the largest rain, snow and cloud ice water contents, as shown respectively in Figs. 13a, 13b, and 13d.
Mean retrieved profiles of (a) rain, (b) snow, (c) cloud liquid water and (d) cloud ice water for the various particles in the Liu table (2008) for FGclear Ocloud scenes. Statistics are computed over the 2 month period from September to October 2017.
For example, the maximum of snow content means is 0.1 g kg−1 for one of the most efficient scattering particle shape, such as a block column, and 0.65 g kg−1 for one of the least efficient scattering particle shape, such as that of the dendrite snowflake. For the mean retrieved cloud liquid water profiles shown in Fig. 13c, the behavior is the opposite. Indeed, the retrievals with the more efficient scattering particles are those containing the highest cloud liquid water content: up to 0.25 g kg−1 for the block column versus 0.2 g kg−1 for dendrite snowflake particles. This suggests that the retrieval algorithm seeks to generate profiles with more rain, snow, and cloud ice water but less cloud water for the less efficient scattering particles to match a cloudy observation. This might come from a lack of constrain on these microphysics species or a compensation effect, which will need to be further investigated.
Figure 14 shows the statistics of relative humidity FG departures in terms of means and standard deviations. The mean profiles for FGcloud Oclear scenes exhibit negative values leading to the drying of the atmosphere with variations in magnitude between 200 hPa and 600 hPa. The most effective scattering particles lead to the largest atmospheric drying: for example, at 300 hPa, the mean values for the block column and the dendrite snowflake particles are respectively of −7 % and −3 %. For this category of scenes, one could have expected the particle shapes to have no effect on the retrieved profiles since clear-sky profiles do not or contain very little snow amounts. The fact that this is not the case indicates that the filtering of meteorological scenes, on the basis of the 36.64 GHz frequencies, does not fully discard all cloudy scenes, especially the nonprecipitating ice clouds. Moreover, the different meteorological weather scenes have been defined thanks to the brightness temperatures simulated with the sector snowflake particle. Another particle would have led to a different categorization, but the filter based on the sector snowflake particle has been kept in all the experiments for the rest of the study to compare identical samples.
Mean distributions of relative humidity differences between the FG and the WA experiments for particles from the Liu database (2008). Column (i) represents the results for the experiment FGcloud Oclear and column (ii), those for FGclear Ocloud. Statistics are computed over the 2 month period from September to October 2017.
For the FGclear Ocloud scenes, the mean profiles are distributed toward positive values leading to an atmospheric moistening. Additionally, two behaviors can be noticed:
In conclusion, the choice of frozen particles within the observation operator has an essential impact on retrieved relative humidity profiles. A most efficient scatterer will tend to reduce the profile moistening in the FGclear Ocloud scenes. Opposite behavior has been found over the mean distributions of the cloud liquid water profiles and over the mean distributions of relative humidity profiles in the lower layers. This will need further investigations for a deeper understanding of this effect. Finally, because a 1 K observation error was selected in the reference configuration, the results of statistics with the ML estimator (not shown) are similar to those with the WA estimator.
Several sensitivity studies have been performed on various specifications of the “1D-Bay+3D/4D-Var” method over a 2 month period in 2017 with the convective scale model AROME and GMI brightness temperatures from 18.7 GHz to 183.31 GHz. The following aspects have been examined: (i) the observation errors, (ii) the channel selection, and (iii) the scattering radiative properties of frozen hydrometeors in the observation operator. On top of assessing the impact of these specifications, two estimators have been compared: the mean of likelihood distribution (the so-called weighted average: WA) and the maximum value of this distribution (the so-called maximum likelihood: ML).
Observation errors ranging from 1 K to 20 K have been tested for the inversion with the WA estimator. Large errors tend to smooth and dry the retrieved profiles, as well as producing less realistic hydrometeor profiles. By constrast, small observation errors tend to retrieve profiles more consistent with observed brightness temperatures. Moreover, with small errors, the retrieved profiles are similar with both estimators, the ML one being invariant to this parameter when using a covariance error matrix with a single value along its diagonal. In a future study, dedicated diagnostics could be applied to estimate optimal parameters for the assimilation (e.g., Desroziers et al. 2005).
Eleven experiments have been conducted regarding the channel selection. Compared with that in previous studies (Guerbette et al. 2016; Duruisseau et al. 2018), it was shown that the Bayesian inversion does work with a larger set of channels. Adding low frequency channels to the inversion brings information on the low atmospheric layers of the atmosphere and leads to good-quality retrievals as shown in the observation space. Moreover, the results highlight the usefulness of two groups of frequencies, the lower ones (from 18.7 V to 36.64 H GHz) and the higher ones (from 89 V to 183.31 ± 7 GHz). Using low and high frequencies allows to constrain the retrievals below and above the freezing level, respectively.
The inversion experiments were performed for each frozen hydrometeor shape from Liu's database of scattering properties (Liu 2008). The conclusions drawn from this study are that the simulated brightness temperatures of the retrieved profiles remain consistent with observed brightness temperatures with any of the scattering properties selected. However, depending on the choice of particle shape for snowfall representation, the retrieved atmospheric profiles can vary significantly at high levels (moister profiles for less efficient scatterers). Indeed, a weakly scattering particle leads to a retrieval with larger hydrometeor and relative humidity contents. The opposite behavior was also found for the cloud liquid water content and the relative humidity content at low levels and need further investigations.
Finally, the ML and the WA estimators show differences over two cases: (i) ML does not depend on the observation error which is one strength of this estimator, and (ii) the fit to observations is degraded with ML when using a reduced number of channels.
This study brings several perspectives for testing the GMI retrieval assimilation as well as retrievals from other instruments of the GPM constellation.
First, the sensitivity study on the observation error highlighted the benefit of considering low values. Then, depending on the instrument frequencies selected in the Bayesian inversion, the resulting relative humidity profiles should be filtered out vertically. This will allow the assimilation of relative humidity in the 3D-Var or in the 4D-Var only at relevant pressure levels. To assess the quality of the retrievals, an independent validation would be to compare them with reflectivity profiles from the DPR onboard the GPM-Core satellite. Then, the Bayesian inversion could be enhanced by additional channels. Paticularly, the use of temperature sounding channels will be investigated in the future to better constrain the retrievals. The MicroMas instruments onboard Time-Resolved Observations of Precipitation structure and storm Intensity with a Constellation of Smallsats could be used to conduct such study (Blackwell et al. 2012).
The sensitivity study on the radiative properties highlighted that, as in the direct assimilation methods, it is essential to optimize the radiative properties within the observation operator. One interesting feature of the Bayesian inversion over the direct assimilation of brightness temperatures is that it is possible to use several different radiative properties within the inversion thus extending the database of profiles. This possibility will be tested in a future study.
Finally, the ML estimator shows both advantages and disadvantages over the weighted mean estimator.
Therefore, further comparisons will be perfomed and evaluated within assimilation experiments. One interesting feature of the Bayesian inversion is that it is possible to use different radiative properties of hydrometeors within the inversion by extending the database of simulated profiles. This option can be used both with the WA and ML estimators. This will be the subject of future research to further explore how the variability of hydrometeor shapes and distributions observed in nature can be represented within a data assimilation system.
This research is funded by Météo-France and Région Occitanie (PhD grant for Marylis Barreyat). The authors acknowledge the Centre National d'Études Spatiales (CNES) for the financial support of this scientific research activity part of the Infrarouge, Micro-Ondes et Transfert radiatif ensembliste pour la prévision des Extrêmes de Précipitations (IMOTEP) project. The two anonymous reviewers are acknowledged for their useful comments which helped improve the manuscript.