Geometry of rainfall ensemble means: from arithmetic

Abstract

approximated in real-time.In the new geometry, ensemble means are identified with GH barycenters, and the diffusion effect, as in the case of arithmetic means, is avoided.New ensemble means being placed side-by-side with deterministic forecasts provide useful information for forecasters in decision-making.

Introduction
Nowadays, ensemble forecasts play a vital role in numerical weather prediction.With advanced high-performance computing, the number of ensemble members is anticipated to continue increasing in the future.Therefore, it is important to extract useful information from a large number of individual forecasts that give equally possible realizations.The standard technique in ensemble forecast is to reduce all ensemble members to a small number of important fields, such as quantile and probabilistic maps, ensemble means and spreads, and ensemble clusters.Quantile and probabilistic maps turn the high-dimensional probability distribution from an ensemble forecast into scalar quantities derived from one-dimensional marginal distributions at grid points.In contrast, ensemble means and ensemble clusters retain the high dimensionality of forecast fields and their coherent structures, but only show typical representatives of all ensemble members.This is the multivariate nature of the latter approach that makes the resulting forecast fields more interesting and complicated in use.At the same time, it opens rooms for new interpretations and explorations.In this study, we demonstrate this interesting problem with the ensemble mean.
The ensemble mean is chosen due to its simplicity, which is, in fact, the arithmetic average of all forecast members.This operator tends to filter out random noise but, at the same time, diffuse informative processes in individual members, leading to a smooth mean field.The diffusion effect is noticeably clear when this operator is applied to rainfall.The resulting rain field tends to spread out and is noticeably different from each member.Figure 1 illustrates this phenomenon with 20 rainfall forecasts over Kyushu Japan from the mesoscale ensemble prediction system MEPS of Japan Meteorology Agency (JMA) (Ono et al., 2021) using the JMA's operational limited area model ASUCA (Ishida et al., 2022).
The arithmetic mean of 20 rainfall distributions differs greatly from the deterministic forecast.As a result, ensemble means of rainfall are rarely used in practice.
Figure 2a conceptually explains this fact using two one-dimensional rainfall distributions with the same shapes but a small displacement error.We expect that the mean should retain a similar shape with its location between the locations of the two individual distributions.However, the arithmetic average yields an undesirable result: a bimodal distribution with peaks much smaller than those of the two members.Furthermore, the mean distribution covers a wide area spreading from the left leg of the first member to the right leg of the second member.Although this explanation is employed in a one-dimensional space, it can be carried out in higher spaces without any significant difference.
An important reason for using the ensemble mean as a representative forecast is that if the probability distribution of forecasts is a multivariate normal distribution, the forecast Page 5 of 48 For Peer Review mean also gives the mode of this probability distribution.As a result, the forecast mean becomes the most probable forecast and can be taken as the best approximation of the true state.Under this assumption, the expected error between the ensemble mean and the observation is proved to be proportional to the reciprocal of the square root of the number of ensemble members (see Appendix A).Thus, by increasing the number of ensemble members, we expect to obtain a more accurate forecast through the ensemble mean.
However, this is not the case even with a relatively large number of ensemble members, as observed in Fig. 3a (1000 members in this case).Instead of two-dimensional rainfall distributions, the problem is simplified by only plotting the time series of 1-hour precipitation averaged over the Ichifusa catchment in Kyushu Japan.The forecasts are obtained from a 1000-member ensemble prediction system LETKF1000 (Duc et al., 2021) using the JMA's former operational limited area model NHM (Saito et al., 2006).Like the two-dimensional case, the ensemble mean does not show a similar pattern as the corresponding time series from the observations.However, intriguingly, if we plot the time series of accumulated rainfall instead of rain rates (Fig. 3b), the ensemble mean becomes nearly identical to the accumulated rainfall observations.This striking fact has been observed in several studies using a large number of ensemble members (Kobayashi et al. 2020(Kobayashi et al. , 2023) ) without any adequate explanation.
Geometrically, if we consider each distribution with n elements as a point in an Page 6 of 48 For Peer Review n-dimensional space , the ensemble mean is simply the center of mass or the ℝ  barycenter of all members, assuming that all members have the same mass.Figure 3b implies that if we want to retain the meaning of the ensemble mean as a barycenter, we need to work in the space of cumulative rainfall distributions.In other words, the appropriate use of the ensemble mean of rainfall in one-dimensional cases is with accumulated rainfall distributions rather than rain rate distributions.We can easily verify the validity of this hypothesis with the simple example in Fig. 2a. Figure 2b plots the cumulative distributions corresponding to the distributions in Fig. 2a.As expected, the cumulative distribution of the expected mean lies between the two cumulative distribution members.
However, what is notable here is that in cumulative forms, the expected mean is nearly identical to the arithmetic mean of the inverses of the cumulative distributions.Note that in Fig. 2b, since the cumulative distributions are one-to-one maps, their graphs also represent their inverses, in which time is considered as a function of cumulative rainfall.
Figure 2b suggests the following procedure to find the expected ensemble mean for any number of one-dimensional distributions: (1) Transforming all distributions to the space of inverses of cumulative distributions; (2) Calculating the arithmetic mean in the transformed space; (3) Transforming the resulting ensemble mean back to the space of distributions.
Of course, the important problem is how we can justify such a procedure with a robust Page 7 of 48 For Peer Review theoretical base.Note that Fig. 2b demonstrates the first two steps (1) and (2) of this procedure, while Fig. 2a demonstrates the last step (3).
The above procedure shows a potential way of finding expected ensemble means in high-dimensional cases.Thus, all we need is finding an appropriate space in which the ensemble mean retains its meaning as barycenters while being robust to the diffusion effect in the presence of displacement errors among members.However, an attempt to extend cumulative distributions from one-dimensional cases to high-dimensional cases does not work since it is unclear how to define a high-dimensional cumulative field from a high-dimensional distribution.A natural question is whether this space exists at all in general cases.It is noted that even in one-dimensional cases, the arithmetic average operator only makes sense if all one-dimensional distributions have the same total mass.
This means that even in the simplest cases, the existence of such a space is still questionable.
In this study, we show that the theory of optimal transport (OT) (Villani, 2009;Santambrogio, 2015;Peyré and Cuturi, 2019) proposes an elegant solution to this problem.Instead of working in a transformed space, we continue to stick with the space of rainfall distributions but endowed with a new geometry.Thus, the similarity between any distributions is no longer measured by the normal Euclidean distance but is replaced by a Page 8 of 48 For Peer Review new distance defined in the context of OT.In particular, this distance becomes the normal Euclidean distance between inverses of cumulative distributions in one-dimensional cases.
The theory of OT relevant to this study, i.e., unbalanced OT, is presented in the next section.The barycenters, resulting from the new geometry of rainfall distributions defined by unbalanced OT, are described and analyzed in Section 3. Finally, Section 4 summarizes the main points of this study and discusses some potential applications.

Unbalanced optimal transport
Although OT has been successfully applied in many fields of data science, there is only a limited number of applications of OT in geosciences (Farchi et al., 2016;Métivier et al., 2016;Yang et al., 2018;Sambridge et al., 2022).However, it is worth noting that in recent years, we have seen an increase in studies using OT in geophysical data assimilation (Reich and Cotter, 2015;Feyeux et al., 2018;Li et al., 2018;Tamang et al., 2021;Vanderbecken et al., 2023).In order to make OT accessible to the meteorology community, the theory of OT will be adapted to rainfall distributions and simplified in this section.A more rigorous treatment for probabilistic distributions can be found in Villani (2009) and Santambrogio (2015).Our mathematical treatment in this section mainly follows Peyré and Cuturi (2019).

Let two vectors
denote two rainfall distributions with the same total rain mass , ∈ ℝ  + Page 9 of 48 For Peer Review over the same domain D. We consider rainfall in the same domain, however, in the theory of OT, two distributions need not be on the same domain.We call a matrix a  ∈ ℝ  ×  + transport plan that moves to in the sense that the element denotes the rain mass     from the bin i to the bin j.A bin corresponds to a grid box in the domain.For mass conservation, we impose two constraints on the elements of where denotes a vector with all elements equal one.From the constraints (1), it is easy  to verify that the total rain mass is conserved , we have a matrix whose element denotes the   ∈ ℝ  ×  +   transportation cost from the bin i to the bin j.
The original theory of OT seeks the OT plan that minimizes the following objective  * function , (3) subject to the constraints (1) where the symbol denotes the inner product of two 〈〉 matrices.With the linear constraints (1), the linear programming problem (3) is convex and therefore has a global minimum.This formulation is known as the Kantorovich problem in OT (Kantorovich, 1942), an extension of the Monge problem in which mass splitting is not Page 10 of 48 For Peer Review allowed (the entire mass from a bin is moved to another bin).What is the connection between the optimal cost and the geometry of rainfall distributions?L  (,) The connection follows from an important theorem: if where is the Minkowski distance of order p between two grid points , then induces a distance between and , which is called the p-Wasserstein distance We replace with in (4) to emphasize that the new distance is defined by using the Minkowski distance as the transportation cost.Also, recall that the Minkowski distance of order 2 is the familiar Euclidean distance.In order to be a distance, the distance function has to be positive, symmetric, and has to obey the triangular inequality.The p-Wasserstein distance usually does not have an analytic form, and can only be estimated numerically.
However, for one-dimensional distributions, has a closed form given by W  (,) , ( 5) where denotes the inverse of the cumulative distribution constructed from .Thus, F -1 ()  when , i.e., is the Euclidean distance between and , the  = 2 2-Wasserstein distance is identical to the Euclidean distance between and .
This fact supports our averaging operation in Fig. 2b, where we take the arithmetic mean of two inverses of cumulative distributions.This step is equivalent to taking the barycenter  of two distributions and with respect to the 2-Wasserstein distance Of course, in practice, we avoid minimizing (6) by simply taking the average of F -1 ()

and
, and we transform it back to the space of distributions.
Recall that the OT problem (3) strictly requires the same total mass between and .

𝐚 𝐛
However, rainfall distributions from different ensemble members rarely satisfy this condition.Therefore, rainfall distributions cannot be considered in the same Wasserstein space.This limitation prevents the application of OT for rainfall distributions.Clearly, the averaging operation does not make sense in Fig. 2b if the two cumulative distributions have two different heights.
There are many attempts to relax the requirement of the same total mass in OT.The most successful approach is known under the name unbalanced optimal transport (UOT) (Frogner et al., 2015;Chizat et al., 2018a;Liero et al., 2018), which relaxes the objective function (3) with the new form , (7) where is the marginal relaxation parameter that penalizes the discrepancy between  mass transportation and original mass distributions.This discrepancy is measured with the Kullback-Leibler (KL) divergence . (8) The existence of the term in ( 8) is due to unequal total mass between and .  -    Thus, (7) replaces the strong constraints (1) with weak constraints through the marginal relaxation terms, and controls these constraints by .It can be verified that (7) reduces to (3) in the limit under the total mass constraint .
Like the 2-Wasserstein distance, when is the square of the Euclidean distance, the optimal cost (7) defines a distance between and , called the Gaussian-Hellinger (GH) distance .( 9) GH(,) = L  2 (,) 1/2 However, unlike , can only be estimated numerically even for W  (,) GH(,) one-dimensional distributions.Using this new distance, we can estimate the barycenter of an ensemble of rainfall distributions from the following minimization problem In other words, we seek the distribution that minimizes the averaged GH distances from  to all .Figure 4 illustrates this barycenter problem by showing GH-barycenters of two    different rainfall distributions in terms of both volumes and spreads.The computation is employed by using the algorithm described in the next section.

Regularized Gaussian-Hellinger barycenters
The most common method to solve the linear programing problem (3) is the simplex algorithm.In general, the computational complexity of the simplex method is ( 3 ) which limits the applications of OT in practice.Recall that denotes the size of the vectors  , which is the number of grid points in the domain D. With the introduction of the , marginal relaxation, the UOT minimization problem ( 7) is no longer a linear programing problem.The closed solution exists when distributions have Gaussian forms (Janati et al., 2020).Blondel et al. (2018) proposed to solve this using the L-BFGS-B algorithm with the squared norm in place of the KL divergence.Sato et al. ( 2020) provided an effective solution when distributions have a tree structure.Chapel et al. (2021) showed that (7) can be turned into a non-negative linear regression problem, and solved with non-negative matrix factorization.However, what makes UOT applicable in practice is the introduction of entropic regularization into UOT, leading to scalable and fast algorithms.
The idea of adding an entropic regularization term into (3) has been proposed by Cuturi (2012) to make the problem strictly convex and, therefore, simpler to solve.In particular, regularization enables the use of the fast Sinkhorn-Knopp algorithm (Sinkhorn and Knopp, 1967;Benamou et al., 2015) with the complexity to approximate the optimal plan.( 2 ) This idea has been introduced into UOT by Chizat et al. (2018b)  where is the entropic regularization parameter, and represents the entropy of  H() . ( 12) The regularized GH distance associated with the regularized UOT problem (11) is expressed as , which transforms our barycenter problem (10) to GH  (,) . ( 13) min By letting go to zero in minimizing (13), we obtain the GH barycenter of the original  barycenter problem (10).
Figure 4 shows the impact of the entropic regularization and the marginal relaxation on approximations of GH barycenters using Algorithm 1. Due to the fact that OT plans become less sparse under the entropic regularization, with large values cause the mass spread  in approximated GH barycenters, as illustrated in Fig. 4a.This implies that to avoid the diffusion effect, should be set to small values.However, if are too small, the   Sinkhorn-Knopp algorithm will suffer from numerical underflow and overflow, and fail to terminate.For the marginal relaxation , it is interesting to see that when a large mass  discrepancy is enabled, i.e., are small, the UOT barycenter problem leads back to a  distribution similar to the arithmetic mean of two ensemble members.Therefore, should  be set to large values to avoid arithmetic means.However, if are too large, the algorithm  will converge slower due to .The settings of and will be applied in ~1  = 10 -4  = 10 all the computations using Algorithm 1.
Page 16 of 48 For Peer Review When applying the matrix scaling algorithm to two-dimensional rainfall distributions, the most challenging issue is the exponential increase in computational costs.For the one-dimensional case in Fig. 4, the size of the problem is .Let us consider  = 100 two-dimensional distributions with the same size in each direction.The problem size becomes , leading to an increase of times in the  = 100 2 = 10 4 10 8 /10 4 = 10 4 computational cost.Notice that the computation cost in Algorithm 1 is mainly dominated by the matrix-vector products, and , which takes operations.In order to    T   ( 2 ) mitigate the huge computational cost in two-dimensional cases, we model as a tensor  product , so that the matrix-vector products can be done in each x, y direction  =   ⊗   separately.As a result, the computation cost reduces to resulting in time ( 3/2 ) 10 2 increase in the computational cost as compared to one-dimensional cases, which is affordable in practice.therefore, overconfidence from the deterministic forecasts may be avoided.This information is important for forecasters in decision-making.
Objective verification using the Fractions Skill Score (FSS) (Roberts and Lean, 2008) is performed to quantify the performances of the two kinds of barycenters in addition to the deterministic forecasts.The verification results are shown in Fig. 6.Clearly, for the arithmetic means, due to the diffusion effect, their FSSs drop rapidly to zero when the rainfall thresholds increase.This means the GH barycenters outperform the arithmetic means for intense rain.However, the arithmetic means are not entirely worse than the GH barycenters.At the rainfall thresholds smaller than 10 mm (3h) -1 , the arithmetic means yield forecasts slightly better than the GH barycenters.The reason can be traced back to Fig. 1, where a large rainfall area forecasted by the arithmetic means (Fig. 1c) because the diffusion effect is unexpectedly in accordance with the observed rainfall area (Fig. 1a).In contrast, individual forecasts similar to the deterministic forecasts tend to predict a rainfall area much smaller than that of the observations.As a result, the GH barycenters become slightly worse than the arithmetic means in predicting the rainfall area.
The computing program for the barycenters in Fig. 5 is parallelized along the direction of ensemble members, i.e., each processor only works with a subset of ensemble members.
With 20 Intel Xeon processors and the domain consisting of 311x242 grid points, each GH Page 18 of 48 For Peer Review barycenter takes three minutes to calculate.The running time can be accelerated considerably if parallelization in the x and y directions is employed.Since all GH barycenters are independent with respect to different lead times, they can be produced in parallel.This enables GH barycenters to be deployed in real-time.

Discussions and conclusions
It is well-known in rainfall ensemble forecasts that ensemble means suffer substantially from the diffusion effect resulting from the averaging operator.Therefore, ensemble means are usually not comparable with any ensemble members, and as a result, are rarely used in practice.The use of the arithmetic average to compute ensemble means is equivalent to the definition of ensemble means as centers of mass of all ensemble members where each member is considered as a point in a high-dimensional Euclidean space.This study uses the limitation of ensemble means as evidence to support the viewpoint that the geometry of rainfall distributions is not the familiar Euclidean space, but a different metric space associated with a certain distance.The rigorously mathematical theory underlying this space has already been developed in the theory of OT with various applications in other disciplines, of which objects are the same kind of distributions as rainfall distributions.
In the theory of OT, all distributions are required to have the same total mass.This requirement is, of course, rarely satisfied in rainfall ensemble forecasts.We, therefore, Page 19 of 48 For Peer Review develop the geometry of rainfall distributions from an extension of OT called UOT.This geometry is associated with the GH distance defined in UOT.This distance is the optimal cost to push a source distribution to a destination distribution with penalties on the mass discrepancy between mass transportation and original mass distributions.The applications of the new geometry of rainfall distributions in practice are enabled by the fast and scalable Sinkhorn-Knopp algorithms, in which GH distances or GH barycenters can be approximated in real-time.By replacing arithmetic means with GH barycenters, the diffusion effect is avoided.Furthermore, new ensemble means, with respect to the GH distance, being placed side-by-side with deterministic forecasts provide useful information for forecasters in decision-making.
A new view on the geometry of rainfall distributions should provide solutions for a broader range of problems, not limited to ensemble means.We now try to tackle the reason underlying the resemblance of the ensemble means and the observations for one-dimensional cumulative distributions in Fig. 3b that is left in the introduction.In the metric space defined by the GH distance, GH barycenters are expected to approach observations with increasing the number of ensemble members.Recall that the 2-Wasserstein distance is equivalent to the Euclidean distance of inverses of cumulative distributions.This suggests that to grasp the convergence with respect to the GH distance, a distance similar to the 2-Wasserstein distance, GH barycenters should be plotted in Page 20 of 48 For Peer Review cumulative forms.As expected, Fig. 7 shows that when the number of ensemble members increases from 20 (MEPS) to 100 (LETF100) and 1000 (LETKF1000) (see Duc et al., 2001, for detailed descriptions of the three ensemble prediction systems), the GH barycenters gradually approach the observations.However, what is more surprising is that the arithmetic means also converge to the GH barycenters.This explains why the arithmetic means converge to the observations in Fig. 3b.
Since arithmetic means are, in fact, Euclidean barycenters, this raises a question on how we explain the convergence of two barycenters with increasing the number of ensemble members.In general, it is easy to show a counter-example for this property, e.g., many pairs of ensemble members with a fixed Wasserstein mean in Fig. 2b.Therefore, we hypothesize that this is a special property of rainfall ensemble forecasts in numerical weather prediction.In order to provide evidence for this hypothesis, Fig. 8 plots two-dimensional GH barycenters and arithmetic means from the same ensemble forecast systems in Fig. 7. Clearly, the arithmetic means again become nearly identical to the GH barycenters when the number of ensemble members reaches 1000.Thus, two-dimensional rainfall distributions also show evidence for this property as in one-dimensional distributions.Of course, this hypothesis should be verified for more cases.

Figure 5
Figure5shows the GH barycenters for consecutive 3-hour precipitation from 00 to 09

Figure 1 :
Figure 1: (a) Accumulated precipitations observed by radars and rain gauges between

Figure 2 :
Figure 2: (a) A one-dimensional simple model explaining undesirable behavior of ensemble

Figure 3 :
Figure 3: (a) Time series of the 1-hour precipitation averaged over the Ichifusa catchment

Figure 4 :
Figure 4: Dependence of regularized GH barycenters on (a) the entropic regularization

Figure 8 :
Figure 8: As Fig. 5 but with the accumulated precipitation between 00-09 JST on July 4 th

Figure 1 :
Figure 1: (a) Accumulated precipitations observed by radars and rain gauges between

Figure 3 :
Figure 3: (a) Time series of the 1-hour precipitation averaged over the Ichifusa catchment

Figure 4 :
Figure 4: Dependence of regularized GH barycenters on (a) the entropic regularization

Figure 7 :Figure 8 :
Figure 7: As Fig. 3b but with the ensemble forecasts of (a) MEPS, (b) LETKF100, and (c) What are the other potential applications of the UOT-based geometry of rainfall distributions?In this study, we use ensemble means to illustrate one of the potential applications of the new geometry.However, it is important to verify the performance of GH barycenters in comparison with deterministic forecasts or traditional ensemble means.This verification is usually quantified by objective verification scores as demonstrated in Fig.6with FSS.Due to its nature as a similarity measure, the GH distance should be a natural verification score in rainfall verification.Also, ensemble means do not make sense if forecasts show a bi-modal probability distribution.In such cases, clustering needs to be deployed first, and clusters are represented by appropriate representatives.Then, the clustering can use the GH distance as a similarity measure, while clusters can be Janati, H.,B. Muzellec, G. Peyré, and M. Cuturi, 2020:Entropic optimal transport between unbalanced gaussian measures has a closed form.Advances in Neural Information Processing Systems, 33.Kantorovich, L.V, 1942: On translation of mass (in Russian).Dokl.AN SSSR, 37,[199][200][201]