Article ID: 2024-003
It is well known in rainfall ensemble forecasting that ensemble means suffer substantially from the diffusion effect introduced by the averaging operator. As a result, ensemble means are rarely used in practice. Computing an ensemble mean as an arithmetic average is equivalent to defining it as the center of mass, or barycenter, of the ensemble members, with each member treated as a point in a high-dimensional Euclidean space. This study takes the limitation of ensemble means as evidence that the geometry of rainfall distributions is not the familiar Euclidean geometry but a different one. The rigorous mathematical theory underlying this geometry has already been developed within optimal transport (OT), which has found various applications in data science.
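The diffusion effect of the arithmetic mean can be illustrated with a small synthetic sketch. Everything here is hypothetical and not taken from the study: a 1-D grid, four ensemble members that each contain one identical Gaussian-shaped rain cell at a different position, and a 1 mm/h threshold for counting a grid point as wet.

```python
import numpy as np

# Hypothetical 1-D domain (km); the study itself works with 2-D rainfall fields.
x = np.linspace(0.0, 100.0, 201)

def rain_cell(center, width=5.0, peak=20.0):
    """Gaussian-shaped rain cell with the given peak intensity (mm/h)."""
    return peak * np.exp(-0.5 * ((x - center) / width) ** 2)

# Four members that agree on intensity and shape but disagree on position.
centers = [30.0, 45.0, 60.0, 75.0]
members = np.stack([rain_cell(c) for c in centers])

# Arithmetic ensemble mean: the Euclidean barycenter of the members.
ensemble_mean = members.mean(axis=0)

def wet_fraction(field, threshold=1.0):
    """Fraction of grid points with rainfall above the threshold (mm/h)."""
    return float(np.mean(field > threshold))

member_peak = float(members.max(axis=1).mean())
mean_peak = float(ensemble_mean.max())

print("average member peak:", member_peak)
print("ensemble-mean peak: ", mean_peak)
print("member wet fraction:", wet_fraction(members[0]))
print("mean wet fraction:  ", wet_fraction(ensemble_mean))
```

In this toy setting the arithmetic mean has a much weaker peak and a much larger rain area than any single member, even though every member predicts the same cell.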
In classical OT, all distributions are required to have the same total mass, a requirement that is rarely satisfied in rainfall ensemble forecasts. We therefore develop the geometry of rainfall distributions from an extension of OT called unbalanced OT. This geometry is associated with the Gaussian-Hellinger (GH) distance, defined as the optimal cost of pushing a source distribution to a destination distribution, with penalties on the discrepancy between the transported mass and the original mass distributions. Practical applications of the new geometry are enabled by fast and scalable Sinkhorn-Knopp algorithms, with which GH distances and GH barycenters can be approximated in real time. In the new geometry, ensemble means are identified with GH barycenters, and the diffusion effect that affects arithmetic means is avoided. Placed side by side with deterministic forecasts, the new ensemble means provide useful information to forecasters in decision-making.
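As a rough illustration of how Sinkhorn-type scaling handles distributions with unequal total mass, the sketch below computes an entropy-regularized unbalanced transport plan between two 1-D distributions. It is a generic scheme and not the study's implementation: the squared-distance cost, the Kullback-Leibler penalties on both marginals, the values of `eps` and `rho`, and the function name `unbalanced_sinkhorn` are all assumptions made for illustration, and the sketch covers only a pairwise plan, not a barycenter.

```python
import numpy as np

def unbalanced_sinkhorn(a, b, cost, eps=0.05, rho=1.0, n_iter=500):
    """Entropic unbalanced OT plan with KL penalties on both marginals.

    a, b : non-negative mass vectors (their total masses may differ)
    cost : pairwise transport cost matrix, shape (len(a), len(b))
    eps  : entropic regularization strength (assumed value)
    rho  : weight of the KL marginal penalties (assumed value)
    """
    K = np.exp(-cost / eps)        # Gibbs kernel
    power = rho / (rho + eps)      # exponent below 1 relaxes the mass constraints
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # Damped Sinkhorn scaling updates; power = 1 would be the balanced case.
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]

# Hypothetical 1-D example: two rain cells with different total rainfall.
x = np.linspace(0.0, 1.0, 50)

def bump(center, total, width=0.05):
    """Gaussian bump on the grid, rescaled to the requested total mass."""
    g = np.exp(-0.5 * ((x - center) / width) ** 2)
    return total * g / g.sum()

a = bump(0.3, total=1.0)
b = bump(0.6, total=2.0)
cost = (x[:, None] - x[None, :]) ** 2   # squared-distance cost (assumption)

plan = unbalanced_sinkhorn(a, b, cost)
print("source mass:", a.sum(), "target mass:", b.sum())
print("transported mass:", plan.sum())
```

Because the marginal constraints are only penalized, the row and column sums of `plan` need not equal `a` and `b`, which is what lets the scheme run on inputs whose total masses differ.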