Global Simulations of the Atmosphere at 1.45 km Grid-Spacing with the Integrated Forecasting System

Global simulations with 1.45 km grid spacing are presented that were performed using the Integrated Forecasting System (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF). The simulations are uncoupled (without ocean, sea-ice, or wave model), use 62 or 137 vertical levels, and retain the full complexity of weather forecast simulations, including initial conditions from a recent date, real-world topography, and state-of-the-art physical parametrizations, as well as diabatic forcing including shallow convection, turbulent diffusion, radiation, and five categories of water substance (vapor, liquid, ice, rain, and snow). The simulations are evaluated with regard to computational efficiency and model fidelity. Scaling results are presented that were obtained on the fastest supercomputer in Europe, Piz Daint (Top500, November 2018). Important choices for the model configuration at this unprecedented resolution for the IFS are discussed, such as the use of hydrostatic or non-hydrostatic equations, and the time resolution of physical phenomena, which is set by the length of the timestep. Our simulations indicate that the IFS model, based on spectral transforms with a semi-implicit, semi-Lagrangian timestepping scheme, in contrast to more local discretization techniques, can provide a meaningful baseline reference for O(1 km) global simulations.


Introduction
The complexity and quality of weather and climate models have improved at a remarkable speed during the last decades, and the steady increase in computing power has allowed for a steady increase in the resolution and complexity of forecast models. However, the recent slowdown in the performance growth of individual processors is now creating challenges for the domain of weather and climate modeling. It is getting more complicated to use modern and future supercomputers efficiently, as they require applications to exploit massive parallelism of up to O(10^6) processing units and heterogeneous hardware including central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), field-programmable gate arrays (FPGAs), and more. This is difficult for weather and climate models, which comprise O(1 million) lines of model code, require diverse mathematical algorithms within a single modeling framework, and are often written in different coding styles for the different model components.
As the resolution of global atmospheric simulations is always insufficient to explicitly resolve all features of the Earth System, several subgrid-scale features need to be parametrized. This involves a description of the statistical contributions of subgrid-scale processes to the mean flow, expressed in terms of the mean-flow parameters. This closure thus relies on the averaged equations and on explicit expressions for the higher-order terms arising from perturbations of the mean flow. In addition, parametrizations describe diabatic effects, such as radiation and water phase changes, as well as processes for which equations describing the underlying physical behavior are unknown, such as soil processes. As resolution is increased, features of the Earth System such as deep convection can be explicitly represented on the computational grid of the model simulation.
Deep convection plays a significant role in the vertical transport of energy in the tropics, which drives the global circulation of the atmosphere through well-known circulation patterns, such as the Hadley cell and the meandering Inter-Tropical Convergence Zone (ITCZ). Today, only a few global weather and climate models run routinely at a grid spacing of less than 10 km. Several of these models have contributed to the DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains (DYAMOND) model inter-comparison, for which nine models performed 40-day global simulations at a grid spacing finer than 5 km. However, as global weather and climate models approach grid spacings of a few kilometers, they enter the so-called grey zone of convection, where certain limiting assumptions underlying deep convection parametrization (namely, that deep convection can be represented by a bulk parametrization scheme based on an ensemble of independent convective updrafts within a grid cell) cannot be justified anymore, since grid cells partially or fully represent a single updraft. If the deep convection parametrization scheme is switched off, convection is explicitly simulated by the governing equations. However, convective cells will be significantly bigger than convective cells in the real world if the resolution of the model is insufficient. As a result, explicitly simulated convective cells assume the size of one or several grid cells, and unrealistically sized convective cells may cause a degradation of forecast skill in comparison with coarser simulations in which convection parametrization is used. Global simulations in which deep convection parametrization is switched off are often called "cloud resolving". We will, however, refer to our simulations as "storm resolving", following the convention of the DYAMOND project, since a grid spacing of 1.45 km will still not be sufficient to resolve individual clouds.
It has been suggested to move from O(10 km) grid spacing directly to a grid spacing of O(1 km), which would potentially allow sufficient resolution of deep convection for global weather and climate models, skipping the resolution range in between. There is substantial experience in the limited-area community in Europe (Termonia et al. 2018), where forecast models operate routinely at intermediate grid spacings of 5 or 2.5 km. As discussed by Neumann et al. (2018), a grid spacing of O(1 km) for global atmosphere models would potentially bring a number of improvements, including the representation of topographic gravity waves and surface drag induced by explicitly represented small-scale topography, and the ability to assimilate satellite data at its native resolution. Ocean models at O(1 km) grid spacing can resolve a larger fraction of mesoscale eddies when compared with today's models and can simulate ocean tides explicitly. Mesoscale eddies are essential to represent ocean variability accurately.
To date, there are only a small number of simulations of the (near-)global atmosphere with a grid spacing close to or beyond 1 km. These include a seminal 12-h-long simulation at 870 m grid spacing with the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) (Miyamoto et al. 2013). There was also a 10-day simulation of the atmosphere between the latitudes −80° and 80° with the Consortium for Small-Scale Modeling (COSMO) model at a grid spacing of 930 m, performed for the idealized test case of a baroclinic instability (Fuhrer et al. 2018). Model simulations at slightly lower resolution have been presented in various papers (Miura et al. 2007; Satoh et al. 2008; Fudeyasu et al. 2008; Skamarock et al. 2014; Michalakes et al. 2015; Müller et al. 2018). An overview of the history of global storm-resolving models can be found in the paper by Satoh et al. (2019).
Many simulations have also been performed at high resolution but for limited domains. Bretherton and Khairoutdinov (2015) simulated a 20,480 × 10,240 km equatorial channel for 30 days at 4 km grid spacing, and Leutwyler et al. (2017) show a 3-month-long simulation with 2.2 km grid spacing on a European-scale computational domain using the COSMO model. Yang et al. (2016) performed simulations of a moist baroclinic instability test in a β-plane three-dimensional channel resembling the latitude range between 18° and 72° north with a horizontal grid spacing of 488 m. Heinze et al. (2017) present large-eddy simulations with ICON over almost the entire area of Germany, with 156 m grid spacing for weather-type timescales. It is also worth mentioning the radiative convective equilibrium model inter-comparison project (RCEMIP), which compares global model simulations for an idealization of the climate system to improve the understanding of clouds, convection, and climate sensitivity and to quantify differences between models (Wing et al. 2018).
A number of global weather and climate models use the hydrostatic approximation within the so-called set of primitive (shallow-atmosphere) equations for operational forecasts. This approach assumes vertical accelerations to be small compared with the balancing forces of gravity and the vertical pressure gradient, which is typically valid when the ratio of vertical to horizontal length scales of motion is small. As a result, vertical velocity becomes a diagnostic variable that can be derived from the continuity equation, and for energetic consistency additional acceleration terms in the horizontal momentum equations on the sphere are dropped. More recent work makes it possible to relax this traditional approximation while continuing to use the hydrostatic assumption (Tort and Dubos 2014).
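Schematically, in the textbook form of the approximation (not the exact IFS formulation), hydrostatic balance replaces the prognostic vertical momentum equation, and the vertical (pressure) velocity ω follows diagnostically from continuity in pressure coordinates:

```latex
% Hydrostatic balance (replaces the prognostic vertical momentum equation):
\frac{\partial p}{\partial z} = -\rho g
% Continuity in pressure coordinates, from which \omega = \mathrm{D}p/\mathrm{D}t
% is obtained diagnostically, with \omega(p_{\mathrm{top}}) = 0:
\nabla_p \cdot \mathbf{v} + \frac{\partial \omega}{\partial p} = 0
\quad\Longrightarrow\quad
\omega(p) = -\int_{p_{\mathrm{top}}}^{p} \nabla_p \cdot \mathbf{v}\, \mathrm{d}p'
```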
When the aspect ratio of vertical to horizontal motions becomes approximately one (Jeevanjee 2017), the hydrostatic approximation becomes invalid. However, the precise grid spacing at which this happens seems to depend on the particular model, the model configuration, and the significance for the features of interest. Daley (1988) suggests that for global models with spectral truncation numbers greater than 400 (> 25 km grid spacing) the non-hydrostatic set of equations should be used. However, Ross and Orlanski (1978) found only little difference between hydrostatic and non-hydrostatic two-dimensional simulations of an idealized cold front at a resolution of 20 km. For a similar case, Orlanski (1981) found significant differences at a resolution of 8 km. Dudhia (1993) simulated a cold front with hydrostatic and non-hydrostatic model configurations, and both versions produced similar results for grid spacings of 6.67 km. Kato (1997) found that a hydrostatic model with idealized moist convection overestimated precipitation at 5 km grid spacing. Weisman et al. (1997) performed simulations of a squall line with grid spacings ranging from 20 km to 1 km and found that the hydrostatic model overestimated the maximum vertical velocity at grid spacings of 4 km and lower. Jeevanjee (2017) ran idealized radiative convective equilibrium simulations over the sea for grid spacings ranging from 16 to 0.0625 km and found that the hydrostatic model started to overestimate the vertical velocities for grid spacings smaller than 2 km. Often quoted are also situations with vertical wind shear, where vertically propagating gravity waves are trapped in the lee of the mountain and energy propagates horizontally rather than vertically (Keller 1994). Hydrostatic models do not "see" the shear, and gravity waves propagate vertically upwards.
However, as presented by Wedi and Smolarkiewicz (2009), if the mountain is not resolved with a sufficient number of grid points relative to the mountain width (and for the given flow regime), non-hydrostatic models will also exhibit the characteristic hydrostatic (non-trapped) behavior.
There is no consensus in the literature on which spatial discretization scheme would be most appropriate for global storm-resolving simulations. Indeed, all of the common approaches for the development of dynamical cores, including finite-difference, finite-volume, finite-element, and spectral methods, seem to be capable of running global model simulations at a grid spacing of only a few kilometers on state-of-the-art supercomputers (Michalakes et al. 2015; Stevens et al. 2019). There is also no consensus on whether explicit, semi-implicit, or fully implicit timestepping schemes are most promising for use in global storm-resolving simulations. Implicit schemes allow the use of a larger timestep in comparison with explicit schemes, since they are not bound by Courant-Friedrichs-Lewy-type constraints for fast wave motions. However, Fuhrer et al. (2018) argue that even fully implicit, global, convection-resolving climate simulations at 1-2 km grid spacing cannot be considered a viable option when using a timestep larger than 40-60 s, because sound-wave propagation and important diabatic processes would not be resolved in time, potentially leading to a change in the history of the flow evolution. Instead, they use a split-explicit timestepping scheme with a timestep of 6 s when running with a grid spacing of less than a kilometer. In contrast, Yang et al. (2016) propose to work with implicit schemes and show results using a large timestep of 240 s when running at a grid spacing of less than a kilometer to achieve the best time-to-solution for simulations. As pointed out by Wedi et al. (2015) and evident in the paper by Mengaldo et al. (2018), different timestepping approaches may incur low-order time truncation errors compared with the nominal spatial truncation error of a given model, especially if time and space are handled independently. Thus, a careful analysis of time truncation errors at 1 km global grid spacing is pending.
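The timestep constraint for explicit schemes can be made concrete with a back-of-the-envelope acoustic CFL estimate (the sound speed and Courant number below are generic textbook values, not taken from any of the cited papers):

```python
# Illustrative acoustic CFL limit: an explicit scheme must resolve sound
# waves, so dt <= C * dx / c_s for a simple one-dimensional estimate.
dx = 1450.0    # grid spacing in metres (1.45 km)
c_s = 340.0    # approximate speed of sound near the surface, in m/s
courant = 1.0  # nominal Courant number for a simple explicit scheme

dt_explicit = courant * dx / c_s
print(f"explicit acoustic limit: {dt_explicit:.1f} s")  # ~4.3 s

# A semi-implicit (or split-explicit) treatment of the fast waves is not
# bound by this limit, which is how timesteps of 60-240 s become feasible
# at kilometre-scale grid spacing.
```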
This paper presents global simulations of the atmosphere with the Integrated Forecasting System (IFS) at up to 1.45 km grid spacing. The performance of the IFS is discussed, and scalability tests on the Piz Daint supercomputer and the two supercomputers of the European Centre for Medium-Range Weather Forecasts (ECMWF) are presented. These scaling results provide a good benchmark for the improvements in efficiency that would be required to allow for global, operational weather forecasts or climate projections at storm-resolving resolution. A first scientific evaluation of the IFS model fidelity for simulations at storm-resolving 1.45 km grid spacing is presented. This includes a discussion of the effective resolution of atmospheric dynamics derived from energy spectra (Abdalla et al. 2013) and a limited assessment of how choices for the model configuration, for example regarding the use of non-hydrostatic equations, the parametrization of convection, or the timestep length, influence model simulations.
Section 2 provides details of the model configuration that was used. Section 3 discusses the performance and scaling behavior of the model. Section 4 presents the scientific evaluation of model runs. Sections 5 and 6 provide the discussion and conclusions.

A description of the IFS
We perform model simulations using the uncoupled IFS atmosphere model cycle 45r2 (no ocean, sea-ice, or wave model, because these currently limit the scalability of the coupled system at 1.45 km grid spacing). The IFS is a spectral transform model in which prognostic variables have a dual representation in grid-point space and in global spectral space, the latter represented via spherical-harmonic basis functions. The spectral representation facilitates easy computation of horizontal gradients and of the Laplacian operator relevant for horizontal wave propagation. The special property of the horizontal Laplacian operator in spectral space on the sphere conveniently transforms the three-dimensional Helmholtz problem, arising from the semi-implicit discretization, into an array (for each zonal wavenumber) of two-dimensional matrix operator inversions. Importantly, products of terms, (semi-Lagrangian) advection, and all (columnar) physical parametrizations are computed in grid-point space. Water substances have a representation in grid-point space only. A cubic octahedral (reduced) Gaussian grid is used for this purpose (Wedi 2014; Malardel et al. 2016).
Transforming between grid-point and spectral space requires the subsequent use of a Legendre transformation and a fast Fourier transformation (called "transforms" in the rest of the paper). To improve performance for the calculation of the Legendre transformation, which shows a computational complexity proportional to N^3 with the truncation wavenumber N, a so-called "fast Legendre transformation" was introduced that trades a small loss of accuracy for performance and achieves a scaling behavior of N^2 log^3(N). To avoid the transforms for high-resolution simulations in the future, ECMWF is also developing an alternative non-hydrostatic dynamical core based on a finite-volume discretization with the same collocation of prognostic variables as in the current IFS (Integrated Forecasting System-Finite-Volume Module (IFS-FVM); Kühnlein et al. 2019). However, in this paper we use IFS to refer to the spectral transform model.
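The scaling difference between the two transforms can be illustrated with a small sketch (asymptotic operation counts only; real costs include prefactors and implementation details that this toy calculation ignores):

```python
import math

# Asymptotic cost of the standard Legendre transform, O(N^3), versus the
# "fast" Legendre transform, O(N^2 log^3 N), for truncation wavenumber N.
def cost_standard(N):
    return N ** 3

def cost_fast(N):
    return N ** 2 * math.log(N) ** 3

# Truncations used in this paper: TCo1279 (9 km) and TCo7999 (1.45 km).
for N in (1279, 7999):
    ratio = cost_standard(N) / cost_fast(N)
    print(f"N = {N}: asymptotic speedup ~{ratio:.1f}x")
```

The ratio grows with N, which is why the fast transform matters most at the highest resolutions.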
The IFS is based on a semi-implicit semi-Lagrangian timestepping scheme with no decentering that allows for the use of long timesteps. We use the same timestep for both dynamics and all physics at a grid spacing of 1.45 km, with two exceptions: turbulent vertical diffusion uses two sub-steps, and radiative transfer calculations are called hourly. Model simulations are initialized from the 9 km operational analysis of ECMWF on October 13, 2016, 0 UTC, suitably interpolated to the target grid used for the storm-resolving simulations with the integrated interpolation and post-processing software of Arpege/IFS (https://www.umr-cnrm.fr/gmapdoc). Next to the transforms, the physical parametrization schemes ("physics") and the semi-Lagrangian advection scheme, both calculated in grid-point space, are the largest contributors to the computational cost of simulations. Only a comparably small fraction of the cost is generated by calculations in spectral space, mostly related to the semi-implicit timestepping scheme.
Most of the model simulations were performed using the single-precision version of the IFS, which uses 32 bits to represent real numbers for almost the entire model integration (Düben and Palmer 2014; Vana et al. 2017). The quality of forecast simulations is equivalent between double- and single-precision simulations. However, the use of single precision causes a small error in mass conservation, and a global mass fixer is used in these simulations. The global mass fixer is cheap and easy to apply within a spectral model. The use of single precision reduces runtime by approximately 40% (dependent on the Message Passing Interface (MPI)/Open Multi-Processing (OpenMP) configuration of the runs), and memory requirements are reduced significantly, making it possible to also run simulations on a much smaller number of nodes for testing. We perform both hydrostatic and non-hydrostatic simulations. The non-hydrostatic simulations formulate the non-hydrostatic system in a mass-based vertical coordinate and add prognostic variables for the vertical velocity and for a deviation from the hydrostatic pressure. The resulting semi-implicit system is more complicated than in the hydrostatic case but is similarly solved in spectral space (see, e.g., Voitus et al. (2019) and references therein).
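As an illustration of the idea behind a global mass fixer (a minimal sketch only; the actual IFS fixer and its implementation details are not described here), a proportional correction rescales a field so that its quadrature-weighted global integral is restored to a reference value:

```python
# Minimal sketch of a proportional global mass fixer (an illustration of
# the idea only; the actual IFS implementation is not described here).
def fix_global_mass(field, weights, reference_mass):
    """Rescale a positive mass field so that its quadrature-weighted
    global integral matches the reference value from the start of the
    timestep."""
    current_mass = sum(f * w for f, w in zip(field, weights))
    scale = reference_mass / current_mass
    return [f * scale for f in field]

# Example: rounding drift has removed ~0.1% of the global mass.
weights = [0.125] * 8   # stand-in quadrature weights summing to 1
field = [0.999] * 8     # drifted field values
fixed = fix_global_mass(field, weights, reference_mass=1.0)
print(sum(f * w for f, w in zip(fixed, weights)))  # restored to 1.0
```

A proportional fixer is attractive precisely because it is cheap: one global reduction and one rescaling per field.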
We will compare model simulations that use a different number of iterations for the optional predictor-corrector (PC) timestepping scheme required for stability in the non-hydrostatic model. The advection (horizontal and vertical) and the entire spectral semi-implicit solve, including the spectral transforms of several prognostic variables, are required in each iteration, which causes a significant increase in the computational cost of the non-hydrostatic model (for details, see Wedi et al. 2013). The second difference is the use of a finite-element discretization scheme in the vertical direction for standard hydrostatic simulations and a finite-difference scheme that is currently used when running in non-hydrostatic mode. As detailed by Bubnova et al. (1995), the vertical discretization has to be bespoke to ensure that the discrete and continuous systems of equations are consistent. Improved consistency, together with better treatment of vertical boundary conditions for the non-hydrostatic configuration, may be achieved through changes to the equations and corresponding changes to the solution algorithm, as detailed by Voitus et al. (2019). A vertical finite-element scheme for the non-hydrostatic equations is also under active development (see also https://www.ecmwf.int/en/newsletter/161/news/ecmwf-tests-new-numerical-scheme-vertical-grid), but neither of these developments is available for experimentation at a grid spacing of 1.45 km.
Most of the model simulations that are presented in this paper are based on the cubic octahedral grid with an average 1.45 km (TCo7999) grid spacing (1.25 km near the equator). Operational weather forecasts at ECMWF use a cubic octahedral Gaussian grid with 9 km grid spacing (TCo1279) for deterministic forecasts and 18 km grid spacing (TCo639) for ensemble predictions with 50 ensemble members.
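The grid sizes involved can be illustrated with the point count of the octahedral grid O(N), which has 20 points on each polar row and four additional points per row towards the equator, summing to 4N² + 36N points globally; the pairing of cubic truncation TCoX with grid O(X+1) is assumed here from ECMWF's grid documentation:

```python
# Point count of ECMWF's octahedral reduced Gaussian grid O(N), where N is
# the number of latitude rows per hemisphere: rows hold 20, 24, 28, ...
# points from pole to equator, which sums to 4*N**2 + 36*N points globally.
def octahedral_points(N):
    return 4 * N ** 2 + 36 * N

# Cubic truncation TCoX is paired with the grid O(X+1) (assumed pairing).
for label, N in (("TCo1279, 9 km", 1280), ("TCo7999, 1.45 km", 8000)):
    n = octahedral_points(N)
    print(f"{label}: {n:,} points (~{n / 1e6:.0f} million)")
```

The step from 9 km to 1.45 km grid spacing thus increases the number of horizontal grid points from roughly 6.6 million to roughly 256 million.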
We use the standard procedures of ECMWF to generate high-resolution topography fields (https://www.ecmwf.int/en/forecasts/documentation-and-support/changes-ecmwf-model/ifs-documentation). The 30″ orography data, derived from different sources, is spectrally fitted to T15999, slightly filtered in spectral space, and truncated to the 7999 truncation. The resulting field is used as input to the IFS simulations.
There is no representation of subgrid-scale topography within the high-resolution simulations at 1.45 km grid spacing.
We use the standard set of physical parametrization schemes of the IFS for all forecasts presented in this paper: cloud microphysics with five categories of water substance (vapor, liquid, ice, rain, and snow), radiation, shallow convection, turbulent vertical diffusion, and the ECMWF land-surface model (HTESSEL; https://www.ecmwf.int/en/forecasts/documentation-and-support/changes-ecmwf-model/ifs-documentation). The parametrization of deep convection is switched off for simulations at 1.45 km grid spacing. However, the parametrization of shallow convection remains active in all simulations. Figure 1 presents the topography as it is used in deterministic operational forecasts at ECMWF and at 1.45 km grid spacing. The detailed representation of topography is one compelling reason to increase the resolution of atmospheric models in complex terrain. Indeed, the figures show a remarkable level of detail, with a significant improvement in the representation of valleys for mountain ranges such as the Alps or the Himalayas, if the grid spacing is reduced to 1.45 km.

Scalability
ECMWF is spending significant resources to optimize simulations using the IFS for present and future high-performance computers as part of its scalability program (see Bauer et al. 2020). We have performed model simulations using the IFS on the two supercomputers of ECMWF and the Piz Daint supercomputer at the Swiss National Supercomputing Centre (CSCS). ECMWF has two identical Cray compute clusters. Each of them has 3610 Cray XC40 nodes and a peak performance of 4.25 petaflops. Every node has two Intel E5-2695v4 Broadwell CPUs, each with 18 compute cores. Piz Daint is the fastest supercomputer in Europe and #6 on the June 2019 TOP500 list (www.top500.org/lists/2019/06/), with a peak performance of 27.15 petaflops. The Cray XC50 has a total of 5704 nodes, each equipped with one 12-core Intel E5-2690 v3 Haswell CPU with 64 GB of memory and one NVIDIA Tesla P100 GPU, interconnected with the Cray Aries network. The simulations of this study did not use the GPUs for computations.
The simulations on Piz Daint for this paper were performed using a hybrid MPI/OpenMP configuration with either 4880 tasks with 12 threads per task or 9776 tasks with 6 threads per task, utilizing 4880 or 4888 nodes, respectively. The two configurations produced similar performance (see Table 1). The performance results that are presented in the following do not consider model initialization and focus solely on the resources used during model timesteps. Figure 2 presents the cost distribution of the different model components for simulations using the IFS at the grid spacing that is used for routine weather forecasts at ECMWF (9 km), as well as simulations with 1.45 km grid spacing in hydrostatic and non-hydrostatic mode. The use of single precision does not change the cost fractions significantly, as long as I/O and the Nucleus for European Modelling of the Ocean (NEMO) model are switched off. The relative cost of the spectral transforms is higher for the non-hydrostatic configuration, because additional transformations between spectral and grid-point space are required. The hydrostatic simulation uses a finite-element discretization in the vertical and no predictor-corrector scheme (similar to H-FE-DT120 in the next section and the operational setting at ECMWF), whereas the non-hydrostatic simulation uses a finite-difference discretization in the vertical direction and one iteration of the predictor-corrector scheme. Figure 3 presents the scaling behavior of the IFS on Piz Daint for simulations with 1.45 km grid spacing, and Table 1 provides information about the simulations. The hydrostatic configurations are significantly less expensive in comparison with non-hydrostatic simulations. Both model configurations exhibited a reasonable (strong) scaling behavior when using most of the available nodes on the supercomputer, in particular for the non-hydrostatic case.
However, the data is limited, and the comparison of hydrostatic and non-hydrostatic configurations indicates that the run with significantly shorter elapsed time per timestep appears to be affected by latency within the global communications of the transforms (not shown). Nevertheless, the efficiency of simulations reaches 0.19 simulated years per day (SYPD) of computation for the hydrostatic model. Operational weather and climate simulations would require a throughput of approximately one forecast year per day of computation (obviously also with I/O switched on and with ocean and wave models coupled). While the performance results of this paper are promising, it should be noted that simulations using the IFS with a shorter timestep, two or three predictor-corrector iterations, and 137 vertical levels, as compared and presented in Section 4, will naturally increase computational cost significantly. These simulations scale in the same way, but at higher overall time-to-solution.
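Throughput figures like these follow from simple arithmetic on the timestep and wall-clock time per step (the 1.7 s per step below is an assumption chosen to be consistent with the 0.19 SYPD quoted above, not a measured value from Table 1):

```python
# Simulated years per day (SYPD) from the model timestep and the measured
# wall-clock time per model step.
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

def sypd(model_dt_s, wall_per_step_s):
    steps_per_year = SECONDS_PER_YEAR / model_dt_s
    return SECONDS_PER_DAY / (steps_per_year * wall_per_step_s)

# A 120 s timestep at ~1.7 s of wall-clock time per step (assumed value)
# yields roughly the 0.19 SYPD reported for the hydrostatic configuration.
print(f"{sypd(120.0, 1.7):.2f} SYPD")
```

The same relation shows why halving the timestep, or adding predictor-corrector iterations, directly divides the achievable SYPD.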
The model configuration that is closest for a comparison of performance are the COSMO simulations of Fuhrer et al. (2018), which documented 0.043 (0.23) SYPD at 930 m (1.9 km) grid spacing, with near-global simulations of the COSMO model scaling to nearly 4888 GPU-accelerated nodes on Piz Daint. Schulthess et al. (2019) conclude that there is a shortfall factor of 101× for the COSMO model and of 247× for the non-hydrostatic IFS model with a projected 30 s timestep to reach global simulations at 1 km resolution with a throughput of 1 SYPD when running both models on Piz Daint. This does not necessarily indicate that the COSMO model is more efficient, since this comparison penalizes the IFS for using a larger timestep and for not using the GPU resources on each node. Under this caveat, we conclude that the IFS simulations presented in this paper are competitive with other models. We also list energy consumption figures for our IFS simulations in Table 1, as reported by Piz Daint, which will be useful for future reference, since energy-to-solution is an emerging measure of efficiency for Earth System models. Here, we measure in units of actually consumed megawatt hours (MWh) per simulated year (SY).

Scientific evaluation of selected simulations
In this section, we compare model fidelity between different model simulations at 1.45 km grid spacing. To identify the impact of different options for the model configuration, we mainly compare six different model runs described in Table 2.
Unfortunately, we could not run the non-hydrostatic simulation with a 30 s timestep and real-world topography, as it became unstable. However, this instability could be removed by using a more strongly filtered version of the orography (not shown here) or no orography (see notopo-NH-FD-DT30). Furthermore, we anticipate that changes to the non-hydrostatic configuration that are currently being implemented will help remove these instabilities.
The section will present results for the global spectra of horizontal kinetic energy (Section 4.1), probability density functions (PDFs), spectra and snapshots of vertical velocity (Section 4.2), PDFs of precipitation (Section 4.3), plots for satellite simulations in comparison with real satellite data (Section 4.4) and preliminary results for forecast errors of high-resolution simulations (Section 4.5).

Energy spectra
To get a first impression of model fidelity for the different model simulations, we have plotted the spectra of horizontal kinetic energy in Fig. 4. It should be noted that the spectra presented here are only snapshots and that the model is still not completely spun up after 12 h of simulation. The data to average over a longer time period is not available for these simulations. However, we do not expect the qualitative differences between our simulations to change significantly.
The energy spectra show a spurious increase in energy at small scales for the NH-FD-DT60 configuration, which is consistent with the instability that we experienced when using a 30 s timestep for the same model configuration. The energy level is slightly higher for notopo-NH-FD-DT30 in comparison with the other simulations at 200 hPa at small scales. The figure also shows, for comparison, the spectra of a global simulation with 9 km grid spacing that clearly fails to transition between the −3 and −5/3 scaling behaviors. One way to assess the realism of horizontal kinetic energy spectra is to identify where the impact of dissipative mechanisms at the tail of the spectrum becomes evident via a departure from the theoretical −5/3 curve. The effective resolution defined in this way, at which the kinetic energy spectrum falls below the expected scaling, is between 5 km and 10 km for the simulations at 1.45 km grid spacing. This is consistent with other measurements of effective resolution from spectra (cf. Abdalla et al. 2013), for example, Heinze et al. (2017) and Skamarock et al. (2014), who identify seven to eight times or six times the grid spacing, respectively.

Table 2. Properties of the simulations that are evaluated in Section 4. The table provides the identifier that is used for each run in the rest of the paper, information on whether the run was hydrostatic or non-hydrostatic and whether it used real-world or flat topography, the number of vertical levels, the vertical discretization method, the length of the timestep, and the number of predictor-corrector (PC) iterations.

To make the differences between the simulations more visible, we plot the horizontal kinetic energy spectra compensated for the −5/3 scaling in Fig. 5. The IFS shows more deviations from the theoretical −5/3 curve at 200 hPa when compared with the results at 500 hPa. More recent comparisons with other models in the DYAMOND project suggest that this is both a spin-up feature and specific to the IFS (not shown). Overall, the different spectra are similar, but differences are visible. Consistent with the total spectra in Fig. 4, NH-FD-DT60 exhibits spurious behavior at small scales. These features become less prominent if a less ambitious topography field with a coarser resolution is used (not shown here) and are small for the runs without topography (notopo-NH-FD-DT30). However, the non-hydrostatic simulations with topography may be repeated in the future in light of ongoing model developments. For the equivalent hydrostatic simulation (H-FD-DT60), there is no increase in the spectra visible at the small scales, but the divergent part shows a small bump close to wavenumber 4,000 at 200 hPa. In contrast to the other simulations, the two H-FE simulations have less energy in the divergent part of the spectrum at 500 hPa when compared with the rotational part, even at high wavenumbers. Notably, the vertical finite-element discretization is of higher order than the first- to second-order finite-difference discretization.
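The departure-from-−5/3 criterion for effective resolution can be sketched with a synthetic spectrum; the Gaussian dissipation scale and the 50% departure threshold below are illustrative assumptions, not the values used for the figures:

```python
import numpy as np

# Sketch: estimate "effective resolution" as the scale where a kinetic
# energy spectrum falls below the theoretical k^(-5/3) mesoscale slope.
EARTH_CIRCUMFERENCE_KM = 40_000.0

k = np.arange(10, 14000)                 # total wavenumber
spectrum = k ** (-5.0 / 3.0)             # ideal mesoscale spectrum...
spectrum *= np.exp(-((k / 4000.0) ** 2)) # ...damped by model dissipation

compensated = spectrum * k ** (5.0 / 3.0)   # flat where the slope holds
departed = compensated < 0.5 * compensated[0]  # >50% energy deficit
k_eff = k[departed][0]
wavelength_km = EARTH_CIRCUMFERENCE_KM / k_eff
print(f"effective resolution ~{wavelength_km:.0f} km wavelength")
```

Applied to real model spectra, the same procedure yields the five-to-ten-times-the-grid-spacing estimates discussed above.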

Vertical velocity
For further insight into how the different model configurations represent vertical motions (and potentially convection), we have also plotted variance spectra of vertical velocity in Fig. 6. As expected, the simulations without topography (notopo-H-FD-DT30 and notopo-NH-FD-DT30) show differences from the other runs in the spectra of vertical velocity, also at large scales. Consistent with the energy spectra in Fig. 5, the two non-hydrostatic simulations (NH-FD-DT60 and notopo-NH-FD-DT30) show a spurious increase in variance at small scales. There are clear differences between the simulations H-FE-DT120 and H-FE-DT60, indicating that the dynamics are not yet converged with the timestep. However, differences when changing the vertical resolution and the vertical discretization from finite element (H-FE-DT60) to finite difference (H-FD-DT60) are even larger.

NH-FD-DT60 shows small-scale patterns of vertical velocity reminiscent of spectral ringing that may also be caused by spurious gravity waves. This signal is consistent with the spurious pattern in the energy spectra visible in Fig. 5. Overall, differences between H-FD-DT60 and NH-FD-DT60, and between notopo-H-FD-DT30 and notopo-NH-FD-DT30, are rather small, indicating that the difference between hydrostatic and non-hydrostatic simulations is smaller than the other changes in the model configuration. Figure 8 compares the probability distributions of vertical velocity for the four runs. Please note that it is not ideal to show only a single snapshot of the PDFs due to the short length of the simulations. While we expect minor changes if the results were averaged over several independent timesteps, we do not expect qualitative differences, since the number of global sampling points is still substantial, at least compared with regional simulations.
Differences in the distribution of vertical velocity are clearly visible between the different simulations and the two vertical levels. It is difficult to relate the measured values to observations, but vertical velocities of more than 50 m s−1 may be unrealistically high. However, the actual number of cells with such large vertical velocities is very small (note the logarithmic scale, with the total number of sampling points being 256 million). Table 3 lists the number of grid points with large vertical velocities over the entire globe. The simulations with finite-element discretization and higher resolution in the vertical direction (H-FE-DT60 and H-FE-DT120) show the strongest upward velocities, whereas the two non-hydrostatic simulations (NH-FD-DT60 and notopo-NH-FD-DT30) show the strongest negative velocities. In contrast, the simulation with finite-difference discretization and hydrostatic equations (H-FD-DT60) shows the smallest vertical velocities. The signal is qualitatively consistent when considered after 12 h and 24 h (see Table 3 for numbers for a subset of runs).
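A Table-3-style tally of grid points exceeding given vertical-velocity thresholds can be sketched as below. The function name and the threshold values are illustrative choices for this sketch, not taken from the paper:

```python
import numpy as np

def tally_extreme_w(w, up_thresholds=(10.0, 20.0, 50.0),
                    down_thresholds=(-10.0, -20.0)):
    """Count grid points of a global vertical-velocity field (m/s) that
    exceed each updraft/downdraft threshold.  Counts are taken over the
    flattened field, i.e. over all grid points of the globe."""
    w = np.asarray(w, dtype=float).ravel()
    up = {t: int(np.count_nonzero(w > t)) for t in up_thresholds}
    down = {t: int(np.count_nonzero(w < t)) for t in down_thresholds}
    return up, down
```

Because the distribution tails are roughly exponential on the logarithmic axis of Fig. 8, these counts drop by orders of magnitude between successive thresholds even though the total sample contains hundreds of millions of points.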

Precipitation
The shape and distribution of precipitation should change significantly as grid spacing is reduced from 9 km to 1.45 km. Figure 9 presents the PDFs of total precipitation for the different simulations. For the simulation with 9 km grid spacing and parametrized convection, the number of grid points with heavy precipitation is significantly reduced, indicating the potential to improve the representation of local precipitation when resolution is increased. The simulation with a 120 s timestep (H-FE-DT120) also shows a lower number of high-precipitation events. The non-hydrostatic simulations show fewer events with very large precipitation when compared with the hydrostatic simulations. However, the differences are of the same order of magnitude as those from the other changes of the model configuration, such as the timestep or the vertical discretization.
Within the IFS, total precipitation per grid column has two sources: large-scale precipitation and convective precipitation. Large-scale precipitation represents precipitation from resolved atmospheric motions, whereas convective precipitation originates from the parametrization of convective updrafts within grid columns that are not represented explicitly when the parametrization of convection is switched on. For storm-resolving simulations at 1.45 km grid spacing, the parametrization of deep convection is switched off while the parametrization of shallow convection is still enabled. We can therefore expect convective precipitation to be reduced significantly for storm-resolving simulations, and we would hope that large-scale precipitation increases such that total precipitation stays at the same level.
To test this hypothesis, Table 4 presents the average amount of precipitation over the entire globe. The storm-resolving simulations show a significant reduction in convective precipitation. Large-scale precipitation does indeed buffer the reduction, and the amount of total precipitation is in fact increased by approximately 10–15 % when compared with the simulation with 9 km grid spacing.

Fig. 9. PDFs of total precipitation. Please note the logarithmic scale on the y-axis. The bin size when calculating the PDF was 0.1 mm, which results in the majority of grid points being in the first bin of less than 0.1 mm precipitation.
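The global averages of Table 4 amount to area-weighted means of the large-scale and convective components, with total precipitation being their sum per column. A minimal sketch for a regular latitude-longitude grid follows; the function and its signature are hypothetical, and the IFS itself uses a reduced Gaussian grid, for which the area weights differ:

```python
import numpy as np

def global_mean_precip(lsp, cp, lat_deg):
    """Area-weighted global means of large-scale (lsp) and convective (cp)
    precipitation on a regular lat-lon grid.  lsp and cp are 2-D (lat, lon)
    fields; the weight of each row is the cosine of its latitude, which is
    proportional to the cell area on a regular grid."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg)))[:, None]  # latitude weights
    wsum = w.sum() * lsp.shape[1]                         # total weight
    mean = lambda f: float((f * w).sum() / wsum)
    return mean(lsp), mean(cp), mean(lsp + cp)
```

Comparing such means between the 9 km run and the storm-resolving runs makes the buffering visible: convective precipitation collapses while large-scale precipitation grows, leaving the total roughly 10–15 % higher.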
Performing the same evaluation after 24 h does not change the conclusion (not all runs were simulated for the full 24 h). However, a further evaluation of how the transition between convective and large-scale precipitation occurs as resolution is steadily increased, for simulations with and without parametrized deep convection, is left for future publications.

Satellite simulators
Figures 10, 11 and 12 present the simulated satellite radiances for the different model runs. The plots were generated with the standard satellite simulator used at ECMWF, which is based on RTTOV (Hocking et al. 2013). All runs produce a cloud pattern that is realistic in comparison with the satellite data. It is evident that the higher resolution is beneficial for the representation of clouds, with explicit cellular organization that is absent in some of the convective areas of the simulation at 9 km grid spacing. However, the representation of low-level clouds seems to match the satellite data better for the 9 km simulation than for the simulations at higher resolution (see bottom left of Fig. 11). This indicates that the simulations at high resolution may require changes to the parametrization schemes, including the parametrization of shallow convection and the cloud microphysics, but also their interaction with boundary-layer turbulent diffusion, e.g., Duran et al. (2018).
Consistent with the discussion of vertical velocity, the two simulations with finite-element discretization in the vertical direction (H-FE-DT120 and H-FE-DT60) appear too "pop-corny", with rather large convective cells. The differences between hydrostatic and non-hydrostatic equations are again rather small in comparison.

Forecast errors
We have also calculated forecast errors for the headline scores of geopotential height at 500 hPa and temperature at 850 hPa. The two simulations without topography are not considered here. We compare the results against the operational forecast configuration. The forecast error was calculated on a TCo639 octahedral reduced Gaussian grid with 18 km grid spacing for all simulations. Please note that the forecast error for the operational forecast was calculated against the operational analysis, whereas the other errors were calculated against the long-window analysis to ensure consistency with the initial conditions. While these global forecast errors were calculated from a single forecast, which does not provide a satisfying level of statistics, it is still evident that an increase in horizontal resolution does not necessarily lead to a reduction in forecast error for a single forecast. In contrast, the simulations with explicitly simulated deep convection show an increased forecast error. Interestingly, this behavior is not observed in FV3 when comparing simulations at 3.25 km and 13 km grid spacing (S. J. Lin, personal communication).

Discussion of model realism and design choices
The six model simulations with 1.45 km grid spacing that were evaluated in the previous section span a range of choices for the length of the timestep, the number of iterations in the predictor-corrector scheme, and the governing equations.
The non-hydrostatic simulations show some spurious behavior in the energy spectra (Fig. 5) and the vertical velocity (Fig. 7). The hydrostatic simulation with a timestep of 120 s did not show spurious behavior. However, the results also differ between the H-FE-DT120 and H-FE-DT60 simulations, and this indicates that a timestep of 120 s violates the time resolution required either for cloud/precipitation processes at vertical wind speeds typical for convective cells, or because of increased trajectory crossings within the semi-Lagrangian advection scheme itself. There is also an indication of too-cold top-of-the-atmosphere brightness temperatures in the presence of deep convection (see Fig. 11).
The simulations of this paper enter the resolution range in which differences between hydrostatic and non-hydrostatic equations can be expected (Jeevanjee 2017). For our simulations, H-FD-DT60 and NH-FD-DT60 as well as notopo-H-FD-DT30 and notopo-NH-FD-DT30 show similar results, except for the spurious behavior of the spectra of the non-hydrostatic simulations at small scales (Fig. 6). Furthermore, to the authors' best knowledge, the IFS simulations that were performed for the DYAMOND project at 4 km resolution showed no significant degradation of results in comparison with the other participating models, which were all non-hydrostatic, even at longer lead times. It remains to be assessed whether, for example, a single more expensive simulation would lead to better forecast scores in comparison with an ensemble of H-FE-DT60 simulations at the same computational cost. In the same way as we propose reduced-precision simulations, algorithmic choices that enhance the time- or energy-to-solution need to be fairly assessed.
The results of this paper show that forecast errors for Z500 and t850 are higher in comparison with the operational resolution for deterministic forecasts, and that both parametrization schemes and dynamical core options will require further testing and adjustments to achieve optimal results. However, these results should not be over-interpreted, since previous resolution upgrades at ECMWF have shown that continuous efforts in improving the parametrizations for a given model resolution improve forecast scores. In any case, it is evident that storm-resolving simulations using the IFS may still require significant work before improvements in forecast scores can be realized, as the relative weight of the different parametrization schemes shifts. This is visible in the amount of total precipitation, which is approximately 10–15 % higher for storm-resolving simulations (see Table 4). Furthermore, the explicit representation of convective cells will increase variability in the tropics. This may help to improve ensemble spread, but may also reduce skill for deterministic forecasts. The increased variability might require an increase in the number of ensemble members for ensemble predictions. This generates additional pressure for the development of highly efficient models to allow for global, operational ensemble simulations at storm-resolving resolution in the future. Notably, we have also initialized from a lower-resolution analysis, which leaves many degrees of freedom uninitialized, and the problem of a global 1.45 km analysis is still formidable.
All simulations except one show vertical velocities that appear to be unrealistically large for a small number of grid cells (Fig. 8). This will require more detailed studies to disentangle the impact of microphysical processes and numerical choices. A more detailed evaluation of model fidelity for hydrostatic and non-hydrostatic simulations, as well as for different dynamical core choices and timesteps (including simulations with timesteps of less than 30 s), will be performed in future studies.
The large timesteps are known to violate the time resolution required for some of the cloud-related processes. However, given the logarithmic distribution of the PDFs of precipitation and vertical velocity, it will be interesting to see whether, for example, the simulated climate is sensitive to this, or whether this violation is acceptable when measured in climate or ensemble statistics.
Nevertheless, it is promising that all simulations show significant differences in the horizontal kinetic energy distribution, even at scales of several hundred kilometers, when comparing spectra at 9 km and 1.45 km grid spacing, and this structural difference is also observed in experiments that assess the impact of physical parametrizations on energy spectra and on nonlinear spectral energy fluxes (Malardel et al. 2016).

Conclusion
In this paper, we document simulations using the IFS that run with a horizontal grid spacing of 1.45 km from real-world initial conditions and with real-world topography on the fastest supercomputer in Europe. The results confirm that global storm-resolving simulations are possible today. A simulation that scales to almost the entire size of the fastest supercomputer in Europe can achieve 0.19 SYPD of computation (based on the H-FE-DT120 configuration with 62 vertical levels). However, these simulations generate only limited model output, are uncoupled, may require a smaller timestep or non-hydrostatic adjustments, and would still be too slow for operational weather and climate predictions, which require a throughput of at least 1 SYPD. The IFS performs reasonably well on the limited number of nodes on Piz Daint at 1.45 km, and we expect linear performance scaling if the number of CPUs per node were increased. Given the skepticism of the community regarding the usefulness of spectral models for simulations at high resolution, due to the poor scaling behavior of the Legendre transformation, it is good news that the spectral IFS model achieves throughput numbers that are competitive with grid-point models based on explicit timestepping schemes. Given the results of this paper and the high efficiency of the IFS in comparison with other global models at slightly lower resolution (Michalakes et al. 2015), we argue that spectral discretization combined with semi-implicit semi-Lagrangian timestepping schemes will remain highly competitive for global storm-resolving simulations in the future. The use of half-precision floating-point arithmetic and hardware accelerators that were designed for deep learning may provide an additional speed-up for Legendre transformations (Hatfield et al. 2019). This would, however, require further testing, in particular for simulations at high resolution.

Fig. 13. Mean absolute error averaged over the globe, plotted against forecast lead time, calculated against analysis products for geopotential height at 500 hPa (Z500) and temperature at 850 hPa (t850) for a single forecast with different model configurations. The simulation with 9 km grid spacing is the operational forecast at ECMWF (H, FE, 137 vertical levels, Δt = 450 s, 0 PC, coupled to NEMO and the wave model).
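The SYPD (simulated years per day) metric quoted above is simply the ratio of simulated time to wall-clock time, expressed in model years per wall-clock day. A minimal sketch follows; the seconds-per-step figure in the usage note is inferred from the reported 0.19 SYPD rather than being a measured number:

```python
def sypd(timestep_s, wallclock_per_step_s):
    """Simulated years per wall-clock day: how many model years complete
    in 24 h of computation, given the model timestep (s) and the
    wall-clock time spent per step (s)."""
    seconds_per_year = 365.0 * 86400.0
    simulated_seconds_per_day = timestep_s / wallclock_per_step_s * 86400.0
    return simulated_seconds_per_day / seconds_per_year
```

For the H-FE-DT120 configuration (120 s timestep), 0.19 SYPD corresponds to roughly 1.7 wall-clock seconds per step, a back-of-envelope inference that also illustrates why a factor of five speed-up is needed to reach the 1 SYPD threshold for operational use.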
We have presented figures of simulated satellite radiances, topography and model spectra that show improvements in realism and added value for global storm-resolving simulations. It is often argued that global storm-resolving model simulations are already able to pass the Turing test suggested by Palmer (2016). This test is passed if it is not possible to distinguish between satellite observations and model simulations when looking at cloud fields. We claim that the simulations of this paper pass the Turing test, since a simple change in the color scale in Fig. 10 would generate bigger differences than those visible between the simulations and the satellite observations. However, the results of this paper also show that the differences between the real world and the high-resolution simulations, and between high-resolution simulations with different configurations, are still significant, and that it will still require significant work to find the optimal model configuration for storm-resolving models and to beat the deterministic forecast scores of the current generation of operational weather models.
The challenges that Exascale supercomputing will bring to the domain of Earth system modeling, and the likelihood that it will allow global storm-resolving simulations for operational weather and climate predictions, have recently been outlined in several papers (see, e.g., Lawrence et al. 2018; Neumann et al. 2019; Schulthess et al. 2019; Schär et al. 2019; Biercamp et al. 2019). While the results of this paper confirm that these simulations could be within reach soon, there is no question that it will require a large concerted European (or global) effort between modeling and supercomputing centers to face the significant challenges (adaptation to accelerators and heterogeneous hardware, the data avalanche (Balaji et al. 2018), energy cost, etc.) to make global storm-resolving weather and climate modeling affordable and environmentally acceptable.