Intercomparison of Cloud Properties in DYAMOND Simulations over the Atlantic Ocean

We intercompared the cloud properties of the DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains (DYAMOND) simulation output over the Atlantic Ocean. The domain-averaged outgoing longwave radiation (OLR) is relatively similar across the models, but the net shortwave radiation at the top of the atmosphere (NSR) shows large differences among the models. The models capture the triple modes of cloud systems corresponding to shallow, congestus, and high clouds, although their partition in these three categories is strongly model dependent. The simulated height of the shallow and congestus peaks is more robust than the peak of high clouds, whereas cloud water content exhibits larger intermodel differences than does cloud ice content. Furthermore, we investigated the resolution dependency of the vertical profiles of clouds for NICAM (Nonhydrostatic ICosahedral Atmospheric Model), ICON (Icosahedral Nonhydrostatic), and IFS (Integrated Forecasting System). We found that the averaged mixing ratio of ice clouds consistently increased with finer grid spacing. Such a consistent signal is not apparent for the mixing ratio of liquid clouds for shallow and congestus clouds. The impact of the grid spacing on OLR is smaller than on NSR and also much smaller than the intermodel differences.


Introduction
General circulation models (GCMs) are used to simulate cloud and precipitation systems and investigate their change resulting from global warming scenarios. The representation of realistic cloud fields is one of the important unsolved issues of GCM simulations (e.g., Schneider et al. 2017; Zelinka et al. 2017). Clouds have a strong impact on the radiation budget but their representation, due to the coarse grid spacing of conventional GCMs, has to rely on uncertain parameterizations. This especially concerns the representation of cloud and precipitation systems associated with convection. These subgrid-scale convective systems can be explicitly represented by global models with sufficiently fine horizontal resolution. With a grid spacing of less than approximately 5 km, at least mesoscale (convective) storms can be explicitly resolved. Such models can be run globally and are called global storm-resolving models (GSRMs, Satoh et al. 2019).
GSRMs have several merits. Convective systems can be simulated by physically consistent cloud microphysics schemes without the supplementary need for a convective parameterization. The probability distribution function of precipitation rates becomes closer to observed probabilities with an increase in the frequency of extreme precipitation and a decrease in the frequency of light precipitation as compared with GCMs (e.g., Kodama et al. 2015; Prein et al. 2020). The diurnal cycle of precipitation over the tropics and midlatitudes is also well captured (e.g., Prein et al. 2015; Sato et al. 2008). However, even if GSRMs bypass the convective parameterization deadlock by brute force, they still need to rely on their cloud microphysics schemes to produce clouds and precipitation, with their inherent uncertainties. Because the horizontal grid spacing of GSRMs is comparable with that of satellite observations, such observations can be compared more directly with GSRMs than with GCMs, and used more directly to improve cloud microphysics schemes. This simplifies the interpretation of cloud evaluation in GSRMs (Masunaga et al. 2008; Hashino et al. 2016), also with the help of satellite simulators (Roh and Satoh 2014; Roh et al. 2020; Seiki and Roh 2020).
One of the ways to understand and improve the performance of GSRMs is to perform an intercomparison study. The first intercomparison project for GSRMs was the DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains (DYAMOND, Stevens et al. 2019). In DYAMOND, nine GSRMs were run for 40 days starting 1 August 2016, using prescribed sea surface temperature.
In this study, we focus on clouds over the tropical Atlantic as simulated by seven of the nine DYAMOND models. Over the tropical Atlantic, shallow clouds are dominant in the trade wind region and deeper convective systems coexist with cirrus clouds in the ITCZ. The tropical Atlantic is also known to be largely influenced by the Hadley-Walker circulation (e.g., Wang 2004). Our aim is to assess, for the first time, the representation of cloud properties in several GSRMs integrated under the same constrained conditions. We also examined the resolution dependency of cloud properties for three DYAMOND models that were run at various resolutions. We evaluated the cloud profiles using satellite data with satellite simulators.

Models and observation data
The DYAMOND models were initialized on 1st August 2016 with the same ECMWF 9 km reanalysis data. The total integration time was 40 days. Nine models participated in DYAMOND: the global nonhydrostatic model of Météo-France (ARPEGE-NH, Bubnová et al. 1995 Khairoutdinov and Randall 2003); and the Unified Model (UM, Wood et al. 2014).
Four models did not use a convective parameterization (ARPEGE-NH, ICON, NICAM, and SAM). The simulations, except for the UM simulation, were conducted with less than 5 km horizontal grid spacing. The NICAM, IFS, and ICON simulations were also rerun with coarser resolutions. The detailed descriptions of the model configurations are given in Stevens et al. (2019), and a more detailed description of the ICON simulations can be found in Hohenegger et al. (2020). Our analysis excludes UM, because it used a grid spacing coarser than 5 km, and ARPEGE-NH, because we had issues with data handling. We consider the results of the remaining seven models sufficient to show the variation of vertical profiles of cloud ice, cloud water, and cloud fraction.
For validation, we used daily outgoing longwave radiation (OLR) and net shortwave radiation at the top of the atmosphere (NSR) from the Clouds and the Earth's Radiant Energy System (CERES; Wielicki et al. 1996) available at one-degree grid spacing. We used the merged CloudSat/CALIPSO data (DarDar data, Delanoë and Hogan 2010) to derive cloud properties. The 94 GHz radar reflectivity, the 532 nm total attenuated backscatter, and the simplified cloud mask (Delanoë and Hogan 2010) from the DarDar data were used as observation data.
We focused on a smaller region of the tropical Atlantic Ocean covering the area 30–50°W and 0–20°N. We restricted our analysis to a 10-day time period, from 11 to 20 August 2016. The features and differences between the simulations that we will highlight are robust and can even be seen on a single-day basis.
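The domain and time averaging described above can be sketched as follows. This is a minimal illustration only: the function names, the toy values, and the cos(latitude) area weighting on a regular latitude-longitude grid are assumptions made here, not details taken from the DYAMOND processing.

```python
import math

# Analysis box from the text: 30-50°W, 0-20°N.
# Longitudes are expressed in degrees east, so western longitudes are negative.
LON_MIN, LON_MAX = -50.0, -30.0
LAT_MIN, LAT_MAX = 0.0, 20.0


def domain_mean(field, lats, lons):
    """Area-weighted mean of field[i][j] over the analysis box.

    lats[i] and lons[j] are grid-point coordinates in degrees.
    The cos(latitude) weight is an assumption for a regular lat-lon grid.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not (LAT_MIN <= lat <= LAT_MAX):
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if LON_MIN <= lon <= LON_MAX:
                total += weight * field[i][j]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no grid points inside the analysis box")
    return total / weight_sum


def time_mean(daily_values):
    """Plain average over the daily domain means (10 days in the paper)."""
    return sum(daily_values) / len(daily_values)


# Toy example with made-up OLR values (W m-2) on a 2 x 2 grid and two days.
lats = [-10.0, 10.0]
lons = [-60.0, -40.0]
olr_day1 = [[300.0, 290.0], [280.0, 250.0]]
olr_day2 = [[300.0, 290.0], [280.0, 270.0]]
daily_means = [domain_mean(day, lats, lons) for day in (olr_day1, olr_day2)]
mean_olr = time_mean(daily_means)
```

In this toy grid only the point at 10°N, 40°W falls inside the box, so the result is the average of its two daily values.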
We also evaluated the cloud fractions of NICAM using CloudSat and CALIPSO signals through the Joint Simulator for Satellite Sensors (Hashino et al. 2013) in Section 5. The sensor simulator was the EarthCARE Active Sensor Simulator (EASE, Okamoto et al. 2007, 2008; Nishizawa et al. 2008).
FV3 overestimates OLR, whereas the other models underestimate it (Table 1). Figure 2 shows daily NSR at the top of the atmosphere for the same seven models. The CERES observation shows the decrease of NSR over regions with shallow and deep convection and the low NSR from the Sun's declination in the southern Atlantic Ocean. The models reproduce these features. The simulated domain-averaged NSR is within 30 W m−2 of the observation (Table 1). Hence, the differences in NSR are larger than in OLR. This is also true when looking at the intermodel differences, and it is consistent with the results of Stevens et al. (2019) for the whole tropics and the 40 days. NICAM overestimates NSR compared with the observation because of the lack of shallow clouds (see also Fig. 3). ICON and GEOS underestimate NSR compared with the observation. The better agreement in OLR than in NSR is linked to the fact that OLR is mostly affected by high clouds. The high clouds from the large convective systems can be represented with a grid spacing of a few kilometers. By contrast, the shallow clouds are parameterized by boundary layer schemes and cumulus parameterization and remain underresolved at such scales.
Figure 3 shows the vertical profiles of the domain-averaged mixing ratios of cloud water and cloud ice for the 10 days. All simulations reproduce the triple mode of cloud systems, i.e., shallow (below 4 km altitude), congestus (between 4 km and 8 km altitude), and high clouds (above 8 km altitude). The peaks in cloud water lie between 1.5 km and 2 km height for shallow clouds in the investigated DYAMOND models.
For the congestus clouds, six out of the seven models simulate the peak between 4 km and 5 km altitude; only IFS simulates the peak near 6 km. The fact that IFS uses a bulk mass flux scheme (Bechtold et al. 2014; Tiedtke 1993) for shallow, congestus, and deep convection might explain this discrepancy. In terms of amount, all the models agree to first order, with values between 0.005 g kg−1 and 0.02 g kg−1. Only ICON appears as an outlier and simulates an almost three times larger cloud water content. This is consistent with the lowest NSR value displayed by ICON in Table 1.
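The peak heights discussed above can be located in a profile with a few lines of code. The following is a minimal sketch, not the procedure used for Fig. 3: the function name and the toy profile are hypothetical, and assigning the 4 km and 8 km boundaries to the upper layer is an assumption, since the text only says below, between, and above.

```python
# Layer bounds in km follow the text: shallow below 4 km, congestus 4-8 km,
# high clouds above 8 km. Boundary levels go to the upper layer (assumption).
LAYERS = {
    "shallow": (0.0, 4.0),
    "congestus": (4.0, 8.0),
    "high": (8.0, float("inf")),
}


def layer_peaks(heights_km, mixing_ratio):
    """Return {layer: (peak_height_km, peak_value)} for one vertical profile.

    Layers without any model level are omitted from the result.
    If two levels tie for the maximum, the higher level is reported.
    """
    peaks = {}
    for name, (lower, upper) in LAYERS.items():
        in_layer = [
            (q, z) for z, q in zip(heights_km, mixing_ratio) if lower <= z < upper
        ]
        if in_layer:
            q_max, z_max = max(in_layer)
            peaks[name] = (z_max, q_max)
    return peaks


# Toy profile with made-up condensate mixing ratios (g kg-1).
heights = [1.0, 2.0, 3.0, 5.0, 6.0, 10.0, 13.0]
condensate = [0.010, 0.020, 0.005, 0.008, 0.004, 0.003, 0.006]
peaks = layer_peaks(heights, condensate)
```

Applied to each model's domain-averaged profile, such a helper would give the peak altitudes that are compared across models in the text.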

Intercomparison of clouds
The distribution of cloud ice shows a high-cloud peak. Its height displays large discrepancies across the models, larger than those found for the shallow and congestus clouds. NICAM shows a peak near 14 km, whereas SAM and GEOS exhibit much lower peaks, near 8 km. Integrated over the vertical column, SAM and GEOS also exhibit larger cloud ice amounts, with values of 0.008 against 0.005 in the other models except NICAM. The fact that SAM and GEOS include precipitating ice categories such as graupel and snow in the cloud ice category may largely explain their larger ice amount and lower peak. MPAS produced the smallest mixing ratio of cloud ice among the seven models; its microphysics scheme (Thompson et al. 2004) has very little cloud ice but a lot of snow.
Figure 4 shows the vertical profiles of cloud fractions. For the observations, the simplified cloud mask (Delanoë and Hogan 2010) from the merged CALIPSO and CloudSat data was used as a proxy for the cloud fraction. Note that most participating models except NICAM and SAM use cloud cover schemes of varying complexity. The cloud cover consistent with each model's formulation was not available within the DYAMOND dataset. For simplicity, we therefore defined clouds in the simulations as grid points with a mixing ratio of cloud water plus cloud ice larger than 1 mg kg−1. The observation shows the triple modes of the cloud population with peaks around 1, 5, and 13 km. The shallow and deep clouds are dominant compared with the congestus clouds in the observations. Although the simulations reproduce the three modes of clouds at similar altitudes for the shallow and congestus population, the partitioning is different across models. For example, NICAM underestimates the shallow clouds compared with the observation and with the other models.
ICON produces a higher fraction of shallow clouds than the observation and NICAM. The highest fraction of congestus clouds is found in IFS and ICON, whereas NICAM and GEOS exhibit the highest fraction of high clouds. Hohenegger et al. (2020) showed that the partitioning of clouds between shallow, congestus, and deep can also vary strongly within the same model when changing the horizontal resolution.
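The threshold-based cloud definition used for the simulated cloud fractions in Fig. 4 can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the nested-list data layout, and the toy values are hypothetical, and mixing ratios are assumed to be given in kg kg−1, so that 1 mg kg−1 becomes 1e-6.

```python
# Threshold from the text: cloud water plus cloud ice larger than 1 mg kg-1.
# Inputs are assumed to be in kg kg-1, hence 1e-6.
THRESHOLD_KG_PER_KG = 1.0e-6


def cloud_fraction_profile(qc, qi, threshold=THRESHOLD_KG_PER_KG):
    """Fraction of cloudy grid points at each model level.

    qc[k][n] and qi[k][n] are the cloud water and cloud ice mixing ratios
    at level k and horizontal grid point n. A point counts as cloudy when
    the sum of both mixing ratios is strictly larger than the threshold.
    """
    profile = []
    for qc_level, qi_level in zip(qc, qi):
        cloudy = sum(
            1 for water, ice in zip(qc_level, qi_level) if water + ice > threshold
        )
        profile.append(cloudy / len(qc_level))
    return profile


# Toy data: two levels, four grid points each (made-up values).
qc = [[2.0e-6, 0.0, 5.0e-7, 0.0], [0.0, 0.0, 0.0, 0.0]]
qi = [[0.0, 0.0, 6.0e-7, 0.0], [3.0e-6, 0.0, 0.0, 1.0e-6]]
fractions = cloud_fraction_profile(qc, qi)
```

Because the threshold is an argument, the same helper can be called with other values to explore how sensitive the resulting profile is to the cloud definition, a point the paper returns to in Section 5.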

Resolution dependency
How cloud properties depend on resolution is a further question of interest. We investigated the resolution dependency of the domain-averaged mixing ratios of cloud water and cloud ice using the three models NICAM, ICON, and IFS (Figs. 5, 6), for which simulations at different resolutions were performed. ICON reduces the mixing ratio of cloud water with finer grid spacing (Fig. 5b), as also noted in Hohenegger et al. (2020). This reduction is particularly visible in the lowest 2 km. Such a reduction with finer grid spacing is also visible in NICAM and IFS. In NICAM, it is nevertheless less pronounced, whereas in IFS, the reduction happens between 4 km and 8 km altitude. We speculate that the fact that IFS uses a convective parameterization, in contrast to ICON and NICAM, explains this distinct behavior. The combination of a finer grid spacing and a convective parameterization may facilitate a fast transition from congestus to deep clouds. The resolution-induced differences nevertheless remain smaller than the differences previously noted across models (compare with Fig. 3a). By contrast, the resolution dependency of the mixing ratio of cloud ice shows very consistent results across the three models: the mixing ratio of high clouds and the height of the maximum mixing ratio increase with finer grid spacing (Fig. 6). Such a consistent resolution dependency is again not seen in the cloud fraction. The cloud fraction associated with high clouds slightly increases in NICAM and ICON in the higher-resolution experiments (Fig. 7). However, this fraction was reduced when going from the 9 km to the 4 km IFS experiment. To gain more insight, we compared the frequency distributions of ice water path (IWP), normalized by a resolution factor, in NICAM and IFS (Fig. 8). Only the IWP frequencies of the 4 km IFS and 3.5 km NICAM simulations were divided, by a factor of 4.
With finer grid spacing, both models show higher cloud frequencies for extreme cases with IWP larger than 20 kg m−2 (NICAM) and 2 kg m−2 (IFS). There is a clear difference between NICAM and IFS for clouds with IWP below 2 kg m−2. With finer grid spacing, NICAM increased the cloud frequencies for IWP below 2 kg m−2, whereas IFS reduced the cloud frequencies for IWP between 10 g m−2 and 2 kg m−2. This different pattern of cloud frequencies between 10 g m−2 and 2 kg m−2 leads to the opposite response of the ice cloud fraction to grid spacing in NICAM and IFS (Figs. 7a, c). The increase in cloud frequencies for IWP larger than 2 kg m−2 in turn caused the increase of the domain-averaged mixing ratio of cloud ice in IFS (Fig. 6c).
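The normalized IWP frequency comparison can be illustrated with a simple binned count. This is only a sketch under assumptions: the function name and the toy samples are hypothetical, the three bins simply reuse the IWP values quoted in the text (10 g m−2, 2 kg m−2, 20 kg m−2), and the reading that the factor of 4 compensates for the larger number of grid columns at finer spacing is an interpretation, not a statement from the paper.

```python
def iwp_frequency(iwp_values, bin_edges, resolution_factor=1.0):
    """Count grid columns per IWP bin and divide by a resolution factor.

    iwp_values: ice water path per grid column (kg m-2).
    bin_edges: increasing edges; bin b covers [bin_edges[b], bin_edges[b + 1]).
    Values below the first edge are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for value in iwp_values:
        for b in range(len(counts)):
            if bin_edges[b] <= value < bin_edges[b + 1]:
                counts[b] += 1
                break
    return [count / resolution_factor for count in counts]


# Bin edges in kg m-2: 10 g m-2 = 0.01 kg m-2, then 2 and 20 kg m-2.
EDGES = [0.01, 2.0, 20.0, float("inf")]

# Made-up samples: the finer grid has four times as many columns.
coarse_iwp = [0.05, 0.5, 3.0]
fine_iwp = [0.02, 0.05, 0.1, 0.4, 0.5, 1.0, 1.5, 1.9, 3.0, 5.0, 8.0, 25.0]

coarse_freq = iwp_frequency(coarse_iwp, EDGES)
fine_freq = iwp_frequency(fine_iwp, EDGES, resolution_factor=4.0)
```

After the division, bins with equal values indicate no resolution effect, while differences such as the populated highest bin in the fine-grid toy sample correspond to the kind of shift in extreme IWP described in the text.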
We also investigated the resolution dependency of domain-averaged OLR and NSR (Table 2). Similar to the model sensitivity, the resolution dependency is much larger for NSR than for OLR. In terms of OLR, its sensitivity to resolution is even smaller than the intermodel differences. The domain-averaged OLR in NICAM and ICON decreased by almost 2 W m−2 or 3 W m−2 with finer grid spacing. By contrast, the OLR of IFS increased by 4.6 W m−2, which is related to cloud fraction and mixing ratio (Figs. 6, 7). The NSR increases with finer grid spacing due to the reduction of the shallow cloud fraction and the mean mixing ratio of cloud water. The NSR from ICON 10 km to ICON 5 km as well as IFS 9 km to IFS 4 km shows a significant reduction. ICON decreased the domain-averaged mixing ratio of cloud water in Fig. 5b; conversely, IFS decreased the cloud fraction of shallow clouds (Fig. 7c). NICAM did not show a distinct reduction of the NSR between two resolutions.

Evaluation using a satellite simulator
The choice of the mixing ratio threshold to define the cloud fraction affects the resulting vertical profile of cloud fraction, which makes the comparison with observations, as in Fig. 4, difficult. For instance, we tested three different thresholds in NICAM in Fig. 9. In particular, the cloud fraction above 5 km shows a large sensitivity to the chosen threshold. This means that the definition of a cloud is important for the evaluation of the vertical structure of clouds.
Furthermore, when we compare the mixing ratio of hydrometeors of each model, even two models with the same mixing ratio can exhibit different radiative properties because of the different size distributions and microphysical assumptions of each model. One of the methods to avoid uncertainties stemming from microphysical assumptions and threshold setting and to compare the same physical variables in observations and models is the application of a satellite simulator. Such a simulator uses the same microphysical settings as the specific GSRM and calculates radiances such as those observed in satellite signals (Masunaga et al. 2010; Hashino et al. 2013; Matsui et al. 2014). It is thus possible to evaluate the simulations directly against the observations.
Unfortunately, the standard output of the DYAMOND models is not enough to run a satellite simulator. The reason is that not all the hydrometeor classes were output. We thus reran tests only for NICAM with the necessary supplementary output. We used the merged brightness temperature and CALIPSO/CloudSat signals for the evaluation.
The CERES data are limited in their ability to detect small clouds as well as the temporal variations of convective systems. It is thus necessary to use the merged 11 µm brightness temperatures from the National Centers for Environmental Prediction/Climate Prediction Center (Janowiak et al. 2001) instead of the CERES data, which provides us with a horizontal resolution finer than 5 km and a temporal resolution of 30 min. We compared the resulting horizontal distributions of brightness temperature in Fig. 10. A convective band is visible in the observed brightness temperature, and NICAM also reproduces convection near the equator. Shallow clouds with high brightness temperature were found in the northern part of the analysis domain, although their locations differ between the observations and NICAM.
We evaluated the vertical profiles of cloud fractions using CloudSat and CALIPSO (Fig. 11). We first compared the cloud fraction between the CloudSat cloud radar and NICAM using the same threshold of radar reflectivity of −25 dBZ (Fig. 11a). The cloud radar is more sensitive to precipitating ice and rain than the lidar of CALIPSO, whose comparison to NICAM is shown in Fig. 11b. The cloud fraction observed by the cloud radar of CloudSat (Fig. 11a) is similar to the cloud fraction from the simplified cloud mask of the DarDar data, previously shown in Fig. 4, and to that of CALIPSO (Fig. 11b). The cloud fraction of CloudSat above 14 km is underestimated compared with the merged cloud mask of the DarDar data. The previously noted overestimation of high clouds and underestimation of shallow clouds in NICAM remain visible when comparing NICAM to CloudSat using the satellite simulator. However, there is a mismatch of cloud fractions between 2 km and 10 km related to precipitating hydrometeors such as snow and graupel. CloudSat also shows surface clutter (e.g., Marchand et al. 2008), which is not apparent in NICAM.
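Applying one reflectivity threshold to both the observed and the simulated radar signal can be sketched as follows. This is a minimal illustration and not the Joint simulator interface: the function names, the data layout, the use of None for missing bins, and the choice to count values equal to the threshold as cloudy are all assumptions. Only the −25 dBZ threshold comes from the text; the dBZ conversion is the standard definition relative to 1 mm6 m−3.

```python
import math

# Threshold from the text, applied identically to CloudSat and to the
# simulated reflectivity.
DBZ_THRESHOLD = -25.0


def to_dbz(z_linear):
    """Convert a reflectivity factor in mm^6 m^-3 to dBZ."""
    return 10.0 * math.log10(z_linear)


def radar_cloud_fraction(dbz_profiles, threshold=DBZ_THRESHOLD):
    """Cloud fraction per level from reflectivity profiles.

    dbz_profiles[n][k] is the reflectivity (dBZ) of profile n at level k,
    or None where no valid signal exists. Levels without any valid sample
    yield None instead of a fraction.
    """
    n_levels = len(dbz_profiles[0])
    fractions = []
    for k in range(n_levels):
        valid = [p[k] for p in dbz_profiles if p[k] is not None]
        if not valid:
            fractions.append(None)
            continue
        cloudy = sum(1 for value in valid if value >= threshold)
        fractions.append(cloudy / len(valid))
    return fractions


# Toy data: three profiles, three levels (made-up values).
profiles = [
    [-30.0, -10.0, None],
    [-20.0, -40.0, None],
    [to_dbz(0.01), 5.0, -26.0],
]
radar_fraction = radar_cloud_fraction(profiles)
```

Running the same function on the observed and on the simulated profiles keeps the cloud definition identical on both sides, which is the point of the simulator-based comparison.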
We compared the cloud fractions between the lidar of CALIPSO and NICAM using the same threshold for the 532 nm total attenuated backscatter. CALIPSO can detect both aerosols and optically thin cirrus clouds. We used a threshold of 5 × 10−3 m−1 sr−1 for the cloud detection by the lidar to remove the signals from aerosols. This threshold is a relatively large backscatter for clouds and will thus predominantly detect optically thick ice clouds and water clouds. CALIPSO shows a high fraction of shallow clouds and of liquid-phase congestus clouds. NICAM underestimates both shallow and congestus cloud fractions when compared with CALIPSO. The underestimation of shallow clouds is consistent with the overestimation of NSR relative to CERES. One of the reasons for the underestimation of liquid water content in NICAM is an overly fast conversion of cloud water to the rain category in the microphysics scheme.

Summary and conclusion
In this study, we compared the cloud properties of DYAMOND model simulations over the Atlantic Ocean for 10 days. Shallow convective clouds were dominant over the analysis domain, with deep convection and high clouds in the center of the domain. The domain-averaged OLR was found to be more similar across the models than the domain-averaged NSR. The vertical structure of cloud water, cloud ice, and cloud fraction exhibited large variations across models. Notably, all models exhibited three peaks in their vertical structure, as expected from the presence of shallow, congestus, and deep clouds, but the altitude of the peak of the deep clouds was especially model dependent. By contrast, intermodel variations in cloud water content are larger than in cloud ice content, which is in agreement with the larger differences observed in NSR than in OLR.
We examined the resolution dependency of vertical profiles of clouds among NICAM, ICON, and IFS. The resolution-induced differences are generally smaller than the intermodel differences. All three models increased the mixing ratio and the height of maximum mixing ratio of cloud ice with finer horizontal resolution. Interestingly, the mixing ratio of cloud water for the shallow and for the congestus clouds shows distinct sensitivities to resolution depending on the investigated model. ICON reduced the mixing ratio of cloud water in shallow clouds and slightly increased cloud water in congestus clouds with finer grid spacing. By contrast, IFS reduced the mixing ratio of cloud water in congestus clouds with no change of mixing ratio in shallow clouds. We found that IFS also showed a distinct response of the cloud fraction of high clouds to changes in grid spacing compared with NICAM and ICON. The cloud fraction of high clouds decreased in IFS and slightly increased in NICAM and ICON with finer grid spacing. We speculate that the use of a convective parameterization in IFS, in contrast to NICAM and ICON, might in particular explain these distinct sensitivities to resolution.
The comparison of cloud fraction across models and with observations was affected by the threshold chosen for the cloud detection and by the definition of hydrometeors. The same criteria must be used among models and observations. We suggest that one way to apply the same criteria is to use a satellite simulator. However, more variables and information would be needed to run the Joint simulator on the DYAMOND simulation output, except for NICAM, for which this information is available. We applied such a simulator to the NICAM output and evaluated the results against CloudSat and CALIPSO. We found that NICAM underestimates the cloud fraction of shallow and congestus clouds compared with satellite data.
In the future, the reasons for the noted differences in cloud properties must be investigated by conducting sensitivity tests of parameterizations such as microphysics schemes, and the vertical structure of clouds and precipitation should be evaluated using observations.

Data Availability Statement
The data analysis files are available in J-STAGE Data. https://doi.org/10.34474/data.jmsj.16707256.