A 59-year (1948–2006) global near-surface meteorological data set for land surface models. Part I: Development of daily forcing and assessment of precipitation intensity

This paper describes the development and assessment of global 0.5° near-surface atmospheric data from 1948 to 2006 at daily (for precipitation, snowfall, and specific humidity) to 3-hourly (for temperature, shortwave radiation, and longwave radiation) time scales, which can be used to drive land surface models. Using newly available monthly precipitation and temperature data extending to recent years, the variables were created by statistical methods, the parameters of which were obtained from available daily to 3-hourly observations. Unlike previous long-term meteorological data sets based on reanalysis, the daily precipitation developed in this paper produces reasonable numbers of precipitation days and heavy precipitation days. Given its relatively high spatial resolution (0.5°) and its coverage of recent years, the newly obtained data may be preferred to other forcing data sets for hydrological and climate change studies, in particular if the study results are sensitive to daily variations in atmospheric conditions.


INTRODUCTION
Long-term variations in terrestrial water and energy budgets are essential for understanding the global environmental system, especially in the face of potential climate change. These variables are often estimated by land surface models (LSMs) driven in an off-line mode with atmospheric forcing data because direct observations are limited. To drive LSMs, previous studies have created several decadal time series of forcing data, including precipitation, temperature, humidity, and radiation, at daily to several-hourly timescales. Most of these products have been based on reanalysis data such as those provided by the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) or the European Centre for Medium-Range Weather Forecasts (ECMWF). Since reanalysis data sets contain errors in model-simulated atmospheric forcings, the products based on reanalysis data are usually corrected against globally available observations. For example, Ngo-Duc et al. (2005) estimated 53 years (1948–2000) of 6-hourly forcing data from NCEP/NCAR reanalysis data with correction of precipitation and radiation. They scaled the monthly precipitation amount to fit the monthly precipitation product by the Climatic Research Unit (CRU) and scaled the monthly mean longwave and shortwave radiation to fit those of the Surface Radiation Budget (SRB) project. Sheffield et al. (2006) and Qian et al. (2006) followed frameworks similar to that of Ngo-Duc et al. (2005). Berg et al. (2005) obtained 15-year (1979–1993) 6-hourly forcing data from ECMWF reanalysis data by scaling temperature, dew point temperature, precipitation, and long- and shortwave radiation to the monthly observations of those variables.
Although the above studies scaled the variables based on monthly observations, atmospheric forcings based on reanalysis products still contain some specific biases at shorter timescales, such as in daily precipitation intensity and the number of precipitation days. The hydrological processes over the land surface, such as interception by leaves, water infiltration into the soil, and saturation excess runoff, are sensitive to daily precipitation values even if the total monthly precipitation amount is the same (e.g., Sheffield et al., 2004; Hirabayashi et al., 2005). Therefore, creating forcing data sets in which the daily statistics are similar to those of observations is important.
Several large-scale atmospheric forcing data sets were developed without using reanalysis products. For example, Nijssen et al. (2000) estimated 14-year global atmospheric forcing data with 2° horizontal resolution. However, only limited years of forcing can be created using their methodology, because daily observations are required. Hirabayashi et al. (2005) estimated 1° × 1° global 100-year (1901–2000) atmospheric forcing data by combining equations similar to those of Nijssen et al. (2000). They extended the data period, however, by using a stochastic weather generator to statistically create daily atmospheric forcing from monthly precipitation and temperature observations by the CRU (Mitchell and Jones, 2005), applying statistical parameters derived from available daily or 3-hourly observations.
The goal of this study was to create a 59-year (1948–2006) near-surface meteorological data set (hereafter called H08) with daily to 3-hourly timescales. H08 represents an improvement of the product by Hirabayashi et al. (2005). The enhancements in the methodology include 1) finer (0.5°) spatial resolution using a new global gridded monthly observation product of precipitation and temperature, and gridded daily precipitation products over India and East Asia; 2) new estimations of the statistical parameters of a stochastic weather generator from new global daily to 3-hourly observation products; 3) improved methods for estimating dew point temperature and the spatial distribution of daily precipitation; 4) correction of gauge undercatch of precipitation based on rain/snow phase detection; and 5) data for 2001–2006, with the ability to extend to future years.
H08 was statistically created from monthly observations of precipitation and temperature using daily statistics obtained from daily observations of precipitation, maximum and minimum temperature, and shortwave radiation. An expected advantage of these newly created data is that they should contain statistical characteristics similar to observations. This paper describes the overall process used to create H08 and compares its precipitation data with other published data. Comparisons of daily statistics of temperature and shortwave radiation with other data sets, and the impact of the gauge correction on the estimated snowfall amount, will be included in a companion paper.

DATA AND METHODOLOGY
The global 0.5° near-surface meteorological data set for the period 1948–2006 was created based on the method of Hirabayashi et al. (2005), which enables estimation of daily to 3-hourly atmospheric forcing when monthly means of precipitation and temperature are available. The monthly mean of precipitation is statistically disaggregated into daily time steps. Using the obtained daily precipitation and monthly mean temperature, a stochastic weather generator creates maximum and minimum temperature and incoming shortwave radiation at the land surface at daily time steps. From these daily variables, other meteorological data (specific humidity and longwave radiation) are obtained using an empirical equation model. We obtained all the required parameters for the method from available observations at shorter timescales but over limited periods. Table I lists the data sets used to create H08.
[...] obtained by the gamma distribution was distributed within a month, in the same order as the NCC or GTS. If H08 showed more precipitation days than indicated by NCC or GTS, the occurrences of precipitation days were obtained by a first-order Markov chain model (Gabriel and Neuman, 1962), and were randomly distributed.
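The two statistical ingredients named above, a first-order Markov chain for wet-day occurrence and a gamma distribution for daily amounts, can be illustrated with a minimal sketch. This is not the H08 implementation: the function name, the transition probabilities `p_wd` (wet after dry) and `p_ww` (wet after wet), the gamma shape value, and the final rescaling to the monthly total are all assumptions made for illustration.

```python
import random

def disaggregate_month(monthly_total, n_days, p_wd, p_ww, shape, rng):
    """Split a monthly precipitation total (mm) into daily amounts (mm/day).

    Hypothetical helper: p_wd and p_ww are assumed wet-day transition
    probabilities, and shape is an assumed gamma shape parameter.
    """
    # First-order Markov chain: today's state depends only on yesterday's.
    wet = []
    prev_wet = rng.random() < p_wd  # assumed initial state
    for _ in range(n_days):
        prev_wet = rng.random() < (p_ww if prev_wet else p_wd)
        wet.append(prev_wet)
    # A positive monthly total needs at least one wet day (assumed rule).
    if monthly_total > 0 and not any(wet):
        wet[rng.randrange(n_days)] = True
    # Gamma-distributed relative intensities on wet days, zero on dry days.
    raw = [rng.gammavariate(shape, 1.0) if w else 0.0 for w in wet]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * n_days
    # Rescale so that the daily amounts sum to the observed monthly total.
    return [monthly_total * r / total for r in raw]

daily = disaggregate_month(90.0, 30, 0.3, 0.6, 0.8, random.Random(0))
```

Rescaling at the end keeps the monthly total equal to the monthly observation, which is the constraint the disaggregation has to satisfy; the ordering of the daily amounts within the month is left random here.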
Next, the obtained daily precipitation was replaced by two regional daily gauge-based precipitation products when and where they were available. The first product was a daily precipitation product over East Asia (EA; 5–60°N, 65–155°E) with 0.5° grid resolution for the 26-year period from 1978 to 2003, produced by Xie et al. (2007). The number of gauges incorporated in the product over the region (1400–2000) is more than twice the number used in PREC/L. The second product was 1° daily precipitation data by the India Meteorological Department (IMD) from 1951 to 2000 (Goswami et al. 2006). The number of gauges used in the IMD product is about 1600 before the 1980s and more than 500 even in the 2000s, which is much higher than that of PREC/L (50–350 after the 1970s).
During periods when EA or IMD data were unavailable, the monthly mean precipitation over those regions was scaled using the ratio of the monthly climatology. The ratio of the monthly climatology of EA to that of PREC/L was estimated by averaging the monthly precipitation from 1978 to 1990; more recent years were not included in this average because a relatively low number of gauges have been used for PREC/L since 1990. The ratio of the monthly climatology of IMD to that of PREC/L was obtained from averages of monthly precipitation from 1951 to 1990.
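The climatology-ratio scaling described above amounts to simple arithmetic, sketched below for one grid cell and one calendar month. The function names, the sample values, and the fallback ratio of 1.0 for a zero climatology are hypothetical.

```python
def climatology_ratio(regional_monthly, global_monthly):
    """Ratio of mean monthly precipitation over the overlap years.

    Both inputs hold values for the same calendar month, one per year.
    """
    mean_regional = sum(regional_monthly) / len(regional_monthly)
    mean_global = sum(global_monthly) / len(global_monthly)
    if mean_global == 0.0:
        return 1.0  # assumed fallback when the global climatology is zero
    return mean_regional / mean_global

def scale_monthly(global_value, ratio):
    """Scale a global-product monthly value toward the regional climatology."""
    return global_value * ratio

# Hypothetical July values (mm/month) for three overlap years.
ratio = climatology_ratio([120.0, 100.0, 110.0], [100.0, 90.0, 110.0])
scaled = scale_monthly(50.0, ratio)
```

With these sample numbers the regional product is 10% wetter on average, so a 50 mm month from the global product is scaled to about 55 mm.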
Finally, the snowfall amount was estimated. Because gauge undercatch error is particularly large for snowfall, we distinguished solid precipitation using an equation for the wet-bulb temperature suggested by Yamazaki et al. (2001) and then corrected rainfall and snowfall amounts separately with undercatch correction factors based on gauge types. The wind velocity data of ECMWF's 40-year reanalysis (ERA40) (Betts and Beljaars 2003) were used in the method. Wind data from 1988 to 1996 and from 1983 to 1986 were subjectively selected and used to estimate the correction factors for 1948–1956 and 2003–2006, respectively, assuming that the impact of interannual change of wind velocity was small.

Temperature and shortwave radiation
Daily temperature and shortwave radiation were created by Richardson's (1981) stochastic weather generator. The parameters of the stochastic weather generator (the means and standard deviations of the maximum and minimum temperature and of shortwave radiation) were obtained separately for wet and dry days from the daily precipitation and maximum and minimum temperature of GTS from 1986 to 1995, and from the daily shortwave radiation product of the Surface Radiation Budget (SRB) project (Release 2.8, Gupta et al. 2000) (http://eosweb.larc.nasa.gov/) from 1984 to 2004.
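The role of the wet-day and dry-day means and standard deviations can be shown with a deliberately simplified, single-variable sketch. Richardson's (1981) generator treats the residuals of maximum temperature, minimum temperature, and radiation jointly, with lag and cross correlations; the version below keeps only a lag-1 autocorrelated residual for one variable. The function name, the parameter values, and the autocorrelation `rho` are assumptions for illustration.

```python
import math
import random

def generate_series(wet_flags, mean_wet, std_wet, mean_dry, std_dry, rho, rng):
    """Generate one daily variable conditioned on wet/dry state.

    Simplified, univariate stand-in for a Richardson-type generator:
    value = mean[state] + std[state] * residual, where the residual is a
    standardized lag-1 autoregressive process.
    """
    residual = 0.0
    series = []
    for wet in wet_flags:
        # AR(1) update; the sqrt term keeps the stationary variance at 1.
        residual = rho * residual + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        mean, std = (mean_wet, std_wet) if wet else (mean_dry, std_dry)
        series.append(mean + std * residual)
    return series

# Hypothetical daily maximum temperature (deg C) for a five-day sequence.
tmax = generate_series([True, True, False, False, True],
                       24.0, 2.5, 29.0, 3.0, 0.6, random.Random(42))
```

Conditioning on the wet/dry state is what ties the generated temperature and radiation to the daily precipitation sequence produced in the previous step.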
The estimated daily maximum and minimum temperatures were scaled using the monthly mean temperatures of Fan and van den Dool (2008), which were created from a higher number of gauges (4000–8000 stations) than previous similar products, and monthly means of the daily temperature range provided by CRU (1948–2002) and GTS (after 2002). The 3-hourly temperature was estimated by fitting a sine curve to the daily maximum and minimum temperature. Fan and van den Dool (2008) used a least squares distance weighting method to interpolate station data to grid cells, including an anomaly interpolation approach for topographic adjustment based on the temperature lapse rate obtained from NCEP/NCAR reanalysis. Like the monthly PREC/L precipitation data, this product is available for recent years and will be continually updated in the future.
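A sine-curve fit between the daily maximum and minimum temperature can be sketched as follows. The text does not state the phase of the curve, so the hour of the daily maximum (14:00 local time, with the minimum 12 hours earlier) is an assumption, as is the function name.

```python
import math

def three_hourly_temperature(tmax, tmin, hour_of_max=14.0):
    """Return eight 3-hourly temperatures (local hours 0, 3, ..., 21).

    A sinusoid with a 24-hour period oscillates between tmin and tmax;
    hour_of_max is an assumed phase, not a value taken from the paper.
    """
    mean = 0.5 * (tmax + tmin)
    amplitude = 0.5 * (tmax - tmin)
    return [mean + amplitude * math.cos(2.0 * math.pi * (h - hour_of_max) / 24.0)
            for h in range(0, 24, 3)]

temps = three_hourly_temperature(30.0, 20.0)
```

Because the eight sample hours span a full period at equal spacing, their average equals the midpoint of the daily maximum and minimum, so the disaggregation leaves that daily mean unchanged.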
The obtained shortwave radiation was scaled using the monthly mean shortwave radiation of the SRB. During the period when SRB was unavailable, the monthly mean of shortwave radiation was scaled using the ratio of the monthly climatologies of SRB and H08 obtained as means from 1984 to 2004. The daily shortwave radiation was then disaggregated into 3-hourly values based on the ratio of the 3-hour average to the daily average determined by the solar angle.
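One way to realize a solar-angle-based disaggregation is to weight each 3-hour window by the cosine of the solar zenith angle at its midpoint, as in the sketch below. The declination formula, the use of window midpoints in local solar time, and the function name are assumptions; the paper does not specify these details.

```python
import math

def three_hourly_shortwave(daily_mean, lat_deg, day_of_year):
    """Split a daily-mean shortwave flux (W/m^2) into eight 3-hour means.

    Weights are the cosine of the solar zenith angle at each window
    midpoint (local solar time), clipped at zero below the horizon.
    """
    lat = math.radians(lat_deg)
    # Approximate solar declination (assumed simple cosine formula).
    decl = math.radians(-23.44) * math.cos(2.0 * math.pi * (day_of_year + 10) / 365.0)
    weights = []
    for k in range(8):
        mid_hour = 3.0 * k + 1.5
        hour_angle = math.radians(15.0 * (mid_hour - 12.0))
        cos_zenith = (math.sin(lat) * math.sin(decl)
                      + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
        weights.append(max(0.0, cos_zenith))
    total = sum(weights)
    if total == 0.0:
        return [0.0] * 8  # sun below the horizon all day (polar night)
    # Normalize so that the mean of the eight values equals daily_mean.
    return [daily_mean * 8.0 * w / total for w in weights]

flux = three_hourly_shortwave(200.0, 0.0, 80)
```

The normalization conserves the daily mean, so any monthly scaling applied beforehand is preserved after the disaggregation.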

Specific humidity and downward longwave radiation
Daily specific humidity and downward longwave radiation were calculated as a function of daily precipitation, maximum and minimum temperature, and shortwave radiation using an empirical equation model. The original model by Hirabayashi et al. (2005) estimated dew point temperature by iteratively calculating sets of empirical equations until convergence was achieved. The original model, however, shows unrealistic values when the annual precipitation of the grid cell is very small. The coefficient used to obtain dew point temperature in our model was therefore obtained from 1986–1995 atmospheric forcing data (Dirmeyer et al. 2006).

ASSESSMENT OF DAILY PRECIPITATION STATISTICS
The mean annual precipitation without gauge undercatch correction from 1986 to 1995 in H08 shows a spatial distribution similar to those in the CRU and Global Precipitation Climatology Centre (GPCC; Fuchs et al. 2007) data (Supplement 2). Differences in mean annual precipitation between products are large over low latitudes, where the number of available gauges is limited.
The spatial distribution of daily precipitation in Hirabayashi et al. (2005) shows unrealistic patterns, since the occurrence and the order of the intensity of daily precipitation were randomly obtained at each grid cell. A snapshot of the daily precipitation of H08 shows a better spatial distribution than that of Hirabayashi et al. (2005), owing to the improved method for the spatial distribution of daily precipitation (Supplement 3).
The reanalysis-based daily precipitation of ERA40 (Betts and Beljaars, 2003) and the NCC (Ngo-Duc et al. 2005) were compared with the daily precipitation of H08, because these data sets are frequently applied as atmospheric forcings for LSMs. The GTS precipitation product and the satellite-observed daily precipitation data from the Global Precipitation Climatology Project One-Degree Daily Precipitation Data Set (GPCP-1DD; Huffman et al. 2001) were also used for the comparison, even though they are available only for recent years. The comparisons of daily precipitation focused on the number of precipitation days and the number of days with more than 20 mm/day, because existing atmospheric forcing data have commonly been scaled with monthly observations. Figure 1 compares the zonal means of the number of precipitation days that showed any precipitation (> 0.5 mm/day) and of the number of heavy precipitation days (> 20 mm/day) for January and July. All values are 10-year means from 1986 to 1995, except for the GPCP-1DD product, which shows the 10-year mean from 1997 to 2006. Since the 1997–2006 means of GTS and H08 are similar to those of 1985–1996 (not shown), differences of GPCP-1DD from other data sets due to the different averaging periods are expected to be small. The reanalysis products (ERA40 and NCC) showed more total precipitation days than the other products. The difference in the zonal mean of precipitation days reached more than 4 days over northern latitude regions. The number of heavy precipitation days in H08 was close to that of GPCP-1DD, with both overestimating heavy precipitation days. In contrast, compared to GTS, fewer heavy precipitation days were shown by the reanalysis data (ERA40 and NCC).
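The two statistics used in this comparison are threshold counts over a month of daily values. A minimal sketch, with a hypothetical function name and made-up sample data, is:

```python
def count_days(daily_precip, threshold):
    """Count days whose precipitation (mm/day) strictly exceeds threshold."""
    return sum(1 for p in daily_precip if p > threshold)

# Made-up daily precipitation values (mm/day) for part of a month.
sample = [0.0, 0.3, 0.5, 1.2, 25.0, 20.0, 7.5, 40.1]
precipitation_days = count_days(sample, 0.5)   # days with > 0.5 mm/day
heavy_days = count_days(sample, 20.0)          # days with > 20 mm/day
```

The strict inequality follows the "> 0.5 mm/day" and "> 20 mm/day" definitions in the text, so days with exactly 0.5 or 20 mm are not counted.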
Figure 2 presents the spatial distribution of the number of precipitation days in July. July values are shown because the differences between GTS and H08 over Asia (where the two regional data sets were substituted) are large. The relatively flat variations in GPCP-1DD in the longitudinal direction at high latitudes arise from the coverage (40°S–40°N) of the satellite data used in the product (Huffman et al. 2001).
The number of precipitation days in GPCP-1DD is lower over regions such as India and the Indochina peninsula, indicating that the GPCP-1DD product may reflect difficulties in detecting small-scale, short-lived cumulus precipitation events (e.g., squalls) from satellite images. ERA40 and NCC show many more precipitation days per month than H08, GTS and GPCP-1DD. Both ERA40 and NCC indicate more than 25 days of precipitation per month over many low-latitude regions, while the other data sets show 10–20 precipitation days.
Figure 3 is the same as Figure 2, but for the number of heavy precipitation days. GPCP-1DD overestimated heavy precipitation days when the monthly precipitation was high. This indicates that GPCP-1DD tends to show high precipitation intensity when the cloud information obtained from the satellite image is dense.
The number of heavy precipitation days in H08 was also higher than that of GTS, especially over the eastern United States, northern South America, and western to eastern Eurasia. This result can be attributed to the method of estimating the parameters of the gamma distribution: if the monthly total precipitation was larger, H08 tended to show greater precipitation intensities. Since the GTS data were based on measurements by fewer gauges than the PREC/L data (and H08), it is difficult to assert that the precipitation data and number of heavy precipitation days in GTS are always more realistic than those of H08 or GPCP-1DD, especially over regions with a small number of gauges. Over northern mid- to high latitudes, both ERA40 and NCC showed many fewer heavy precipitation days than the other data sets.
The unrealistic numbers of total precipitation days and heavy precipitation days in reanalysis data sets are an inevitable feature of the parameterization of precipitation processes in the large-scale atmospheric general circulation models (AGCMs) used to create the reanalysis data sets. The daily precipitation product presented in this paper (H08) therefore has an advantage over data sets based on reanalysis products because H08 includes the observed number of precipitation days.

SUMMARY
Daily precipitation, snowfall and specific humidity, and 3-hourly temperature, shortwave radiation and longwave radiation data were developed for 59 years (1948–2006) with 0.5° resolution in a consistent manner; these data were created using parameters obtained from daily observations that are available in recent years. One of the advantages of this data set is that the statistical characteristics of the created variables are independent of those of reanalysis data. Other advantages are the availability of data for recent years and the expectation of future extensions.
Global observed daily precipitation products such as GTS and GPCP-1DD are only available in recent years. Although reanalysis-based products are available for the last several decades, daily precipitation products based on reanalysis have defects in the number of precipitation days and the number of heavy precipitation days. The daily precipitation developed in this paper covers a period as long as that of reanalysis-based products, but produces reasonable numbers of precipitation days and heavy precipitation days. Precipitation in H08 has an advantage over GPCP-1DD at high latitudes, where GPCP-1DD values are uncertain because of the limited coverage of the satellite data used to create it. Because the number of gauges incorporated in H08 is larger than that in GTS, and because local observations in India and East Asia are included, the values of H08 are expected to be better than those of GTS. Thus, an LSM simulation driven by the newly developed daily precipitation is expected to produce more reasonable long-term land surface hydrological components than those using former data sets.

Figure 1. Zonal means of number of precipitation days (> 0.5 mm/day) (top) and of number of heavy precipitation days (> 20 mm/day) (bottom) for January (left) and July (right). All values are means from 1986 to 1995.

Figure 2. 1986–1995 mean number of precipitation days (days with precipitation > 0.5 mm/day) in July.

Figure 3. 1986–1995 mean number of heavy precipitation days (days with precipitation > 20 mm/day) in July.

Table I. Data sets used to create H08. A schematic diagram of the process is shown in Supplement 1.