A simulation based approach for representation of rainfall uncertainty in conceptual rainfall runoff models

: One of the common contributors to the uncertainty in any rainfall runoff model is the error distribution within the rainfall inputs. The uncertain rainfall intro- duces systematic bias in the estimated parameters. We present here the application of a method, known as simulation extrapolation (SIMEX), to ascertain the extent of parameter bias. SIMEX requires a knowledge of the standard error associated with the rainfall at any given time step. With this knowledge, it generates multiple sets of rainfall with artificially inflated error variance, and then assesses whether this leads to any trend in the resulting parameters. This trend is then extrapolated back to assess the most suitable parameter value when the input is error free. The applicability of the method is investigated using a synthetic example where rainfall uncertainty is multiplicative and tempo-rally invariant. This paper ascertained the bias trend in three key storage parameters of the Sacramento Rainfall Runoff Model representing surface and subsur- face flow mechanisms respectively. This initial investi-gation confirmed the stability of SIMEX for use in hydrological model specification studies; which hints the possibility of embedding this simple method to improve runoff estimation.


INTRODUCTION
Rainfall is the prime input variable of any rainfall runoff model. The availability and quality of rainfall data varies in time and across multiple locations of most catchments. Rainfall runoff models require prior transformation of these point rainfall time series to a subcatchment scale. The transformation of point to aerial scale introduces input error to the rainfall runoff model. What is the effect of this input error on the resulting flows? Given known transformation error profiles, can we reduce the total uncertainty caused by this error on the simulated flows? This paper discusses a method to minimise the impact of rainfall error on flow simulations from any rainfall-runoff model.
Errors in input data introduce systematic bias in model parameterisation during calibration (Carroll et al., 1995;Chowdhury and Sharma, 2007). The bias in parameter estimation of rainfall runoff model due to uncertain rainfall has been studied by Kavetski et al. (2002). Later studies (Kavetski et al., 2006a;Kavetski et al., 2006b) used an orthogonal regression scheme termed as 'total least square' to mitigate the effect input error. An alternative functional estimator termed Simu-lation Extrapolation, or SIMEX in short, was introduced for use in hydro climatological studies by Chowdhury and Sharma (2007), that can mitigate parameter bias with fewer underlying assumptions. The Chowdhury and Sharma (2007) study was limited to hydrological applications in linear settings whereas this research letter investigates practical non linear extensions of the approach that include rainfall runoff modelling. More details on the rationale used are presented next.

METHODOLOGY
SIMEX, as the name implies, simulates parameter sensitivity to artificially inflated input error which in turn allows the extrapolation of parameters to an error free state. The method was originally developed in mid nineties (Cook and Stefanski, 1994;Stefanski and Cook, 1995) and more recently applied in hydrological area by Chowdhury and Sharma (2007). We briefly repeat the method here for the benefit of the readership, detail of which can be referred to publications mentioned above.
Consider an ideal case of a modelling response variable y in presence of the error free input variable x: Here, x is the true parameter of the regression y~x, G(.) is an underlying model, and x is the associated total error term. In practise we seldom know the true input variable (x) and need to set up our model using uncertain input variable, say . The input variable results the naive parameter estimate of the regression y~ instead.
The SIMEX methodology requires a known error distribution for with respect to x. Say, includes an additive Gaussian error with variance 2 and mean zero. Synthetic input data * is generated adding artificially inflated error 2 to , where the inflation factor is > 0. Say the resulting biased parameter estimate of y~ * is *. The simulation is repeated for a number of inflation factors for example ={0.5, 1.0, 1.5...}. Hence a regression relationship of *~ can be derived. The regression is extrapolated to = 1 to compute simex. For a number of generalised linear models (Carroll et al. 1995), analytical solutions dictate that simex x.

RAINFALL RUNOFF MODEL PARAMETERS
The Rainfall runoff relationship can be highly non linear and difficult to model. While an analytical proof of SIMEX in simple linear settings may be viable, a more practical option for implementing the method in a nonlinear rainfall runoff modelling study is through a detailed numerical simulation. This section evaluates the utility of SIMEX in a practical hydrological setting using the Sacramento conceptual rainfall-runoff model.

The Sacramento Model
The Sacramento Model is a water balance model widely used in Australia (Boughton, 2005). Sacramento Model is also alternatively known as National Weather Service River Forecasting System (NWS-RFS) in USA (Burnash, 1975;Burnash et al., 1973). A general specification of the model is: where Sac(.) is the Sacramento model which uses rainfall and evaporation data (It , Et) to generate flow (Qt) at time t with residual error t. The model parameter vector consists of 16 parameters. The parameters define the following five major characteristics of the conceptual modelling system: a) soil moisture storages, b) rate of outflow, c) percolation from upper to lower storages, d) direct runoff from impervious areas, and e) evapo-transpiration and deep seepage loss. The Sacramento model has five soil moisture storages. The model essentially simulates water movements between storages, loss and routing as represented in Figure 1.
We use daily rainfall and evaporation at Golspie, NSW in Australia from 1980 to 1992 as a notional true estimate of catchment rainfall and evaporation, {It , Et ; t = 1,2... (12 × 365)}, the time series being shown in Figure  2. The average rainfall per wet day in the catchment equals 8 mm/day with a standard deviation of 9 mm. The fraction of wet days in the recorded period is 26%.
We generate flow based on a set of given values for all 16 model parameters. The resulting flow sequence, illustrated in Figure 2, is the synthetic true flow series {Qt ; t = 1,2... (12 × 365)} used in the results presented below. This study monitored only three storage parameters with notional true values of {60, 150, 38} mm. We appreciate that limiting the parameter number reduces the complexity of the Sacramento model. Nevertheless this significantly expedites computation without compromising the objective of demonstrating the application of our proposed approach in a rainfall-runoff modelling setting.
The next step in this synthetic study is to generate a corrupted rainfall data set. The scope for errors in rain records during drier times is low (a dry day reading is error free). Nevertheless a storm may completely miss the rain gauge. Multiplicative error in rainfall (Kavetski et al., 2006b) or a combination of multiplicative and a small additive error (Carpenter and Georgakakos, 2004) have been assumed as appropriate in prior studies. We assumed the rain record error to be multiplicative in this study. Accordingly we artificially corrupt the rainfall series It by multiplying it by a lognormal series Ut logN(1, log 0.1 2 ) as in Equation (3). Figure 3 illus-trates the effect of error density on rainfall estimates. The corrupted series, Wt becomes the notional recorded rainfall. For simplicity we assume the evaporation estimate to be error free as evaporation has much lesser spatial and temporal variability compared to rainfall.

Wt = It Ut
(3) Three parameters are allowed to vary keeping the remaining 13 parameters constant. They are UZTW (upper zone tension water storage capacity), UZFW (upper zone free water storage capacity) and LZTW (lower zone tension water storage capacity). The relative roles of these parameters are outlined in the Sacramento Model conceptual representation in Figure 1. We now need an objective function prior to calibrating a rainfall runoff model. The design of objective function depends on the purpose behind the formulation of hydrologic models. Robust hydrologic models require multiple objective functions (Khadam and Kaluarachchi, 2004;Madsen, 2000). Our objective function to calibrate the Sacramento Model is based on flow hydrograph and flow duration curve. Equal weight has been assigned to both criteria as follows.
6 Figure 1. The Sacramento Model. The three storage parameters allowed to vary in this study are upper zone tension water (UZTW), upper zone free water (UZFW) and lower zone tension water (LZTW). The effect on lower zone free water (LZFW) and the other 12 parameters (not shown here) are not considered. Figure 2. The recorded daily rainfall, evaporation and the synthetic flow used in this study.

A SIMULATION BASED APPROACH FOR REPRESENTATION OF RAINFALL UNCERTAINTY
where, Qi, Q est i = observed and estimated flow at i th percentile, the number of percentiles being set equal to the total number of time steps (t) in the sample. Argmin() minimises the objective function.
A suitable optimisation scheme is required to efficiently solve Equation (4) and thus estimate { p ; p = 1, 2, 3}. The choice of suitable optimisation schemes to solve non-linear hydrological models is intricate (Duan et al., 1992;Kuczera, 1997;Sorooshian et al., 1993;Yapo et al., 1998). We chose the L-BFGS-B optimisation algorithm (Byrd et al., 1995;Zhu et al., 1997). The L-BFGS-B is a limited-memory quasi-Newton algorithm for solving large non-linear optimization problems subject to simple bounds on the variables. It is intended for problems in which information on the Hessian matrix is difficult to obtain, or for large (or dense) problems.
The naive estimate { naive } is attained by solving the objective function Equation (4) where the rainfall is Wt in the Sacramento model as shown: The rainfall data with error produces the following naive parameter estimates: naive {63, 151.5, 38.7} mm. The naive estimate is a SIMEX step when = 0. Note the difference between these and the true parameter values {60, 150, 38} mm. Can SIMEX help in modifying these naive estimates so as to offer a better representation of the truth?

Application of SIMEX
We generate replicates of the corrupted rainfall W*, with the addition of increasing levels of error { i 2 }, thereby resulting in an increasingly biased parameter vector p *. In this example, we use variance inflation factors {0.25, 0.5, 1.0, 1.5} with 300 random trials each. The expected value of p * is estimated as the sample mean over the 300 trials. The distributions of the 300 trials of each parameter and interdependence among three parameters { = 1*, 2*, 3*} are analysed below followed by SIMEX extrapolation.
The probability density functions of the sampled parameters (Figure 4), ascertained using kernel density estimation approaches, show the increase in variance in the parameter sampling distribution with increase in . The effect of rainfall error is more significant for the upper storage parameter (UZTW), which is expected. The lower storage (LZTW) is less sensitive to input error due to the dampening effect of the prolonged accumulation involved. The three parameters exhibit near inde-pendence with respect to each other a feature that allows us to formulate separate relationships between and the estimated parameter value for each of the parameters considered. Figure 5 illustrates the relationship between ( ~ p*) for each p. The presence of a clear trend in the relationship allows the formulation of the following regression relationship: where Gp(.) denotes generic regression equations that may be linear or non-linear depending on the relationship that exists. In this example, a linear regression relationship is found to be satisfactory. The extrapolation to Gp( = 1) gives us the SIMEX corrected estimate as shown in Figure 5 and listed (along with naive estimates and the true values) in Table I. Note that the SIMEX corrected estimates are closer to the true values compared to the naive estimates.
7 Figure 3. The relationship of true daily rainfall to naive rainfall due to logN(1, log 0.1 2 ) error.

DISCUSSION AND CONCLUSIONS
Uncertainty in input variable results in systematic bias in parameter estimates. We ascertained that SIMEX helps to mitigate the bias of the Sacramento model parameters. The study demonstrated that parameters with higher influence on direct runoff are more sensitive to rainfall error. This research letter primarily aims to present the stability of SIMEX in a non linear rainfall runoff modelling setting. However the problem definition has been simplified in many respects which need further work. The three model parameters considered were independently related to the variance inflation factor . In a full 16 parameter setting, any likely interdependence of the parameter needs to be addressed using multivariate statistics.
SIMEX needs prior knowledge of the error distribution and any associated non stationarity in the rainfall time series. One of the future works will be the specification of error in rainfall data, which requires several considerations. The instrumentation error is often non stationary due to change in accuracy and method of measurements over time, which can be obtained from the manufacturers or experimentally quantified (Barry, 1978;Jaech, 1985). The translation of radar rainfall record to ground rainfall (Chumchean and Sharma, 2006) or downscaling of GCM simulations of rainfall and other atmospheric variables to rainfall at the local catchment scale (Mehrotra and Sharma, 2006) introduces uncertainty that may be possible to quantify and specify error distributions for use in a SIMEX or similar framework. The transformation of point rainfalls to aerial rainfall involves interpolation error which can be ascertained using available statistical methods (Kaplan et al., 1997). Alternatively non parametric SIMEX may be useful when multiple replicates (or possibilities) of rainfall scenario are available (Devanarayan and Stefanski, 2002).
Due to less onerous assumptions and computational burden, we envisage application of SIMEX to be a routine practice among practitioners to improve parameter estimation during any rainfall runoff model production.