Strategy to automatically calibrate parameters of a hydrological model: a multi-step optimization scheme and its application to the Xinanjiang model

Parameter calibration is fundamental for the implementation and operation of a hydrological model. Automatic calibration techniques have been widely studied. However, even the most modern optimization schemes cannot always help us to obtain an optimal parameter set due to high dimensionality of the parameter space and complex interactions between parameters. The main purpose of this study was to test our strategy for automatic parameter calibration: lowering the dimensionality. Our modified Xinanjiang model was selected for study. It consists of 15 parameters controlling data adjustment and representing hydrological processes. Morris’ global sensitivity analysis technique was used to get better understanding about the structure of the parameter space. Parameters were found to have significantly different sensitivities at yearly, monthly and daily temporal scales. Also strong interactions between the parameters were observed at all three scales. A multi-step optimization scheme was designed and tested based on these observations. In this scheme, the 15 parameters are divided into three groups and optimized group by group at the time scale they are most sensitive to by using the SCEM-UA algorithm, a global optimization algorithm. The newly developed scheme is shown to be very efficient and robust.


INTRODUCTION
Successful implementation and operation of a hydrological model largely depends on the calibration of its parameters.However most modern hydrological models include more than ten parameters to be calibrated as they become more and more comprehensive and complicated.This makes calibration a tedious and time consuming procedure even for an expert.As a solution, automatic calibration techniques have been studied widely (e.g.Duan et al., 1992Duan et al., , 1993;;Gan and Biftu, 1996;Boyle et al., 2000;Zhijia et al., 2013).
The rapid development of computer technology makes it possible to automatically search an optimal parameter set of such models which minimizes single or multiple objective functions.Numerous efforts have been directed towards robust and efficient applications of optimization algorithms.Reference to earlier works on this topic can be found in Duan (2003).Both enumerative and gradient-based methods suffer from the drawback of being trapped in local optima around the first relative minimum reached and largely dependent on the specified starting point.Pickup (1977) showed that enumerative methods perform better than gradient-based methods, and optimization of a selected sub-group of parameters is more efficient than optimization of all parameters.In recent years, global optimization algorithms which try to avoid the limitations of the above algorithms referred to as local optimization algorithms and reach the global optimum, have become widely used (e.g.Gan and Biftu, 1996;Cooper et al., 1997;Zhang et al., 2002;Goswami and O'Connor, 2007).Duan et al. (1992Duan et al. ( , 1993) ) developed a global optimization algorithm referred to as the Shuffled Complex Evolution method (SCE-UA) to deal with the peculiarities encountered in calibration of conceptual watershed models.Based on the success of SCE-UA, Yapo et al. (1998) presented an effective and efficient methodology, the Multi-Objective COMplex evolution algorithm (MOCOM-UA), for solving the multiple-objective global optimization problem.Cooper et al. (1997) evaluated three global algorithms, the shuffled complex evolution (SCE), genetic algorithms and simulated annealing methods over a tank model, and concluded that the SCE method provided better estimates than other two.Madsen et al. (2002) also reported that the SCE method has the best overall performance among the three methods used in their study.
It is worth noting that the availability of global optimization algorithms is not helpful for poorly posed optimization problems.As Gupta et al. (2003) pointed out, the response surface may contain numerous local minimums at widely differing locations in the parameter space.Therefore, uncertainty in the calibrated parameters may still remain large.This phenomenon may be caused by improper model structure including over parameterization (Jakeman and Hornberger, 1993;Wheater et al., 1993), or lack of information to distinguish between a number of parameter sets for a given model structure (Beven and Binley, 1992).Of course, it is also possible to result from improper problem definition including the use of a single aggregate measure (such as the root mean square error, RMSE) to evaluate model performance (Gupta et al., 2003).
This paper presents a multi-step optimization scheme and its application to the Xinanjinag (XAJ) model (Zhao, 1992), a widely used rainfall runoff model.This scheme includes • a projection from optimization variable space (OV space) to parameter space (PM space); • subdivision of the OV space based on the results of Morris' global sensitivity analysis (Morris, 1991;Campolongo et al., 2007); and • SCEM-UA global optimization (Vrugt et al., 2003a) for all subdivided OV spaces.It was tested by using model generated proxy data and real hydrological data of the Misai basin (118.0°-119.0°E,29.10°-30.0°N).The results show the efficiency and robustness of the scheme.

A BRIEF DESCRIPTION OF THE XAJ MODEL AND THE TEST BASIN
The XAJ model The XAJ model is a conceptual hydrological model to simulate runoff generation and concentration within a catchment and obtain the discharge at a basin outlet.It is widely used in humid and semi-arid region of China, and around the world.The key concept is a distribution of storage capacities.Runoff generation will only occur over the area where the storage capacity is satisfied.Runoff so generated is then separated into three components: surface runoff, interflow and groundwater.For the details of the XAJ model, the reader is referred to Zhao (1992).
The modified XAJ model used in this study (Lu and Li, 2014)

Projection from OV space to PM space
In this scheme, projection from OV space to PM space is introduced.Sensitivity analysis and optimization were carried out in OV space, a multidimensional unit hypercube in which all variables, x 1 , x 2 , ..., x n take values from 0 to 1, and are generated by sensitivity analysis and optimization schemes.This makes the sensitivity analysis and optimization independent from model specific issues such as ranges of parameter values, and standardizes the computer programs.PM space contains all 15 parameters of the XAJ model, each set of the parameters is projected from a point in OV space, (x 1 , x 2 , ..., x n ) using equations in Table I.For convenience, the terms, OV and PM are used without distinction except in some special situations.
Table I shows the projection from OV space to PM space, and the ranges of parameter values.In order to maintain flexibility, the PM space is set much wider than the parameter ranges obtained from numerous model applications in China.Of course, it is desirable to narrow the PM space based on prior data analysis.In Table I, the narrow range of c g is based on our hydrograph recession analysis of observed discharge data.
Besides the merits mentioned above, this projection also makes it very easy to take into account the limits of the parameters and their relationships, for example, 0 < KI + KG < 1 and 0 < c s < c i < c g < 1.From the structure of the XAJ model, it is obvious that the KI and c i pair is interchangeable with the KG and c g pair.That means that exchanging the values for these two pairs will generate the same model output.By applying the constraint c i < c g , a potential cause of equifinality phenomenon can be eliminated.

The study basin
The Misai basin (118.0°-119.0°E,29.10°-30.0°N)shown in Figure 1 was selected as the study basin to test the scheme.It is located in the southeast of China.The drainage area of 797 km 2 is mountainous and covered by thick vegetation.The upper layer of the soil is highly permeable and the climate of this region is quite humid.The annual rainfall is about 1800 mm, and the annual pan evaporation about 800 mm.The daily mean areal rainfall, pan evaporation and discharge from 1982 to 1988 was prepared and used in the model.In the sensitivity analysis and optimization, the model outputs of the first year are excluded to avoid the effects of initial conditions.Besides the observed discharge data, proxy discharge data was also generated from observed rainfall and pan evaporation data by assigning a parameter set to the XAJ model.The proxy data makes it possible to test the scheme without the effects caused by uncertainties in discharge observation and model structure.

Global sensitivity analysis
A global sensitivity analysis method proposed by Morris (1991) was used in this study.In OV space or its sub-spaces, each variable may take one of p values uniformly discretized from 0 to 1, namely 0, 1/(p -1), ..., (p -2)/(p -1), 1.The value space, Ω becomes a k-dimensional p-level grid.For a vector x in Ω, an elementary effect of the i-th variable d i (x) is defined as ( 1, 2, , ) where y is objective function calculated from a model parameter set which is projected from x or x + e i Δ; Δ = 1/(p -1) and e i is a unit base vector of i-th direction.By randomly sampling r times in the OV space, two sensitivity measures, μ i and σ i can be calculated as follows: Here x (j) is the j-th vector sampled from Ω by using Morris' 'one-factor-at-a-time' approach (OAT).Obviously μ is not a good measure for non-monotonic objective function because some elementary effects will cancel each other.Campolongo et al. (2007) proposed another measure μ* as In this study, μ* and σ are used as sensitivity measures.The larger the μ i *, the more sensitive the i-th factor.And the larger the σ i , the stronger the effects from other factors the i-th factor accepts.A large σ i implies that other factors also have significant impacts on d i and make them differ from its mean value μ i .
For the implementation details of the Morris method, the reader is recommended to refer to Saltelli et al. (2004).
In the sensitivity analysis of the XAJ model parameters, k = 15 is the dimension of the PM and OV spaces and r is set to 4, which is the smallest number showing similar results with larger r values, and p is set to 11. (k + 1) × r vectors in Ω are generated by using OAT.Then they are projected to (k + 1) × r sets of the XAJ model parameters using the equations listed in Table I.For each parameter set, a simulation run is carried out, and Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970) is calculated using observed and simulated discharge data.
Where t indicates day, month or year when computing daily, monthly and yearly NSE which is used as objective function in sensitivity analysis at daily, monthly and yearly scale respectively; n is the number of the data; Q obs and Q sim are observed and simulated discharges respectively.
Figure 2 shows the sensitivity of variables in OV space, according to the parameters of the XAJ model in PM space.It shows that the sensitivities change with the time scale.At yearly scale, the parameters controlling data adjustment C p and C ep are most sensitive.After excluding these two, parameters controlling runoff generation become sensitive together with several parameters controlling runoff routing at monthly and daily scales.The parameters relating to runoff generation are less sensitive at all three time scales.
Based on the result of sensitivity analysis, the total 15 parameters (actually 15 variables in OV space) could be divided into three groups: • group 1: parameters for data adjustment, C p and C ep which are sensitive at yearly scale as well as other scales; • group 2: parameters controlling runoff component separation and routing, SM, EX, KI, KG, c s , c i and c g , which are sensitive at daily scale; and • group 3: parameters controlling runoff generation and evapotranspiration, IMP, B, WM (WM = WUM + WLM + WDM), WUM, WLM and C, which are sensitive at yearly scale when C p and C ep are set to constants.These sub-spaces containing variables (parameters or OV variables) in groups 1 to 3 are referred to as PM1, PM2 and PM3 or OV1, OV2 and OV3 respectively.

Global optimization
Based on above grouping, a three-step optimization scheme was designed to optimize the parameters of the XAJ model (Li and Lu, 2012): (1) optimize the parameters in PM1 at yearly scale while setting parameters in groups 2 and 3 to their commonly used values; (2) optimize the parameters in PM2 at daily scale while setting parameters in group 1 to the optimal values derived in step (1) and keeping the parameters in group 3 unchanged; then (3) optimize the parameters in PM3 at yearly scale again while setting the parameters in group 1 and group 2 to their optimal values derived in steps ( 1) and ( 2).Because the parameters are optimized at their most sensitive time scale, the values can be expected to be close to their real optimal values though they are conditional optimum.By cyclically using optimal values obtained in previous cycles, improvement can be expected.
In each step, the optimization is carried out using Shuffled Complex Evolution Metropolis algorithm (SCEM-UA), a global optimization method developed by Vrugt et al. (2003b) based on SCE-UA (Duan et al., 1993).This algorithm has two algorithm parameters to be assigned by the user.They are the number of complexes/sequences, q, and the population size, s.Vrugt et al. (2003b) suggested that in the case of complex-shaped posterior probability density distributions used as objective function in SCEM-UA, the use of larger values (s ≥ 250 and q ≥ 10) will be appropriate.In this study, the parameters were set to q = 15 and s = (2n p + 1)q, where n p is the number of parameters to be optimized.The Gelman and Rubin scale reduction factor of less than 1.2 indicates approximate convergence (for details on this factor, see Gelman and Rubin (1992) and Vrugt et al. (2003a)).The optimization process will stop when convergence criteria are satisfied.Otherwise, the process will stop when the preset upper limit of the number of objective function evaluations, namely the number of model runs, is exceeded.
For all three optimization steps, the model runs at daily scale, and the calculated discharge at the relevant time scale is passed to SCEM-UA to evaluate the objective function The negative sign is added because SCEM-UA is trying to find the maximum value.In order to avoid the effect of initial conditions, the data of the first year are not included.

Test using proxy data
The multi-step scheme was tested by using proxy discharge data generated by using a set of parameter values shown as true values in Table II.A good optimization scheme must be capable to obtain a parameter set close to that used in making the proxy data under such an ideal condition, with no error of model structure and no error of observation.Also initial values of the model statement variables are the same as those in the proxy data generation.The optimizations of all three steps converged after 17745, 63855 and 95530 model runs respectively.On the other hand, several trials to directly optimize all 15 parameters did not converge and stop when the number of the model runs reached its preset upper limit of 500000 times.The multi-step optimization scheme is shown to be more efficient and robust.The efficiency is also supported by the fact that the iterative use of this scheme did not make significant improvements.
Table II shows the result of each step and the final optimal set derived in the first scheme application.Figure 3 shows the daily proxy discharge and daily calculated discharge derived using optimal parameters.Only the selected period is displayed here.Both hydrographs were almost the same, and the NSEs also near 1.0.However, the parameter values were slightly different to their true values used for generation of proxy discharge data.This deviation may be caused partly by the interaction between C p and C ep shown by Lu and Li (2014).Considering the annual water balance, the effect of increasing C p will be partly canceled by increasing C ep .Some combinations of C p and C ep may generate very similar simulation results.This may be a possible reason for equifinality phenomenon.

Application to real data
The scheme was subsequently applied to real data from the Misai basin.In this case, both initial parameter values and initial statement values were unknown.In addition to one year spin-up, a model run by using commonly used parameters was made, and average values of the statement variables at first days of 6 years, from 1993 to 1998 were used as initial condition in order to get better estimates of the initial statement values.
The optimization process started from a set of parameter values which are commonly used in China.All three steps converged.The optimal parameters and calibration result can be seen in Table II and Figure 4.The yearly and daily NSEs are 0.989 and 0.739 respectively.From viewpoint of the practical use of the model, it is acceptable for water resource research and flood forecasting (Moriasi et al., 2007).

CONCLUSIONS
In this paper, a multi-step optimization scheme was developed for automatically calibrating parameters in the XAJ model.In this scheme, a projection from optimization variable space to parameter space is introduced to separate model specific issues from sensitivity analysis and optimization algorithms, and to take into account the relationships between parameters.Based on the results of global sensitivity, the parameter space was divided into three sub-spaces.The parameters belonging to these sub-spaces were optimized step by step at their most sensitive temporal scales.In the first step, parameters controlling input data adjustment are optimized at yearly scale; then the parameters controlling runoff separation and concentration are optimized at daily scale; finally, the parameters controlling runoff generation and evapotranspiration are optimized at yearly scale again.
In a test under ideal condition in which model generated discharge data was used, this scheme showed its capacity to find the true values of the parameters, and therefore its efficiency and robustness.Finally, the scheme was applied to real discharge data of the Misai basin.The optimization in all three steps converged.The yearly and daily NSEs showed that the result is acceptable for water resource research and flood forecasting.
Lowering the dimensionality of the optimization problem by subdividing the parameter space based on its structure revealed by global sensitivity analysis could be a very useful methodology to make the automatic calibration of hydrological models more efficient and robust.

Figure 1 .
Figure 1.Map of the Misai basin

Figure 3 .
Figure 3. Hydrograph of calculated discharge using optimal parameters and proxy discharge, 1985

Table I .
Projection from OV space to PM space

Table II .
Optimal parameter values while using proxy and real data of the Misai basin