Journal of Agricultural Meteorology
Online ISSN : 1881-0136
Print ISSN : 0021-8588
ISSN-L : 0021-8588

This article has now been updated. Please use the final version.

Modelling soil heterotrophic respiration within a small area in an immature deciduous forest by machine learning
Rui HUKaho SAKAGUCHITakashi HIRANOLifei SUNNaishen LIANG
Author information
JOURNAL OPEN ACCESS FULL-TEXT HTML Advance online publication

Article ID: D-24-00026

Details
Abstract

 An accurate estimation of soil heterotrophic respiration (Rh) is crucial to separate autotrophic respiration (Ra) from soil respiration (Rs) and to quantify the soil carbon balance. In this study, spatiotemporal variation in Rh within an area of 0.09 ha was modeled by machine learning (ML) with Random Forest (RF) and Gradient Boosting Machine (GBM) algorithms, based on hourly Rh data measured with five automated chambers over two growing seasons in an immature deciduous forest in Hokkaido, Japan. Using the explanatory variables of soil temperature, soil moisture (water-filled pore space (WFPS) or volumetric soil water content), soil bulk density, soil carbon/nitrogen ratio (C/N), wind speed, and litter accumulation, ML models were much superior to conventional regression models using soil temperature and/or soil moisture and a multiple linear regression model using the same variables as in the ML models. In addition, the RF model performed better than the GBM model in all variable combinations. According to the RF model, soil temperature showed the highest importance in Rh variation among the variables, followed by bulk density. The RF model is promising for the gap-filling of missing Rh data and the accurate separation of Ra from Rs.

1. Introduction

Soil respiration (Rs) consisting of belowground autotrophic respiration (Ra) and heterotrophic respiration (Rh) is the dominant component of carbon dioxide (CO2) emissions from terrestrial ecosystems (Law et al., 2002). Ra almost corresponds to root respiration, and Rh is equivalent to the microbial decomposition of soil organic matter and litter accumulation on the ground (Kuzyakov, 2006). Because the environmental responses of Ra and Rh are different (Scott-Denton et al., 2006), the accurate separation of RS is crucial for understanding ecosystem carbon dynamics. Owing to difficulty in direct Ra measurement in the field, Rh has been often measured using the trenching method to exclude roots (Subke et al., 2006; Sha et al., 2021); Ra is estimated as the difference between Rs and Rh.

Ra is a component in the short-term ecosystem carbon cycle that begins with photosynthesis, whereas Rh is the carbon loss from soil to the atmosphere resulting from the long-term ecosystem carbon balance. Therefore, it is required to elucidate the variability of Rh in terms of soil carbon dynamics under climate change and soil carbon sequestration (Aguilos et al., 2013; Comeau et al., 2018; Tang et al., 2020). Rh is influenced by various factors of microclimate such as soil temperature and soil moisture (Davidson and Janssens, 2006) and soil properties such as bulk density, pH, soil carbon, and soil nitrogen (Shi et al., 2016). On a global scale, annual Rh was estimated by machine learning with a Random Forest (RF) algorithm using global soil respiration database (SRDB, Bond-Lamberty and Tompson, 2010) and related climate data (Tang et al., 2020), in which dominant variables were temperature and precipitation in high-latitude and semiarid/tropical regions, respectively.

The spatial variability of Rh is not so small even on limited scales smaller than 0.1 ha. For example, the results of continuous observations using automated chambers showed spatial variations in Rh with coefficients of variation (CV) of 20-26% in Japanese forests (Aguilos et al., 2013; Liang et al., 2010; 2017) and 26% in a tropical rain forest (Zhao et al., 2021). However, no study was found on modeling Rh considering the small-scale variability of microclimate and soil properties. Therefore, using a large amount of hourly data measured with an automated chamber system, we modeled the spatiotemporal variation of Rh within an area of 0.09 ha in an immature deciduous forest by machine learning (ML) with RF and Gradient Boosting Machine (GBM) algorithms and compared its performance with that of conventionally used empirical models.

2. Material and methods

2.1 Site description

The study was conducted in a naturally regenerating immature open deciduous forest in Hokkaido, Japan (42°44′ N, 141°31′ E; elevation 125 m) after a severe damage caused by a typhoon in 2004. The forest was dominated by Japanese white birch (Betula platyphylla). The terrain was almost flat with a slope of less than 2°. The soil was classified as an immature volcanogenous regosol with high water permeability and poor nutrient content. A thin litter layer (1-2 cm) overlaid an approximately 10-15 cm thick organic soil layer, which was underlain by volcanic pumice stones. The mean annual air temperature and precipitation were 8.3°C and 1337 mm, respectively, in 2013-2022 at the Tomakomai observatory 14 km from the study site. Please refer to Hirano et al. (2017), Hu et al. (2023), and Sano et al. (2010) for detailed site information.

2.2 Measurement

Soil CO2 flux was continuously measured with a multichannel automated system (Liang et al., 2017) in the growing seasons (mid-April to mid-November) of 2021 and 2022. The system comprised a control box and 18 chambers, of which five chambers (0.9 m×0.9 m×0.5 m, length×width×height) were trenched for Rh measurement with four PVC boards inserting into the soil at a depth of 30 cm to prevent roots from invading into chamber area. The chambers were installed randomly within an area of 0.09 ha (30 m×30 m) in an immature forest. The other 13 chambers were not involved in this study.

Each of 18 chambers was automatically closed for 200 s, one after the other, for one hour. During chamber closure, CO2 concentration in the chamber headspace was measured every 5 s with an infrared CO2 analyzer (LI-820, LI-COR, Lincoln, Nebraska, USA), and the change rate of CO2 concentration (slope, μmol mol-1 s-1) was determined using the least-squares method. CO2 flux (F, μmol m-2 s-1) was calculated using the following mass balance equation:

where P is the air pressure (Pa), H is the chamber height, R is the gas constant (8.314 Pa m3 K-1 mol-1), and Ta is the air temperature in each chamber (°C). In each chamber, volumetric soil water content (SWC, %) at a depth of 10 cm was measured with a time-domain reflectometry (TDR) sensor (CS616, Campbell Scientific Inc., Logan, Utah, USA), and soil temperature (Ts, °C) at a depth of 5 cm and Ta (°C) at a height of 25 cm were measured with thermocouple thermometers, which were recorded every 5 s simultaneously with CO2 concentration by a datalogger (CR1000, Campbell Scientific Inc.) in the control box. Some data gaps appeared owing to power failure and sensor malfunctions.

Surface soil was separately sampled from 0-5 and 5-10 cm depths at four points in each chamber using core samplers in 2020 or 2021. The total volume (TV) of solid and liquid phases in each sample was measured with a digital actual volumenometer (DIK-1150, Daiki, Kusatsu, Japan), and then the samples were oven-dried and weighed. Bulk density (BD, g cm-3) was determined from the dry weight, and liquid volume was calculated from the weight loss of soil samples through oven-drying. Soil porosity (φ, %) was determined by subtracting solid volume (= TV - liquid volume) from the sampler capacity (100 cm3). The water-filled pore space (WFPS, %) was calculated hourly as the ratio of SWC to φ. Using the soil samples after removing gravel and plant debris, carbon (C) content (kg kg-1) and nitrogen (N) content (kg kg-1) were analyzed with a CN analyzer (FlashEA 1112, Thermo Fisher Scientific, Waltham, MA, USA). Finally, the soil properties of the two depths were averaged at each point (n=4). Litter accumulation (LA) on the ground was collected from each chamber and weighed in April and November 2022. The overall dry weight (g m-2) was estimated from the moisture content measured in a portion (< 10 g) of the sample; the remaining sample was returned to each chamber. Wind speed (WS, m s-1) was measured on a tower near the chambers (Hirano et al., 2017). Please refer to Hu et al. (2023) for further information of the measurement.

2.3 Data analysis

Two ML algorithms of RF and GBM were applied using the combinations of the explanatory variables (features) of Ts, WFPS, SWC, C/N ratio, BD, WS, and LA. RF and GBM algorithms are supervised ML algorithms based on decision tree models, which provide accurate none-linear fitting between flux and environmental factors and the quantitative interpretation of the relative importance of environmental drivers (Ludwig et al., 2022; Tavares et al., 2018). The both algorithms had been widely used in predicting future carbon emission scenarios and data gap-filling because of their reliable performance (Gomes et al., 2019; Kim et al., 2020).

The 80% of hourly data were used for training, and the remaining 20% was used for testing. Model performance was evaluated using the determination coefficient (R2) and root mean square error (RMSE). A 10-fold cross-validation method was used to avoid overfitting of ML models. To avoid model bias caused by random data selection, each ML model was run ten times using different random seeds. The model performance was determined based on the best result of the ten trials (Ludwig et al., 2022; Yuan et al., 2022). The ML modeling was conducted for four combinations of variables (Table 3).

The following conventional models were also used to estimate F (Lloyd and Taylor, 1994; Gulledge and Schimel, 2000):

where a, b, c, d and e are fitting parameters. Ts (°C) denote soil temperature. F10 is F at 10°C and E0 is an activation energy parameter, which were determined by fitting. T0 is the temperature at which F becomes zero and is kept constant at -46.02°C. In addition, linear multiple regression (MR) was applied to fit F to the variables used for ML models. Data processing was conducted using R 4.2.2 (R core team, https://www.r-project.org/). R packages “caret”, “caretEnsemble”, “randomForest”, and “gbm” were used for building models, and R package “iml” was used for feature importance analysis. The relative importance of each predictor is calculated by determining the increase in mean-square error after permuting the values of each input among all examples and sending the permuted data through the model (McCandless et al., 2022). The significance of correlation was determined by the t-test. Differences among chambers were tested by one-way ANOVA and Tukey’s HSD.

3. Results

3.1 Variations in soil heterotrophic respiration and soil variables

Ts showed a clear seasonal variation with a peak in late July (Fig. 1a). In contrast, seasonality was not so clear in WFPS, which was significantly correlated to SWC (R=0.787, P < 0.0001), but WFPS decreased in July, especially in 2021 (Fig. 1b). Almost following Ts seasonality, Rh peaked in late July in the both years (Fig. 1c).

All explanatory variables and Rh were significantly different among chambers (P < 0.05) within an area of 0.09 ha (Table 1). LA had the largest CV of 56.2%, followed by SWC and its derivative WFPS. The CV of Rh was 24.6%.

Fig. 1. Mean daily soil temperature (a), WFPS (b), and heterotrophic respiration (c) in the growing seasons of two years. Shaded areas denote standard deviation of five chambers.

Table 1. Explanatory variables and heterotrophic respiration (Rh) in each chamber (mean ± 1 standard deviation (SD)).

3.2 Performance of machine learning models

Model performance was compared year by year (Tables 2 and 3). ML models and conventional models except MR showed higher performance using 2021 dataset than using the datasets of 2022 and two years. The R2 and RMSE of conventional models of Exp, LT, and GS ranged from 0.501 to 0.571 and from 0.809 to 2.290 μmol m-2 s-1, respectively (Table 2), suggesting poor fitting and inaccurate estimation abilities. In contrast, models based on more explanatory variables showed higher performance with R2 of 0.645-0.729 for MR, 0.882-0.955 for RF, and 0.828-0.901 for GBM, and RMSE of 0.606-0.701 for MR, 0.293-0.489 for RF, and 0.443-0.621 μmol m-2 s-1 for GBM. Among four variable combinations (Table 3), Combination 1, using WFPS with WS and LA, showed the highest performance both for RF and GBM. When WS and LA were excluded, the model performance decreased. The comparison between Combinations 2 and 3 showed that WFPS resulted in higher R2 and smaller RMSE than SWC did, especially for RMSE from RF. In addition, the RF model showed higher R2 and lower RMSE than those of the GBM model for all combinations, indicating that RF is superior to GBM for predicting Rh in this site.

Table 4 shows the performance of the RF model based on the two-year dataset (n=35374) of BD, C/N, Ts, SWC, and LA (Combination 3) for each chamber. Compared to the overall R2 of 0.922 and RMSE of 0.412 μmol m-2 s-1 (Table 3), model performance for individual chambers except Chamber 1 was higher with R2 of 0.926-0.939 and RMSE of 0.275-0.351 μmol m-2 s-1. Even for Chamber 1, the performance was not so bad. In addition, estimated flux reflected spatial variation in measured flux well (Table 4).

Table 2. Performance of exponential (Exp), Lloyd and Taylor (LT), Gulledge and Schimel (GS), and linear multiple regression (MR) models based on the datasets of 2021, 2022, and two years (2021 and 2022). The data size was 16755, 18619, and 35374 for 2021, 2022, and two years, respectively.

Table 3. Performance of ML models (RF and GBM) based on different variable combinations for the datasets of 2021, 2022, and two years (2021 and 2022). The data size was 16755, 18619, and 35374 for 2021, 2022, and two years, respectively. The best result from ten trials using different seeds is shown.

Table 4. Performance of the RF model based on the variable combination 3 (Table 3) using two-year data for each chamber. The best result from ten trials using different seeds is shown.

3.3 Controlling factors of soil heterotrophic respiration

The results of the RF model analysis using tow-year data (Combination 3 and Table 3) showed that the effect of Ts is notably strong (38.7%) on Rh variation (Fig. 2), indicating that Rh was mainly controlled by Ts in this site. The second important variable was BD (20.6%), followed by LA (15.1%) and SWC (14.2%). C/N had a relatively low importance (11.4%). Ts and LA had positive effect on Rh, whereas the other variables had both positive and negative effect.

Fig. 2. Feature importance of variables in the RF model based on the variable combination 3 (Table 3) using two-year data. Relative importance (total to 100 %) was used to represent the importance of each feature. Mean ± 1 standard deviation (horizontal bar) of ten replicates with different seeds is shown. The symbols of +, -, and ± represent positive, negative, and both effects on heterotrophic respiration.

4. Discussion

Spatial variability of Rh was relatively high (CV=24.6%) even within a small area of 0.09 ha, which is comparable to previous studies in Japanese forests (Aguilos et al., 2013; Liang et al., 2010; 2017), and all explanatory variables showed significant spatial variability (Table 1). Simple conventional models using one (Ts) or two (Ts and SWC) microclimate variables performed worse than MR and ML models using the variables of soil properties (Tables 2 and 3), because explanatory variables of BD, C/N, and LA brought more information on spatial heterogeneity into models. In addition, none-linear ML algorithms had better performance than the linear MR model, because relationships between Rh and environmental variables are determined by complex and comprehensive interactions rather than simple linear isolated effect (Tavares et al., 2018). The positive/negative effects of BD, SWC, and C/N in Fig. 2 indicate the limitations of the liner model. The performance of all models except MR was highest using the dataset of 2021 (n=16755) than using those of 2022 (n=18619) and two years (n=35374) despite the smaller number of available data (Tables 2 and 3), possibly because temporal variation in Ts was larger (Fig. 1) and the exponential relationship between Rh and Ts was better in 2021 (Table 2). This fact indicates that ML models can perform well with a single year of data.

The RF algorithm performed better than the GBM algorithm for all variable combinations (Table 3). RF and GBM algorithms belong to the decision tree family, but calculating strategies, resampling methods and uncertainties dealing methods during decision-tree building process are different. Therefore, the difference of model performance would have resulted from such differences in algorithms. Similar results of better performance in RF modeling were reported in previous studies (Abbasian et al., 2022; Nawar and Mouazen, 2017; Xu et al., 2018).

As an explanatory variable, WS was added, because WS is known to enhance vertical CO2 transport in porous soils, including litter accumulation, through atmospheric pressure fluctuations (Brændholt et al., 2017; Ishikura et al., 2018). WFPS was also added, because it has been used to normalize the effect of soil moisture on soil microbial activity (Scott et al., 1996; Franzlubbers, 1999). In addition to BD and C/N, which are common soil physicochemical properties, LA was added to explain spatial variation. However, WS is usually unavailable at chamber sites, and directly measurable SWC is easier to use as a soil moisture variable than WFPS. Therefore, the performance of ML modeling was compared among eclectic variable combinations (Table 3). As a result, the RF model based on the six variables of Ts, WFPS, BD, C/N, WS, and LA (Combination 1) was the best performing. With the exception of WS and by changing WFPS to SWC, the performance decreased slightly. Even using relatively general variables of Ts, SWC, BD, C/N, and LA (Combination 3), however, the RF model still performed well. Furthermore, the single RF model created using all chamber data accurately reproduced Rh for each chamber (Table 4), indicating a high ability to predict spatial variability using the property variables of BD, C/N, and LA. The model performance for Chamber 1 was somewhat lower, probably because the number of available data used to train the model was 10-20% smaller than those of other chambers (Table 4); the smaller data was caused by some sensor malfunctions.

Ts was definitely more important among the explanatory variables for Rh (Fig. 2) as in other sites (Li et al., 2017; Wei et al., 2010). Increase in Ts promotes the decomposition of soil organic matter and increase Rh (Hartley and Ineson, 2008). Despite the high spatial variability (Table 1), the importance of SWC was relatively low at 14.2%. The weak influence of SWC was also reported in a previous study conducted in this site before the disturbance (Liang et al., 2010). High SWC potentially decreases Rh through the low gas diffusivity in the soil, whereas Rh will decrease at SWC lower than a certain level because of the restriction of microbial activities (Moyano et al., 2013). In this site, SWC would have remained almost in the optimal range during the growing season because of the relatively high precipitation and high soil water permeability (Liang et al., 2010).

Spatiotemporal variability of Rh was predicted within a small area at hourly intervals by ML using continuous microclimate data of Ts and SWC, and soil properties of BD and C/N, and LA. We found that the RF algorithm performed better than the GBM algorithm for all variable combinations. The addition of variables, such as seasonal variation in litter accumulation (Kim et al., 2005), can be expected to further improve accuracy. The RF model is promising for the gap-filling of missing Rh data and the accurate separation of Ra from Rs.

Acknowledgements

This study was supported by the Environment Research and Technology Development Fund (JPMEERF20202006) of the Environmental Restoration and Conservation Agency of Japan and JPSP KAKENHI (22H05706). We thank R. Cui for data collection in the field; the Hokkaido Regional Office of the Forestry Agency for allowing us to use the study site; N. Saigusa, Y. Takahashi, R. Hirata, and the staff of CGER for managing the site; and Y. Toma for allowing us to use an analyzer for soil physical properties.

References
 
© Author (s).

This article is licensed under a Creative Commons [Attribution 4.0 International] license.
https://creativecommons.org/licenses/by/4.0/
feedback
Top