2025 Volume 81 Issue 4 Pages 171-183
This study aimed to improve a procedure to determine an appropriate time scale and lag time to predict coffee yield at the local scale. The Vegetation Health Index (VHI), Effective Drought Index, Standardised Precipitation Index, Standardised Precipitation Evapotranspiration Index (SPEI), and soil moisture from 2000 to 2022 in Dak Lak, Vietnam, were selected for the analysis. The yield differences between drought and wet phases and the correlation coefficients between the indices and yield were analysed to determine potential predictors for the model. Then, a stepwise multiple linear regression model with the leave-one-out cross-validation was performed to select appropriate predictors with their time scales and lag times. The results showed that VHI with a lag time from seven to nine months before harvest (VHI7-9) and SPEI at a time scale of five months with a lag time of ten months before harvest (SPEI510) had essential contributions in predicting coffee yield. Meanwhile, soil moisture had a poor contribution. Coffee yield could be predicted from three to nine months before harvest based on meteorological drought indices, VHI, and soil moisture. With a reasonably long prediction time and relatively high accuracy, the proposed prediction procedures may be applied to mitigate climate change’s impacts, aiming for sustainable development.
Coffee is a vital agricultural product that has significantly contributed to the socioeconomic development of regions in the Americas, Africa, and Asia (Pham et al., 2019). In Vietnam, there are over 700,000 hectares of coffee plantations as of 2022, producing an average of about 1.8 million tons per year in 2018-2022 (General Statistics Office of Viet Nam, 2024a). The Central Highlands region accounts for about 92% of the coffee-growing area and produces 95% of the coffee output (Pham, 2022), and Dak Lak is known as the “coffee capital” of the country, with an area of 213,000 hectares and an output of about 559,000 tons in 2022 (Dak Lak Statistical Office, 2023). Vietnam cultivates two main types of coffee: Robusta (Coffea canephora) and Arabica (Coffea arabica). Of these, Robusta accounts for 92.9% of the coffee-growing area and 97% of total production (Vietnam Coffee - Cocoa Association, 2019). Vietnam is the second-largest coffee producer and the leading producer of Robusta coffee globally (International Coffee Organization, 2023). In 2022, the coffee production industry contributed approximately USD 4.06 billion to Vietnam’s export turnover (General Statistics Office of Viet Nam, 2024b). The industry has created over half a million direct and indirect jobs, significantly contributing to the income of numerous households in coffee production areas. This has helped in the socio-economic development of the Central Highlands, Northwest, and other coffee-growing regions. Coffee export value typically represents about 15% of total agricultural export turnover, and its proportion of agricultural GDP has consistently exceeded 10%, contributing about 2-3% of the country’s GDP in recent years (Vietnam Coffee - Cocoa Association, 2019).
Coffee is sensitive to temperature, soil moisture (SM), and water availability. Therefore, fluctuations in weather and climate can impact the growth, flowering, fruiting, and ripening, ultimately affecting bean quality and yield (Pham et al., 2019; Gokavi and Kishor, 2020; Hameed et al., 2020; Kath et al., 2020, 2021, 2023; Dinh et al., 2022). Prolonged heat and water scarcity may lead to poor plant growth, increased fruit drop, reduced bean weight, higher flower wilting, and low fruit set rate. Meanwhile, prolonged heavy rainfall may cause flooding and root damage, potentially leading to plant death. It also affects the flowering and ripening stages, prolonging the time to harvest. Therefore, it is important to evaluate the impact of weather and climate fluctuations and provide early warnings to ensure coffee productivity. This is not only economically significant for the region and country but also has social importance in stabilizing the livelihoods of people, particularly in challenging living conditions such as mountainous areas. Thus, this may contribute to achieving the 2030 sustainable development goals (United Nations, 2015).
Many models have been developed to predict coffee productivity. Some recent outstanding studies include: Kouadio et al. (2018) used an artificial intelligence approach and soil fertility properties; Zanella et al. (2024) applied simple and multiple linear regression (MLR) based on spectral bands and derived vegetation indices from RapidEye imagery and crop nutritional status; Thao et al. (2022) used a statistical approach based on vegetation biophysical variables derived from SPOT-VEGETATION and PROBA-V products; de Oliveira Aparecido et al. (2022) combined agroclimatic data and machine learning; Kittichotsatsawat et al. (2022) artificial neural network and MLR techniques based on input variables of areas, productivity zones, rainfalls, relative humidity, and minimum and maximum temperature; Kouadio et al. (2021) used a simple process-based model using climate data and information from the previous coffee season. These studies indicated that advanced and state-of-the-art prediction models have shown promising results. The prediction time could range from one to several months before harvest. These studies have used different data sources for prediction, including field measurements and remote sensing. However, the use of the Vegetation Health Index (VHI) in predicting coffee yield has not been fully explored despite proven success in predicting yields of other crops like corn (Kogan et al., 2018a), wheat (Kogan et al., 2018b), and rice (Pham et al., 2022). It is also essential to further investigate the combination of the VHI and meteorological drought indices (MDIs) in predicting coffee yield. In addition, it is clear that the earlier the prediction, the more beneficial it is for farming practices and management. Therefore, it is important to determine their appropriate time scales and lag times for prediction. Furthermore, studies that predict coffee yield at local scale, such as district and provincial scales, are still limited.
Considering these issues, this study aimed to improve a procedure to determine an appropriate time scale and lag time to predict coffee yield at the local scale based on VHI and MDIs. Specifically, MDIs investigated in this study included the Standardised Precipitation Index (SPI), Standardised Precipitation Evapotranspiration Index (SPEI), and Effective Drought Index (EDI). In addition to investigating the combination of VHI and MDIs in forecasting coffee yield, we aimed to propose a method for processing yield data to improve the stability of the prediction results. The data of SPI, SPEI, EDI, VHI, SM, and coffee yield from 2000 to 2022 in five major coffee-growing districts of Dak Lak province of Vietnam were used for the experiment. The results of this study may provide an early warning procedure for coffee productivity, helping farmers, managers, and policymakers implement appropriate cultivation and management strategies. These contribute to achieving economic and social goals for sustainable development.
Dak Lak province is located in the Central Highlands of Vietnam (Fig. 1a). The province covers an area of 13,070.41 km2 and had a population of 1,918,440 people by 2022 (Dak Lak Statistical Office, 2023).
According to the Dak Lak Provincial People’s Committee Portal (People’s Committee of Dak Lak province, 2024), the province is located in the west and at the end of the Truong Son range. This large plateau is a gently sloping, wavy, fairly flat terrain interspersed with low plains along the main rivers. The average altitude is 400-800 m above sea level and tends to gradually lower from Southeast to Northwest. The climate varies between terrain types and altitude: the area below 300 m is hot all year round, the area from 400-800 m is hot and humid, and the area above 800 m is cool. The province experiences two distinct seasons: the rainy season from May to the end of October, which accounts for 90% of the annual rainfall, and the dry season from November to April, with negligible rain.
The province has a pretty dense river network, with three river systems (the Srepok, Ba, and Dong Nai river systems) evenly distributed across the territory. There are also many reservoirs and 833 streams more than 10 km long.
The province’s main soil types include Acrisols, Ferralsols, Fluvisols, Gleysols, and Luvisols. These soils are suitable for long-term and short term industrial crops, including coffee.

Fig. 1. (a) Study area; (b) The coffee yield of five districts and the whole Dak Lak province from 2000 to 2022.
This study focused on five major coffee-growing districts of Dak Lak. Their ID, name, and location are shown in Fig. 1a. The raw data used in this study included coffee yield, meteorological data, SM, normalised difference vegetation index (NDVI), and land surface temperature (LST) from 2000 to 2022. The coffee yield were taken from provincial statistical yearbooks (Fig. 1b). MDIs were calculated based on precipitation, temperature, sunshine hours, humidity, and wind speed collected from 19 meteorological stations surrounding the coffee-growing area (Fig. 1a). These data were taken in daily form and were taken 1 year before the coffee yield data (1999-2022). The 0.25-degree and 1-month SM was collected from the Global Land Data Assimilation System (https://ldas.gsfc.nasa.gov/gldas). The 500-m and 16-day NDVI and 1-km and 8-day LST were from the MOD13A1 and MOD11A2 v006 products, respectively. The two products were downloaded from the United States Geological Survey (https://earthexplorer.usgs.gov/). These data were used to calculate MDIs (i.e., SPI, SPEI, and EDI) and remote sensing indices (i.e., VHI).
2.3. MethodsThe overall workflow steps include:
- Calculating MDIs at each station at different time scales.
- Resampling the temporal resolution of MDIs, NDVI, and LST to one month and resampling the spatial resolution of LST and MDIs to match the resolution of NDVI.
- Calculating VHI from NDVI and LST.
- Calculating the average value of MDIs, VHI, and SM by administrative units.
- Standardising and appending the coffee yield of the administrative units to form a series.
- Analysing the correlation between MDIs, SM, VHI and yield.
- Implementing multiple linear regression with different combinations of predictors.
- Evaluate the models.
The details of these steps are described in subsections.
2.3.1. Calculating MDIs and VHIMDIs were calculated by station at time scales from 1 to 13 months, and remote sensing indices were calculated by grid cell. Then, these calculation results were converted to the same time resolution of one month. Monthly NDVI and LST were calculated using the maximum value composite technique to avoid the effects of bad weather conditions. Then, the average values of these indices were calculated for each district. For MDIs, the average values were calculated according to Thiessen polygons. For the remaining data, the average values were calculated based on grid cells located in the coffee-growing area of each district. Because the mentioned average calculations were not performed for meteorological variables but for meteorological indices, the calculation results were not affected by elevation differences. These calculated indices are briefly described below. Detailed calculation steps can be found in the cited references.
- SPI
SPI (McKee et al., 1993) is widely used worldwide. SPI is determined based on the distribution function of precipitation amount and is calculated at different time scales, from one to several months. The steps for calculating SPI are described in detail in the study of Viet (2024). This study used subroutine cdfgam to calculate the gamma distribution function. Details about this subroutine can be found on the website https://people.sc.fsu.edu/~jburkardt/f_src/cdflib/cdflib.html.
- SPEI
SPEI is calculated based on the difference between precipitation (P) and potential evapotranspiration (ET) (Vicente-Serrano et al., 2010). The steps for calculating SPEI are described in detail in the study of Luong and Bui (2024). In this study, the reference evapotranspiration (ETo) was calculated using the FAO Penman-Monteith method, which is considered the best method for calculating ETo (Allen et al., 1998). Details of this method were presented in Allen et al. (1998).
In this study, ETo is calculated on a monthly basis. The data used to calculate ETo are temperature, sunshine hours, humidity and wind speed at the monitoring stations. Then, ETo is calculated as follows:

where Rn is the net radiation at the crop surface (MJ m-2 day-1), G is the soil heat flux density (MJ m-2 day-1), T is the mean daily air temperature at 2 m height (°C), u2 is the wind speed at 2 m height (m s-1), es is the saturation vapour pressure (kPa), ea is actual vapour pressure (kPa), ∆ is the slope vapour pressure curve (kPa °C-1), γ is the psychrometric constant (kPa °C-1).
In equation (1), ∆, es and ea are given by:



In equation (4), H is the relative humidity (%). The net radiation (Rn) is the difference between the incoming net shortwave radiation (Rns) and the outgoing net longwave radiation (RnL).

with Rns is given by:

In equation (6), n is the actual duration of sunshine (hour), N is the maximum possible duration of sunshine or daylight hours (hour), Ra is the extraterrestrial radiation (MJ m-2 day-1), as and bs is regression constant. The default values for as and bs are 0.25 and 0.50, for the study area they are taken as 0.21 and 0.41.
In equation (6), Ra and N are calculated as follows:


with



In these equations, ψ is the geographical latitude (rad), dr is the inverse relative distance Earth-Sun, δ is the solar declination (rad), J is the julian day.
In equation (5), according to the Vietnam National Standard TCVN 9168:2012 (Ministry of Science and Technology of Vietnam, 2012), RnL is given by:

In equation (1), the the soil heat flux density (G) is given by:

where Tmonth, i is the mean air temperature of month i (°C) and Tmonth, i-1 is the mean air temperature of previous month (°C).
In equation (1), the psychrometric constant is given by

where P is the atmospheric pressure (kPa).
The wind speed at 2 m (u2) is calculated from the wind speed at 10 m (u10) as follows:

- EDI
EDI, proposed by Byun and Wilhite (1999), is a function of the deviation of effective precipitation (EP). It is calculated as follows:
(1) Calculating the daily EP: The calculation procedure of the EDI starts by applying a dummy water deficit period as a requirement for determining the real period. Generally, a dummy value of 365 days is chosen as it represents a dominant precipitation cycle globally. Once the duration is set, daily EP is calculated according to the following equation:

where i is the duration of summation, which is the number of the days whose precipitation is summed for calculation of drought severity, and Pm is the precipitation of m-1 days before.
(2) Calculating the mean value of daily EP for each julian day.
(3) Calculating the deviation of effective precipitation:

where DEP is deviation of effective precipitation and MEP is the mean value of daily EP.
(4) Calculating the standard deviation of DEP for each julian day.
(5) Calculating EDI:

where ST (DEP) is the standard deviation of DEP.
In this study, the precipitation data for EDI calculation were daily, so it was necessary to average the daily values to obtain monthly values.
- VHI
VHI is a combination of Vegetation Condition Index (VCI) and Temperature Condition Index (TCI) (Kogan, 1995, 1997, 2002). VCI is calculated from NDVI, and TCI is calculated from LST. VHI is strongly connected to crop growth, yield, and drought-related stress (Yagci et al., 2015; Feng et al., 2019). The steps for calculating VCI, TCI, and VHI are described as follows (Luong and Bui, 2024):



where the weight α was set to 0.5, NDVIi is the NDVI value of a specific pixel at time i during a particular year, NDVImax and NDVImin are the maximum and minimum NDVI values over the analysed period, respectively, LSTi is the LST value of a specific pixel at time i during a particular year, and LSTmax and LSTmin are the maximum and minimum LST values over the analysed period, respectively.
2.3.2. Calculating the values of the variables at district and provincial scalesCoffee yield data was collected at district and provincial scales from Dak Lak Statistics Office (Fig. 1b). To get the average yield of a district, a random number of coffee plantations were selected in each commune of the coffee-growing district. Therefore, the values of other variables were also transformed to the same scales. As a result, the Thiessen polygon interpolation was applied to MDIs, while the arithmetic mean of cells was used for VHI and SM.
2.3.3. Analysing the correlation between MDIs, soil moisture, VHI, and coffee yieldThe correlation coefficient was used in this step. MDIs at a time scale from 1 to 13 months, SM, and VHI were taken from 0 to 12 months before harvest time. Meanwhile, the coffee yield of six administrative units (including five districts and the province), after being transformed into a standardised variable (with a mean value of zero and a standard deviation of one), was appended to form a series. Thus, the yield series length was n = 6 units×23 years = 138 observations. This created an input series long enough to ensure the stability of the prediction results.
In addition, for the MDIs, an analysis of yield differences between drought levels was also performed. This analysis aimed to clarify the impact of drought on coffee yield and determine the appropriate time scales and lag times of MDIs for the yield prediction model. Based on the SPEI, SPI, and EDI values, it was common to divide drought conditions into seven levels: extremely wet, very wet, moderately wet, normal, moderately dry, severely dry, and extremely dry (Morid et al., 2006). However, because the data series was not too long, this analysis only categorised drought conditions into two levels: “dry” when the indices had negative values and “wet” when the indices had positive values. The time scale of the indices for this analysis was also from 1 month to 13 months. The difference in coffee yield between the wet and dry phases was calculated for each month before harvest. Coffee was harvested in October, so the drought levels from October of the previous year (t-1) to October of the harvest year (t) were included in the analysis. In addition, coffee yield trends were removed before the analysis to eliminate impacts other than drought. Let α be the slope of the coffee yield trend line, Yi be the yield of year i in n years of observation, then the modified yield value Ymi is calculated as follows:

The removal of the slope of the trend in this equation applies only to this analysis.
2.3.4. Implementing multiple linear regressionThe results of the correlation analysis in the previous step helped determine potential predictors. They were the variables that had the best correlation with coffee yield. The coffee yield prediction models were built using a stepwise MLR technique as follows:

where Ys is the predicted yield, t is a time variable, which are included to consider the trend effects, and xi is the potential predictors, which are MDIs, SM, and VHI taken in the months be for the predicted time. After each iteration, this process added or removed a potential predictor based on testing its statistical significance and prediction quality.
To analyse the contribution of remote-sensing indices and MDIs, there were several options for building model. They were divided into three options as follows:
• Group 1: t and best-correlated VHIs.
• Group 2: t and best-correlated MDIs.
• Group 3: t, best-correlated VHIs, SM, and MDIs that gave the best yield prediction results in group 2.
2.3.5. Evaluating prediction models and the contribution of predictorsThe correlation coefficient (R), the Root Mean Square Error (RMSE), and Willmott’s index of agreement (d) were used to evaluate the quality of yield prediction models. They are calculated as follows:



where n is the series length, Y is the actual yield, Ȳ is the average of Y, and YS is the yield obtained using the Leave-One-Out Cross-Validation (LOOCV) technique. This technique helps create a predicted series length equal to the actual series length and, therefore, increases reliability in the analysis.
Meanwhile, the relative importance of each predictor was determined based on the change in the adjusted R square when that predictor was leaved from the model.
The difference in coffee yield between the wet and dry phases of SPEI, SPI, and EDI is presented in Figs. 2a, b, and c, respectively. In the figures, the grey part shows no difference in yield between the dry and wet phases at p = 0.01. It can be seen that the ability to reflect the impact of drought on coffee yield increased in the order of EDI, SPI, and SPEI. With SPEI and SPI, cells below or near the red dashed line had large values. Because this line had a slope of one and the values of the indices were recorded at the end of the calculation period, the MDIs on this line all contained precipitation from August to October before the harvest year. Furthermore, the maximum values of the yield difference between the wet and dry conditions at the same time scale are summarised and shown in Fig. 2d. This result showed that the difference in coffee yield between the dry and wet phases was about 0.25-0.30 tons ha-1. The time scale from six to eight months gives a pretty high difference in coffee yield. This indicated that precipitation from August to January played an important role in forming coffee yield. In addition, SPEI and SPI at the time scale of one year showed the most significant difference. This meant that prolonged dry or wet conditions would significantly affect coffee yield.

Fig. 2. The differences in coffee yield between the wet and dry phases of (a) SPEI, (b) SPI, and (c) EDI; (d) The maximum yield difference between dry and wet conditions correspond to the time scales of these indices. Note: In Figs. (a), (b) and (c), t-1 and t represent the months of the previous year and the harvest year, respectively.
The correlation coefficients between the yield and SPEI, SPI, and EDI at time scales from 1 to 13 months are presented in Figs. 3a, b, and c, respectively. The grey colour in the figures shows that the correlation coefficient is insignificant at p = 0.01. The indices were taken from 0 to 12 months before harvest. Since coffee began to be harvested in October, the indices within previous 12 months were selected. The distribution of correlation coefficients showed that the relationship between SPEI and SPI with coffee yield was similar. SPEI and SPI at a time scale from four to six months and from December to February gave the highest correlation coefficient. Compared to SPEI and SPI, EDI had a different distribution. From April to July of the harvest year, the correlation coefficient of this index with coffee yield was relatively small. In addition, the common point of these three indices was that their correlation coefficient with coffee yield was negative in August and September of the harvest year at a time scale of less than nine months. Furthermore, the highest correlation coefficient of the indices at different time scales is summarised and presented in Fig. 3d. It can be seen that SPEI and SPEI gained a better relationship with coffee yield than EDI. Both SPEI and SPI coefficients had the same characteristic that the time scale from four to six months gave the highest correlation coefficient. Compared with Figs. 3a and 3b, these extreme points corresponded to the time from December to March. In other words, SPI4, SPI5, SPI6, SPEI4, SPEI5, and SPEI6 during this period could be considered potential predictors of coffee yield.

Fig. 3. The correlation coefficient between coffee yield and (a) SPEI, (b) SPI, and (c) EDI; (d) The highest correlation coefficient between coffee yield and the indices at different time scales. Note: In Figs. (a), (b) and (c), t-1 and t represent the months of the previous year and the harvest year, respectively.
The correlation coefficient between VHI and SM of the months before the harvest and coffee yield is presented in Fig. 4. It can be seen that SM from remote sensing data did not reflect coffee yield well. Only the period from November to March had a correlation coefficient slightly higher than the critical value (Rcr) at p = 0.01. In contrast, VHI from six to nine months before harvest had a pretty good relationship with coffee yield. It meant that VHI from January to April was potential predictors.

Fig. 4. The correlation coefficient between VHI and SM of the months before the harvest and coffee yield.
As mentioned above, the variables that had a good relationship with coffee yield included SPI4, SPI5, SPI6, SPEI4, SPEI5 and SPEI6 from December to March and VHI from January to April. They were considered potential predictors of coffee yield. Because the distribution of the correlation coefficients of SPEI and SPI was nearly similar and SPEI had a better correlation than SPI (Figs. 3a and 3b), SPEI was selected as a representative predictor for the following analyses. Thus, if these variables were used, the prediction could be done six months or more before harvest.
In addition to these potential predictors, preliminary analysis showed that coffee yield increased by about 0.029 tons ha-1 year-1 during the studied period. Therefore, a time variable t were included to consider the trend effects. To analyse the contribution of predictors, there were six options for selecting variables, which were divided into three groups as follows:
• Group 1: t and VHI.
• Group 2: t and either SPEI4, SPEI5, or SPEI6.
• Group 3: t, VHI, SM, and SPEI that gave the best yield prediction results in group 2 (SPEI*).
where:
• The value of t was the ordinal number of the harvest year since 2000.
• The value of VHI could be averaged from the ith to the jth month before the harvest month and was denoted as VHIi-j.
• The values of SPEI4, SPEI5, and SPEI6 could be taken in the kth month before the harvest month and were denoted as SPEI4k, SPEI5k and SPEI6k.
3.3.2. Prediction results using multiple linear regression- Group 1:
In this analysis, the prediction time was from one to nine months before harvest. The prediction results for three months were selected as a representative to illustrate in Table 1 and Fig. 5. In particular, Figs. 5a and 5b illustrate the dispersion of predicted results and observed data in the standardised and unstandardised forms, respectively.
The prediction quality assessment of potential predictors in group 1 is summarised in Table 2. The correlation coefficients obtained for the models are pretty strong positive, ranging from 0.708 to 0.804. When the prediction time was increased from one to seven months before harvest, the prediction quality decreased but not considerably. After this period, the prediction quality decreased rapidly. Thus, the seven-month model could be used to predict coffee yield. This model had only one predictor, VHI7-9 (i.e., the average value of VHI from the 7th to the 9th month before harvest). This variable also appeared when the prediction time was shorter. Corresponding to the harvest time in October, VHI7-9 was the VHI from January to March. This result was consistent with the results in section 3.2, which indicated that the VHI from January to April was potential predictors.
Table 1. The coefficients of the MLR of standardised yield data using predictors in group 1.
Table 2. The prediction accuracy of predictors in group 1.

Fig. 5. The dispersion of prediction result and observed yield in (a) standardised form and b) unstandardised form for 3 months before harvest.
- Group 2:
The prediction accuracy of potential predictors in group 2 is summarised in Table 3. It can be seen that the predictors participating in the models were t, SPEI411, SPEI510, and SPEI69. These SPEIs were all related to precipitation and evapotranspiration from August to November. The RMSE, R, and d values in Table 2 also revealed that using SPEI gave pretty good prediction results. This index also had the advantage of being able to predict coffee yield from 7 to 11 months in advance. A comparison between RMSE, R, and d indicated that SPEI5 gave the best prediction quality. Therefore, it may be chosen as a representative predictor for the coffee yield prediction. In addition, a comparison between Tables 2 and 3 showed that the quality of coffee yield prediction using SPEI was lower than that using VHI.
Table 3. The prediction accuracy of predictors in group 2.
- Group 3:
As mentioned above, SPEI5 gave the best prediction quality in group 2; therefore, the predictors in group 3 included t, VHI, SM, and SPEI5. The potential predictors in this group and their coefficients are summarised in Table 4. The dispersion of prediction results and observed yield for 1-3, 4-5, 7, and 9 months before harvest is illustrated in Fig. 6. The coefficients in Table 4 showed that all prediction models had a high level of stability for the predictions from one to nine months before harvest. The number of predictors selected in these models was all three. When the number of predictors decreased or increased, the prediction quality decreased. The model did not involve many predictors because some variables were cumulative over time, e.g. VHI7-9, which represented the VHI value for three months, and SPEI5, which represented the difference between precipitation and evapotranspiration for five months.
Table 4. The coefficients of the MLR of standardised yield data using predictors in group 3.

Fig. 6. The dispersion of prediction result and observed yield for (a) 1-3 months, (b) 4-5 months, (c) 7 months, and (d) 9 months before harvest.
The relative importance of predictors and the accuracy of the prediction models are summarised in Table 5. VHI was always the first predictor involved, followed by SPEI5, SM, and t. The most selected predictor was VHI7-9, followed by SPEI510; meanwhile, SM and t were rarely chosen. A comparison between Tables 1, 2, and 5 showed that the prediction quality had significantly increased when combining VHI and SPEI5. The prediction quality of one month before harvest using VHI alone was lower than the prediction quality of eight months before harvest using VHI and SPEI5.
Table 5. The prediction accuracy of predictors in group 3.
The results in Table 5 indicated that the first selected predictors in the prediction models were always VHI7-9 when the prediction time was less than seven months. This variable represents the average value of VHI in the period of 7-9 months before harvest, i.e. from January to March of the harvest year. The contribution of this variable to the prediction models was very high, from 62.2% to 87.7%. For the prediction of eight and nine months, the selected predictors were VHI8-9 and VHI9, respectively; these variables contained the VHI values of January and February. With a prediction time of six to nine months, the contribution of VHI gradually decreased.
The results also showed that SPEI510 was always the second selected predictor when the prediction time was one to seven months, and SPEI510 was replaced by SPEI511 when the prediction time was eight and nine months. The relative contribution of SPEI to the prediction quality was only nearly 10% when the prediction time was from one to six months. As the prediction time gradually increased from seven to nine months, its contribution also gradually increased from 28.9% to 42.1%.
Meanwhile, SM was rarely involved in prediction models. When it appeared, its contribution was quite low, less than 5%. In addition, t only appeared when the prediction time was from seven to nine months, and its relative contribution ranged from 9% to 32.7%.
To dig deeper into prediction accuracy, Table 6 and Fig. 7 illustrate an analysis of the prediction of five months before harvest by the administrative unit. This result showed that compared to Table 5, d increased significantly from 0.808 to 0.86-0.93. This meant that the prediction quality was considerably higher if based on the d values of each administrative unit with the yield in an unstandardised form. However, the results in Table 6 showed that prediction quality was uneven across districts. The values of d and RMSE clearly depended on the standard deviation (Stdev) of coffee yield, with their correlation coefficient values of 0.73 and 0.88, respectively. This implied that localities with less fluctuation in yield gained better prediction quality.
Table 6. The prediction accuracy for 5 months before harvest by the administrative unit in group 3.

Fig. 7. The dispersion of prediction result and observed yield for 5 months before harvest by the administrative unit.
- For VHI:
The VHI from January to April can be considered a potential predictor because it contains much information related to coffee yield. In the study area, coffee blooms from January to March and bears fruit in April. Therefore, the health of coffee trees during this period is essential and contributes to determining productivity. This is also clearly reflected in the prediction models. VHI7-9 is always the first predictor involved in the models when the prediction time is less than seven months before harvest. VHI7-9 represents the value of this index in the period of January to March. VHI7-9 reveals crop conditions and reflects soil moisture and the ability to supply water to crops during this period and the remaining months of the dry season. In addition, when the prediction time is eight and nine months, the predictors are VHI8-9 and VHI9, respectively, which contain the VHI of January and February. The contribution of these variables is smaller than that of VHI7-9. Still, together with VHI7-9, they confirm that the health status of coffee trees during flowering is the main contributing factor in yield formation.
- For MDIs:
As mentioned, analysis based on yield differences between drought levels (Fig. 2) reveals that the precipitation in August, September, and October of the year before the harvest year has the most significant impact on coffee yield differences. The average rainfall summary in the coffee-growing area shows that this period is the last three months of the rainy season and has the heaviest rainfall (Fig. 8). Because the rainfall from November to April is very low, the rainfall from August to October is the reserve source for crops next year and contributes to determining coffee yield.

Fig. 8. The monthly rainfall condition in Dak-Lak from 2000 to 2022. The box plot indicates the average rainfall distribution in each month during the period. The horizontal line within the box indicates the median. The upper and lower edges of the box represent the 75% and 25% percentile, respectively. The upper and lower ends of the whiskers represent the maximum and minimum values, respectively.
When analysing the correlation coefficient between MDIs and yield, it can be seen that the common point of these three indices is that their correlation coefficient with coffee yield is negative in August and September of the harvest year at a time scale of less than nine months. This means that when rainfall increases in the months near the harvest month, productivity decreases. August and September have the highest rainfall (Fig. 8), reaching over 400 mm month-1 and over 20 rainy days month-1. Waterlogging of trees and leaching of nutrients due to heavy rains may be the causes of reduced coffee yields. This result is relatively consistent with the findings of Kath et al. (2021). On the contrary, if rainfall increases from October of the previous year to July of the harvest year, productivity tends to increase. Rainfall from November to July is low compared to the reference evapotranspiration (Luong, 2021); therefore, crops may grow well if rainfall increases in these months. In addition, the rainfall in October and November is positively correlated to the yield of the following year because it determines soil moisture and irrigation sources for the long dry season of the next crop. In addition, because the calculation results of MDIs at all time scales are recorded in the last month of the period, the extreme points with the highest correlation coefficients of these indices are related to rainfall from September to December of the year before harvest. This confirms that the rainfall from the middle to the end of the rainy season of the previous year plays a vital role in soil moisture and irrigation water in the dry season of the following year and determines the coffee growth and productivity of the following year.
Correspondingly, in the prediction models, it can be seen that SPEI510 is always the second selected predictor when the prediction time is from one to seven months. SPEI510 is the SPEI5 taken ten months before harvest time. For the harvest in October, SPEI510 is calculated based on rainfall and evapotranspiration from August to December of the previous year. This is the period from the middle of the rainy season to the beginning of the dry season. The difference between rainfall and evapotranspiration in these months determines the amount of surface and groundwater storage in the following dry season, lasting up to 6 months in this area, including the period of bud formation, flowering and fruit set. In other words, SPEI510 is one of the crucial factors determining the development and formation of yield. It showed the role of precipitation from the mid-wet season to the early dry season on coffee yield. However, because VHI value depended on drought conditions, it already contained drought information, which made the contribution of SPEI510 not too high in the models.
4.2. Quality of the prediction modelThere have been several studies on coffee yield prediction in this or its adjacent areas in recent years. Kouadio et al. (2018) used the Extreme Learning Machine model to predict coffee yield for Lam Dong province, which is located next to Dak Lak. The soil fertility properties were used as predictors. Quality assessment results based on independent data gave RMSE = 0.496 tons ha-1. Kouadio et al. (2021) developed a robusta model to simulate coffee’s potential growth and development based on weather data (including minimum and maximum temperatures, solar radiation, and precipitation) and information from the previous growing season (including harvest date and yield). This model simulated the coffee yield in the Central Highlands provinces, and the quality assessment results gave RMSE=0.24-0.33 tons ha-1 and d ≥ 0.71. Dinh et al. (2022) used MLR to evaluate the impact of weather on coffee yield in the Central Highlands provinces, using regularization techniques such as principal component analysis and LOOCV. This study gave the prediction results from three to six months before harvest, and the prediction quality assessment showed R = 0.36-0.6 and RMSE = 0.16-0.29 tons ha-1. Compared to the three mentioned studies, our study provides higher accuracy by combining data on MDIs and VHI. In addition, compared to the study of Dinh et al. (2022), which is the study with the highest accuracy among the three, our study can predict about three months earlier.
4.3. Limitations of the studyThis study also has some limitations that have not been discussed and resolved. Firstly, it can be seen that localities with less fluctuation in yield have better prediction quality (Table 6). The fluctuation of coffee yield depends on many factors, including natural conditions and cultivation practices, and there have been no studies on these issues in this area. Secondly, the area of coffee plantations is also related to prediction accuracy. Larger areas often have higher R and associated causes have not been identified. Lastly, SM plays an important role in yield formation but makes little contribution to prediction quality. This issue may be related to the low spatial resolution and the quality of SM data. Remote sensing SM data is affected by many factors, including tree canopy cover, which can affect the accuracy of soil moisture assimilation. These limitations need to be considered in further studies.
In addition, The method and models in this study were only experimented with one central coffee-growing area in Vietnam. Further experiments are needed to strengthen and validate them with other data sources.
Analysis of the difference in coffee yield between wet and dry conditions from MDIs showed that SPEI had the most apparent difference. Coffee yield in the dry phases was about 0.25-0.30 tons ha-1 lower than in the wet phase. The analysis of yield differences based on MDIs and the yield prediction models indicated that precipitation from August to December of the previous year is an essential factor affecting the formation of coffee yield. In addition, the correlation coefficients between MDIs and yield also revealed that SPI4, SPI5, SPI6, SPEI4, SPEI5, and SPEI6 from December to March were potential predictors.
The results of the correlation analysis also indicated that VHIs from six to nine months before harvest gained a pretty good correlation coefficient with coffee yield. It implied that VHIs from January to April were potential predictors. In contrast, soil moisture in the months before harvest did not reflect well on coffee yield.
The results of building prediction models showed that coffee yield can be predicted three to nine months before harvest. When the prediction time increased, the prediction quality decreased but not considerably. The predictors often appearing in the models were VHI7-9, followed by SPEI510. With a harvest time of October, VHI7-9 was the VHI from January to March. That meant VHI during this period determined coffee yield. VHI7-9 was selected in the models with a prediction time of less than seven months and contributed over 60% to the prediction quality. In addition, because SPEI510 was calculated based on rainfall from August to December of the previous year, rainfall in these months is a vital factor in forming the yield for the harvest nearly a year later. With a prediction time of seven to nime months, the relative contribution of SPEI510 is pretty high, about 30-40%. Howerver, because VHI value depended on drought conditions, it already contained drought information, which made the contribution of SPEI510 not too high in the models.
Overall, the accuracy of prediction models in this study was higher and has improved compared to previous studies. In other words, the proposed prediction procedure was appropriate, and the selected predictors demonstrated their key role in forming coffee productivity. With a reasonably long prediction time and relatively high accuracy, the developed prediction procedure may be applied in practice at the local scale to mitigate the impacts of climate change, aiming for sustainable development. Some recommendations for using the results of this study in agricultural practices include: (1) Water use and allocation: SPEI510 can be used to assess drought conditions, thereby allocating water resources appropriately to ensure water for coffee. In addition, it is necessary to build new reservoirs and use water-saving irrigation technologies to secure water resources for the dry season; (2) Monitoring: Regularly monitor the plant health in January-March (can be based on the VHI7-9) to adjust the amount of water and nutrients for the plants.
We thank the editor and reviewers for their valuable suggestions and comments.