Theory and Applications of GIS (GIS-理論と応用)
Online ISSN : 2185-5633
Print ISSN : 1340-5381
ISSN-L : 1340-5381
Original Article
Flood Risk Analysis of the Ganges-Brahmaputra-Meghna Basin
A Geospatial Approach Using Google Earth Engine and Multi-Source Data Integration
Isurun Upeksha Gamage, Budi Brilian Surya, Fatwa Ramdani

2025, Volume 33, Issue 2, pp. 59-71

Translated Abstract

The Ganges-Brahmaputra-Meghna (GBM) Basin, which supports millions of people and vital agricultural areas, faces increasing flood risks due to climate change. This study uses Google Earth Engine to assess flood susceptibility by integrating elevation, vegetation and water indices, and hydrological networks. Results highlight low-lying regions as highly vulnerable, threatening populated and agricultural areas. Our model achieved 76% accuracy, with a precision of 0.72, recall of 0.84, and F1 score of 0.78. Covering nearly 335,000 km², the analysis was fast and scalable. These findings provide actionable insights for policymakers and disaster management, demonstrating the potential of geospatial technologies to enhance flood resilience in similarly at-risk regions worldwide.

1. Introduction

Flooding is a pervasive hazard with severe impacts worldwide, particularly in deltas and monsoon-affected regions such as the Ganges-Brahmaputra-Meghna (GBM) basin of Bangladesh (A. Ghosh and Dey, 2021; Khan et al., 2024). Urban expansion, climate change, and land-use change have increased both the frequency and severity of floods, underscoring the need for robust, spatially explicit susceptibility assessment (Ben Halima et al., 2025; Lee et al., 2018). Here, flood susceptibility refers to the likelihood of flooding in different locations, based on environmental factors, without considering social or economic impacts. Traditional mapping methods rely on hydrological models and in-situ observations but are constrained by data scarcity, calibration complexity, and poor scalability, especially in dynamic or data-poor systems (Samanta et al., 2018; Valsangkar et al., 2024). Remote sensing, particularly Synthetic Aperture Radar (SAR), has emerged as a reliable, all-weather alternative to optical data (Dhanabalan et al., 2021; Khan et al., 2024).

Recent advances highlight the operational value of SAR, particularly Sentinel-1, for mapping flood extents under cloud cover and dense vegetation (Dhanabalan et al., 2021; Khan et al., 2024). Combining SAR with optical and environmental datasets further improves classification accuracy and temporal coverage (Boschetti et al., 2014; Dong et al., 2016; Fatchurrachman et al., 2022). However, many studies remain fragmented, focusing on single flood events, narrow spatial extents, or single sensors without ensuring transferability across different contexts. Machine learning models such as Random Forests, Decision Trees, Logistic Regression, and Fuzzy Logic, along with hybrid approaches, often outperform traditional methods (A. Ghosh and Dey, 2021; S. Ghosh et al., 2022; Khosravi et al., 2018; Lee et al., 2018; Sahana and Patel, 2019). Multi-criteria decision approaches, including AHP and MCDS, are also widely adopted (Ben Halima et al., 2025; Samanta et al., 2018).

Despite these advances, key gaps persist: most studies fail to integrate multi-source, multi-temporal Earth Observation data, lack operational cloud-based frameworks for real-time, large-scale application, or do not include robust validation against observed flood records (A. Ghosh and Dey, 2021; Sahana and Patel, 2019). Many focus narrowly on single drivers (e.g., rainfall or topography) rather than integrating land use, vegetation, and water history. Thus, there is a critical need for fully integrated, operational, and scalable flood susceptibility models tailored to complex, data-scarce, or rapidly urbanizing basins such as the GBM (Ben Halima et al., 2025; Khosravi et al., 2018). This study addresses that gap by developing and validating an integrated model using Google Earth Engine, multi-source data fusion, and advanced algorithms, offering a transferable tool for disaster risk reduction in flood-vulnerable regions.

2. Study area

The Ganges-Brahmaputra-Meghna (GBM) basin is one of the world’s most flood-prone and densely populated river systems, spanning Bangladesh, northeastern India (Assam, Bihar, West Bengal), Nepal, and Bhutan, and supporting over 670 million people (Ferdous et al., 2018; Islam et al., 2010). Although Bangladesh covers only 7% of the basin, its low-lying topography, dense population, and monsoon-driven flows make it highly vulnerable, with catastrophic events such as the 1998 flood inundating 68% of the country, affecting over 30 million people, and devastating homes and crops (Islam et al., 2010; Valsangkar et al., 2024). The lower Jamuna floodplain is particularly exposed to channel shifts and riverbank erosion, causing frequent displacement and farmland loss (Ferdous et al., 2018; Valsangkar et al., 2024).

In coastal Bangladesh, nearly 50 million people face overlapping risks from fluvial floods, tidal surges, and saltwater intrusion (Roy et al., 2022). The Sundarbans mangrove delta, shared by Bangladesh and India, is frequently struck by cyclone-induced storm surges that threaten ecosystems and the four million people who rely on them (Hasnine and Nagdeve, 2025; Kundu and Mondal, 2025). Similarly, the Indian Kosi basin in Bihar, known as the “Sorrow of Bihar”, experiences repeated embankment breaches and river avulsions that displace millions and damage agriculture (Sinha et al., 2008). The Indian Sundarbans also faces daily tidal inundation and increasing salinity, driving vulnerable households toward migration or livelihood shifts (Hasnine and Nagdeve, 2025; Kundu and Mondal, 2025).

This diversity of hazards, landscapes, and social conditions makes the GBM basin an urgent focus for flood risk research. Our study covers 334,716 km² across Bangladesh, Myanmar, and parts of India, where seasonal flooding from June to October remains severe but often under-prioritized in policymaking (Kabir et al., 2018). Leveraging HydroSHEDS, particularly its Level 6 HYBAS classification, enables multi-scale, accurate representation of sub-basin hydrodynamics, offering significant advantages over older products such as HYDRO1K (Döll et al., 2016; Lehner and Grill, 2013; Linke et al., 2019). This integrated approach provides a robust framework for developing scalable flood susceptibility assessments, critical not only for the GBM but also for vulnerable basins worldwide (Islam et al., 2010; Shoko and Dube, 2024).

3. Data and Methods

3.1  Datasets

We used multiple open-access datasets for hydrological and flood modeling. The HydroSHEDS dataset provides high-resolution global hydrographic information with hierarchical watershed delineation (HYBAS), which supports multi-scale analysis (Lehner and Grill, 2013). Level 6 HYBAS classification was selected for refined sub-basin hydrodynamics (Döll et al., 2016), and it is widely recommended over older products like HYDRO1K (Lehner and Grill, 2013; Linke et al., 2019). The Shuttle Radar Topography Mission (SRTM) DEM at 30 m resolution provides near-global elevation coverage with RMSE <16 m (Farr et al., 2007; Mukul and Mukul, 2021), outperforming or matching ASTER and Cartosat in South Asia, especially where harmonized terrain data are essential (Barman et al., 2023; Fereshtehpour et al., 2024; Kumar and Jha, 2023).

Surface water dynamics were analyzed using the JRC Global Surface Water (GSW) dataset, which maps permanent and seasonal water since 1984 at 30 m resolution (Pekel, 2021; Pekel et al., 2016). GSW has low omission error (<5%) and high reliability for identifying flood-prone areas (Fatchurrachman et al., 2022; Pekel et al., 2016), making it valuable for tracking long-term and seasonal changes (European Commission Joint Research Centre, 2020; Boschetti et al., 2014). Permanent water bodies were defined as those with water present for ≥80% of the observation period, following international best practices highlighted by the European Commission Joint Research Centre (2020), Pekel (2021), Messager et al. (2016), and Pekel et al. (2016). Additionally, Landsat 8 Collection 2 surface reflectance was used to compute the NDVI and NDWI indices for mapping vegetation and water at 30 m resolution, offering reliable calibration and global consistency superior to MODIS or Sentinel-2 for localized floodplain analysis (Pettorelli et al., 2005; Roy et al., 2022; Vermote et al., 2016; Wulder et al., 2019; Boschetti et al., 2014; Burstein et al., 2023; Fatchurrachman et al., 2022; Islam et al., 2010; Tran Vu Van Hoa et al., 2024).

Rainfall data were obtained from the Climate Hazards Group Infrared Precipitation with Station Data (CHIRPS), which blends satellite and ground observations at 0.05° resolution and has proven effective in capturing variability and flood extremes compared to TRMM and APHRODITE (Dinku et al., 2018; Funk et al., 2015; Yatagai et al., 2012; Rahman et al., 2025; Ramasubramanian et al., 2023; Shoko and Dube, 2024; Xie et al., 2025). For validation, we used the Global Flood Database (GFD), which documents over 900 flood events since the year 2000 using MODIS imagery at 250 m resolution, including extent, duration, and population exposure (Tellman et al., 2021). A summary of all datasets used in the study is provided in Table 1.

Table 1    Datasets used


3.2  Methodology for flood susceptibility using Google Earth Engine (GEE)

The Google Earth Engine (GEE) workflow for flood susceptibility mapping in this study follows a systematic, six-step approach:

1. Data acquisition: All input datasets, including elevation (SRTM), surface water (GSW), precipitation (CHIRPS), and vegetation indices (Landsat 8 NDVI/NDWI), are imported and clipped to the study area using administrative boundaries.
2. Preprocessing: Individual layers are preprocessed, including masking, calculation of distance from permanent water, slope, and topographic position index (TPI), and computation of NDVI/NDWI from Landsat data.
3. Scoring/classification: Each variable is reclassified into standardized ordinal risk classes (1-5) based on natural breaks or literature-derived thresholds, ensuring comparability across different physical factors.
4. Composite susceptibility index: The reclassified layers are combined using arithmetic averaging to produce the final flood susceptibility map, representing the mean risk across all variables at each pixel.
5. Zonal statistics: The resulting susceptibility map is summarized by administrative units using zonal statistics for spatial analysis.
6. Validation: The model is validated using a confusion matrix, as given in Figure 1.

Figure 1    Confusion Matrix.

3.3  Composite flood susceptibility

Flood susceptibility in this study is assessed through a composite index that integrates topographic, hydrological, climatic, and land surface indicators. Each indicator is first normalized to a common ordinal scale using regionally and globally validated thresholds, then combined via an explicit and reproducible mathematical procedure. This approach follows best practices in flood risk mapping and multi-criteria GIS analysis (Burstein et al., 2023; Islam et al., 2010; Lymburner et al., 2024; Sharma, U. C., 2013).

All input variables, including distance to water bodies, elevation, slope, topographic position index (TPI), precipitation (CHIRPS), NDVI, and NDWI, are discretized into five ordinal classes (1=very low, 5=very high flood susceptibility). The specific thresholds for each indicator are grounded in the literature.

This rule-based normalization replaces conventional statistical normalization, as advocated by Burstein et al. (2023) and Lymburner et al. (2024), and enables direct comparison and integration.

The input indicators used in our study consist of distance to river (Dr), elevation (E), slope (S), topographic position index (TPI), precipitation (P), NDVI, and NDWI. First, each indicator is reclassified to a normalized ordinal score (Vi) on a scale from 1 (very low susceptibility) to 5 (very high susceptibility), using breakpoints based on literature and data-driven thresholds. For indicator i:

If Xi(x, y) is the raw value for indicator i at pixel (x, y), and [a1, a2], [a2, a3], …, [a5, ∞] are the value ranges for each class, then Vi(x, y)=1 if Xi(x, y) falls in the lowest class, Vi(x, y)=2 if in the next class, and so on, up to Vi(x, y)=5 if in the highest class.
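The reclassification rule above can be sketched as a lookup against ascending class boundaries. This is a minimal illustration rather than the authors' GEE code: the function name `reclassify`, the `inverse` flag (for indicators such as elevation, where larger raw values mean lower susceptibility), and the breakpoint values are all hypothetical.

```python
from bisect import bisect_right


def reclassify(value, breakpoints, inverse=False):
    """Map a raw indicator value to an ordinal score from 1 to 5.

    breakpoints: four ascending interior class boundaries (hypothetical values).
    inverse: set True when larger raw values mean lower susceptibility.
    """
    if len(breakpoints) != 4 or list(breakpoints) != sorted(breakpoints):
        raise ValueError("expected four ascending breakpoints")
    # bisect_right counts how many boundaries lie at or below the value,
    # so the result runs from 1 (lowest class) to 5 (highest class).
    score = bisect_right(breakpoints, value) + 1
    return 6 - score if inverse else score


# Illustrative boundaries in arbitrary units; not the thresholds used in the study.
breaks = [10.0, 20.0, 30.0, 40.0]

lowest = reclassify(5.0, breaks)                      # below the first boundary
highest = reclassify(45.0, breaks)                    # above the last boundary
inverted = reclassify(5.0, breaks, inverse=True)      # low raw value, high score
```

Keeping the boundaries in a plain list makes each indicator's thresholds easy to audit and to swap for the literature-derived values.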

The final flood susceptibility index (FSI) is calculated for each pixel as the mean of the indicator scores, as shown in Equation 1:

  
\[ F(x,y) = \frac{1}{n}\sum_{i = 1}^{n}{V_{i}(x,y)} \tag{$\textit{Eq}$. 1}\]

where F(x, y) is the composite susceptibility at pixel (x, y), Vi(x, y) is the reclassified score for indicator i, and n=7 is the total number of indicators.
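As a worked illustration of Equation 1, the sketch below averages seven ordinal scores for a single pixel. The function name `composite_index` and the example scores are hypothetical and are not taken from the study's data.

```python
def composite_index(scores):
    """Unweighted mean of ordinal indicator scores for one pixel (Equation 1)."""
    if not scores:
        raise ValueError("at least one indicator score is required")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie between 1 and 5")
    return sum(scores) / len(scores)


# Hypothetical scores, ordered as: distance to river, elevation, slope,
# TPI, precipitation, NDVI, NDWI.
pixel_scores = [5, 4, 4, 3, 5, 2, 3]
fsi = composite_index(pixel_scores)  # 26 / 7
```

Because every indicator carries equal weight, a single extreme score shifts the index by at most 4/7 of a class.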

Finally, F(x, y) is reclassified into five susceptibility classes using the following intervals: Very Low: 1≤F<2, Low: 2≤F<3, Moderate: 3≤F<4, High: 4≤F<5, and Very High: F=5. No additional weighting or statistical normalization was applied after the reclassification, as all indicators were converted to a uniform scale. This process is fully transparent and reproducible.

For practical mapping, the FSI is further classified into five flood susceptibility zones:

  
\[ \text{R}(x,y) = \left\{ \begin{array}{l} 1~if~ F(x,y) \leq 1.0\\ 2~if~ 1.0 < F(x,y) \leq 1.5\\ 3~if~ 1.5 < F(x,y) \leq 2.0\\ 4~if~ 2.0 < F(x,y) \leq 2.5\\ 5~if~ F(x,y) > 2.5\\ \end{array} \right. \tag{$\textit{Eq}$. 2}\]

where R(x, y) denotes the discrete flood susceptibility zone, from very low (1) to very high (5) risk, as shown in Equation 2.

This integration approach is directly comparable to best-practice multi-criteria and machine-learning-based models in the international literature (Lymburner et al., 2024; Tehrany et al., 2014), and threshold choices are regionally validated for the South Asian floodplain context (Rahman et al., 2025; Sanyal and Lu, 2005; Sharma, U. C., 2013). Overlay validation using the JRC Global Surface Water dataset confirms the appropriateness of thresholds and the integrated methodology (Pekel et al., 2016).

3.4  Threshold selection

To ensure reproducibility, each flood susceptibility indicator was reclassified on a scale from 1 (very low) to 5 (very high), using thresholds informed by both our study area and precedent in the literature. Distance from water bodies was scored 5 for <10 m and 4 for 10-30 m, with lower scores beyond 30 m, consistent with Robi et al. (2019), Jain et al. (2018), and Sinha et al. (2008). Elevation and slope were classified using SRTM data, with low-lying or flat areas receiving higher scores due to increased water accumulation, following Sanyal and Lu (2005), U. C. Sharma (2013), Dietrich et al. (1993), and Enomah et al. (2023). Topographic Position Index (TPI) was also applied, where negative or low TPI values indicate valleys or depressions prone to flooding (Gallant and Dowling, 2003; Gupta and Dixit, 2024).
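To make the TPI criterion concrete, the sketch below computes TPI as a cell's elevation minus the mean elevation of its neighbours on a tiny elevation grid. The 3x3 neighbourhood, the function name `tpi`, and the elevation values are assumptions for illustration; the neighbourhood size used in the study is not stated.

```python
def tpi(dem, row, col):
    """Elevation of a cell minus the mean elevation of its 3x3 neighbours.

    dem is a list of equal-length rows; edge cells use whichever
    neighbours exist inside the grid.
    """
    neighbours = [
        dem[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if (r, c) != (row, col) and 0 <= r < len(dem) and 0 <= c < len(dem[0])
    ]
    return dem[row][col] - sum(neighbours) / len(neighbours)


# Hypothetical elevations in metres: a depression surrounded by higher ground.
dem = [
    [12.0, 11.0, 12.0],
    [11.0, 8.0, 11.0],
    [12.0, 11.0, 12.0],
]

centre_tpi = tpi(dem, 1, 1)   # negative: the cell sits below its surroundings
corner_tpi = tpi(dem, 0, 0)   # positive: the cell sits above its surroundings
```

A negative value at the centre cell flags the kind of depression that would receive a higher susceptibility score.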

Vegetation and surface moisture indices further refined flood susceptibility mapping. NDVI <0.2 indicated sparse vegetation or waterlogged areas (score 5), while NDVI >0.5 represented dense vegetation (score 1-2), supported by Islam et al. (2010), Pettorelli et al. (2005), and Burstein et al. (2023). NDWI thresholds (>0.3) identified permanent or ephemeral surface water, following Pekel et al. (2016) and Feyisa et al. (2014). Adaptive, context-sensitive thresholds for NDVI and NDWI, as recommended by Lymburner et al. (2024), improve accuracy in complex land covers such as forested wetlands. Precipitation categories, from ≥5 mm/day (score 5) to ≤2 mm/day (score 1), were adopted from studies in monsoon climates (Shoko and Dube, 2024; Tehrany et al., 2014).
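Both indices are normalized band differences, which the following sketch computes from surface reflectance values. It assumes the green/near-infrared formulation of NDWI (the source does not state which NDWI variant was used), and the reflectance values are invented for illustration.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    total = a + b
    if total == 0:
        return float("nan")
    return (a - b) / total


def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI from green and near-infrared reflectance (assumed formulation)."""
    return normalized_difference(green, nir)


# Hypothetical surface reflectances for two pixels.
vegetated = {"green": 0.08, "red": 0.05, "nir": 0.40}
water = {"green": 0.06, "red": 0.04, "nir": 0.02}

veg_ndvi = ndvi(vegetated["nir"], vegetated["red"])    # dense vegetation, NDVI > 0.5
water_ndvi = ndvi(water["nir"], water["red"])          # open water, NDVI < 0.2
water_ndwi = ndwi(water["green"], water["nir"])        # open water, NDWI > 0.3
```

With these inputs the vegetated pixel falls above the NDVI >0.5 threshold and the water pixel above the NDWI >0.3 threshold described in the text.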

Overall, the classification scheme combines distance, elevation, slope, TPI, vegetation, surface moisture, and precipitation to capture multiple dimensions of flood risk. Thresholds were locally adjusted but validated against regional and global studies, ensuring consistency and scientific rigor (Robi et al., 2019; Jain et al., 2018; Sanyal and Lu, 2005; Dietrich et al., 1993; Lymburner et al., 2024). This integrated, evidence-based approach enables precise mapping of flood-prone areas while accounting for both topographic and environmental variability.

Performance Metrics

Precision, recall, F1 score, and accuracy were used to evaluate model performance. Precision, defined as the proportion of true positive predictions among all positive predictions, is given in Equation 3:

  
\[ \begin{split} &Precision =\\ &\frac{True~ Positives~ (TP)}{True~Positives~ (TP) + False~ Positives~(FP)} \end{split}\tag{$\textit{Eq}$. 3}\]

It measures the reliability of the model’s positive predictions, particularly important in minimizing false alarms, such as incorrectly classifying non-flood events as floods. Recall, or sensitivity, measures the model’s ability to detect all actual positive cases and is calculated as in Equation 4:

  
\[ \begin{split} &Recall =\\ &\frac{True~ Positives~ (TP)}{True~ Positives~ (TP) + False~ Negatives\ (FN)} \end{split}\tag{$\textit{Eq}$. 4}\]

This metric ensures that flood events are not missed, even if it means tolerating some false positives. To balance these two metrics, the F1-score is employed, which is the harmonic mean of precision and recall as given in Equation 5:

  
\[ F1~ score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{$\textit{Eq}$. 5}\]

This metric is particularly useful in scenarios involving imbalanced datasets, where one class significantly outweighs the other. Furthermore, the accuracy metric is also computed, which represents the ratio of correctly classified samples to the total number of samples. This metric offers a simple yet effective means of evaluating the model’s overall performance. The accuracy can be expressed mathematically using the formula in Equation 6:

  
\[ \begin{split} &Accuracy =\\ &\frac{True~ Positives~ (TP) + True~ Negatives~ (TN)}{Total~ Observations} \end{split}\tag{$\textit{Eq}$. 6}\]

4. Results and Discussions

4.1  Flood susceptibility assessment in the study area

After flood potential scores were assigned to each layer, the layers were combined to create the final flood susceptibility map (Figure 3). The susceptibility levels, derived from geospatial analysis in GEE, range from very low to very high, highlighting the importance of strategic flood susceptibility management across diverse terrains.

Furthermore, the results are analysed at the district level (Figure 2). Urban areas in Barisal, Chattogram, Dhaka, and Sylhet generally exhibit higher flood susceptibility than rural areas. The Chattogram and Barisal districts are coastal and low-lying, while Dhaka is located inland in the south-central part of Bangladesh, in the deltaic region of the Ganges and Brahmaputra rivers. Dhaka has experienced rapid urban expansion, and its impervious surfaces increase surface runoff. Sylhet is located inland, where it is prone to riverine flooding and flash floods.

Figure 2    Flood susceptibility by rural-urban location type and average flood events by district. Dots represent the historical average number of flood events per year, calculated from the public EM-DAT database available at https://public.emdat.be/data. The figure was generated using RStudio 2024.04.2 Build 764.

Figure 3    The left map shows flood susceptibility across the study area, overlaid with randomly placed evaluation points for data testing, highlighting regions with varying flood susceptibility levels. The right map displays the GFD data used for accuracy assessment. The legend indicates flood susceptibility level across five categories. The map was produced using QGIS software, version 3.34.15.

There are pronounced local flood hotspots within these districts, and the result is consistent with the historical flood events recorded in the public International Disaster Database (EM-DAT). EM-DAT is maintained by the Centre for Research on the Epidemiology of Disasters (CRED) and provides global data on natural and technological disasters since 1900.

Urban and rural areas of Khulna, Mymensingh, Rajshahi and Rangpur tend to have lower susceptibility compared to other districts. Rajshahi and Rangpur are located on relatively higher ground and face less exposure to coastal flooding, although they still experience severe riverine floods. Khulna and Mymensingh, however, are low-lying areas and are highly susceptible to both riverine and coastal flooding.

The interquartile range also varies across regions, suggesting greater spatial variability of flood risk within coastal districts compared to inland ones.

This dual analysis allows for a nuanced understanding of risk. Disaster management efforts should prioritize not only districts with high susceptibility but also those with high spatial variability, as these likely contain especially vulnerable communities that could be overlooked by a district-level summary alone.

4.2  Model evaluation

To evaluate the flood prediction model, we used a validation framework comparing the model's output with a testing dataset from GFD flood records. Both datasets were reclassified into binary categories (flooded or not flooded) using thresholds of 22 for GFD and 3 as a cutoff for the model's predictions after performing sensitivity checks to balance detection performance and minimize false positives. In this analysis, seven factors were tested, each with two possible weight options (2^7=128 combinations), and each weight combination was further evaluated across nine cutoff values ranging from 1 to 5 at 0.5 intervals. This resulted in a total of 128×9=1,152 iterations to identify the combination of weights and cutoff that yielded the highest F1 score, ensuring a balanced trade-off between detection performance and false positives.
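The size of this search space can be checked by enumerating it directly. The sketch below only builds the grid of candidate settings; the two weight values are placeholders, since the source states that each factor had two options but not what they were.

```python
from itertools import product

N_FACTORS = 7
WEIGHT_OPTIONS = (1, 2)  # placeholder values; the actual options are not given
CUTOFFS = [1 + 0.5 * k for k in range(9)]  # 1.0, 1.5, ..., 5.0


def search_grid():
    """Yield every (weights, cutoff) pair in the sensitivity analysis grid."""
    for weights in product(WEIGHT_OPTIONS, repeat=N_FACTORS):
        for cutoff in CUTOFFS:
            yield weights, cutoff


grid = list(search_grid())
n_weight_sets = len({weights for weights, _ in grid})
n_iterations = len(grid)
```

In a full analysis each pair would be scored against the reference points and the pair with the highest F1 score retained.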

The region of interest (ROI) was defined as the intersection of the bounding box outside the GFD dataset and a manually defined polygon covering Bangladesh, ensuring consistent spatial boundaries. Both datasets were clipped to this ROI.

We generated 200 random sample points, ensuring spatial distribution and reproducibility, and extracted the corresponding values from the model output and the reference map. The confusion matrix counts were TP=84, FP=32, FN=16, and TN=68, resulting in an overall accuracy of 0.76, balanced accuracy of 0.76, precision of 0.72, recall of 0.84, specificity of 0.68, and F1 score of 0.78, indicating moderate agreement with the GFD reference flood map. These results confirm the model's balanced ability to correctly identify flood-prone areas while minimizing false positives and support its reliability for flood risk management.
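The reported metrics follow directly from the four confusion matrix counts, as the short check below shows. The function name `classification_metrics` is illustrative; the formulas are the standard definitions given in Equations 3 to 6, plus specificity and balanced accuracy.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts reported for the 200 validation points.
metrics = classification_metrics(tp=84, fp=32, fn=16, tn=68)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```

Rounded to two decimals, the values match those quoted in the text.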

Furthermore, using bootstrap resampling of the 200 validation points, the model's overall accuracy at cutoff 3 was 0.76 with a 95% confidence interval (CI) of 0.70-0.81 (Figure 4). As the cutoff threshold increases, the model becomes more conservative: the miss rate rises and the false alarm rate decreases, while the F1 score peaks near cutoff = 3, where sensitivity (0.84) and specificity (0.68) are balanced. These results indicate that the classification performance is robust and that the results are unlikely to be due to random variation.
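A percentile bootstrap of this kind can be sketched as follows. The per-point outcomes are reconstructed from the confusion matrix (152 correct and 48 incorrect classifications); the number of resamples, the random seed, and the percentile method are assumptions, because the source does not describe the exact procedure.

```python
import random


def bootstrap_accuracy_ci(correct_flags, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for classification accuracy.

    correct_flags holds 1 for a correctly classified point and 0 otherwise.
    """
    rng = random.Random(seed)
    n = len(correct_flags)
    # Resample the points with replacement and record each resample's accuracy.
    accuracies = sorted(
        sum(rng.choices(correct_flags, k=n)) / n for _ in range(n_resamples)
    )
    lower = accuracies[int((alpha / 2) * n_resamples)]
    upper = accuracies[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# 84 TP + 68 TN correct, 32 FP + 16 FN incorrect.
flags = [1] * 152 + [0] * 48
ci_low, ci_high = bootstrap_accuracy_ci(flags)
```

The interval should bracket the point estimate of 0.76, with bounds that vary slightly with the seed and the number of resamples.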

Figure 4    Flood susceptibility model performance across cutoffs. The 95% CI of balanced accuracy is shown in the shaded area. The figure was generated using RStudio 2024.04.2 Build 764.

With sample-based validation against GFD flood records, the model shows promise as a tool for flood susceptibility assessment, disaster preparedness, and flood management.

However, some limitations exist: medium-resolution satellite data (SRTM DEM, Landsat 8, CHIRPS) may not capture water movements precisely; ground truth data is limited to GFD data; incorporating advanced hydrological models (e.g., SWAT, HEC-HMS, LISFLOOD) could improve precision; and computation timeouts in GEE due to unstable internet connections are a challenge.

Based on our findings, two practical recommendations emerge: (i) prioritizing adaptation measures in the rapidly urbanizing lowlands of the Dhaka-Chattogram corridor, and (ii) integrating community-based flood records with Earth Observation datasets to enhance local validation.

5. Conclusion

Our flood susceptibility model classifies flood events with a balanced accuracy of 0.76, precision of 0.72, and F1 score of 0.78, indicating moderate agreement with observed flood events. This performance was achieved using a cutoff threshold of 3, determined through 1,152 iterations of sensitivity analysis with stratified-random and spatially distributed points.

The model is also effective at detecting flood events, as shown by its high recall (0.84). Its ability to limit both false positives and false negatives makes it a reliable foundation for flood prediction, particularly in the GBM Basin, where misclassifying non-flood events can be costly.

Urban and rural areas of Sylhet, Barisal, Dhaka, and Chattogram stand out as local flood hotspots in the study area, while the districts of Rangpur and Rajshahi have relatively lower flood susceptibility despite experiencing at least one flood event per year.

Future work could integrate higher-resolution data (e.g., from Planet) and machine learning algorithms to further improve flood susceptibility analysis.

Acknowledgement

The authors thank all the reviewers for their invaluable comments and suggestions. We also extend our gratitude to the Asian Development Bank, JEES-MUFG Japan, University of Tsukuba, and JSPS for the grants.

Data Sharing

The GEE and R code used in this study is available upon reasonable request.

References
 
© 2025 GIS Association of Japan (一般社団法人 地理情報システム学会)