2020 Volume 76 Issue 2 Pages I_133-I_138
Risk evaluation of bridge collapse requires a regression model for river discharge that covers not only periods of precipitation but also ordinary periods without it. Although data-driven approaches using machine learning have been proposed to construct regression models for forecasting river discharge following precipitation, the distribution of discharge data is imbalanced. As a result, these regression models usually cannot track rare behaviors, such as a sudden increase in discharge caused by heavy precipitation. This study aims to improve the performance of regression models by resampling the training data. The discharge data of the Doki River, the target of this study, have an imbalanced distribution because precipitation is rare. This imbalance is alleviated by a resampling technique that synthetically oversamples the minority data and undersamples the majority data. Regression models were trained separately on the original data and on the resampled data. A comparison of the two models shows that alleviating the imbalance in the training data significantly improves regression accuracy in the high-discharge region while maintaining regression accuracy in the low-discharge region.
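The resampling step described above can be illustrated with a minimal sketch. It is not the procedure used in the study, whose details the abstract does not give. The function name `resample_discharge`, the parameters `threshold`, `n_synthetic`, and `keep_ratio`, and the toy data are all hypothetical. Synthetic minority samples are generated here by linear interpolation between randomly chosen pairs of high-discharge samples, a simplification of SMOTE-style methods, which interpolate between nearest neighbors.

```python
import random


def resample_discharge(samples, threshold, n_synthetic, keep_ratio, seed=0):
    """Rebalance (features, discharge) pairs; every parameter is an illustrative assumption.

    samples     : list of (features, discharge), where features is a list of floats
    threshold   : discharge at or above this value is treated as the rare (minority) region
    n_synthetic : number of synthetic minority samples to generate
    keep_ratio  : fraction of majority (low-discharge) samples to keep
    """
    rng = random.Random(seed)
    rare = [s for s in samples if s[1] >= threshold]
    common = [s for s in samples if s[1] < threshold]

    # Undersample the majority region: keep a random subset.
    n_keep = max(1, round(len(common) * keep_ratio)) if common else 0
    kept = rng.sample(common, n_keep)

    # Oversample the minority region: interpolate features and target
    # between two randomly chosen rare samples (assumed simplification).
    synthetic = []
    if len(rare) >= 2:
        for _ in range(n_synthetic):
            (xa, ya), (xb, yb) = rng.sample(rare, 2)
            t = rng.random()
            x_new = [a + t * (b - a) for a, b in zip(xa, xb)]
            y_new = ya + t * (yb - ya)
            synthetic.append((x_new, y_new))

    return kept + rare + synthetic


# Toy data (made up): 95 low-discharge samples and 5 high-discharge samples.
low = [([0.1 * (i % 5)], 1.0 + 0.01 * i) for i in range(95)]
high = [([20.0 + i], 50.0 + 10.0 * i) for i in range(5)]
balanced = resample_discharge(low + high, threshold=10.0, n_synthetic=15, keep_ratio=0.2)
```

In this toy case the share of high-discharge samples rises from 5 of 100 to 20 of 39. A regression model trained on `balanced` sees the rare region far more often than one trained on the original data.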