Resampling and ensemble techniques for improving ANN-based high-flow forecast accuracy
Hydrology and Earth System Sciences
Abstract. Data-driven flow-forecasting models, such as artificial neural networks (ANNs), are increasingly featured in research for their potential use in operational riverine flood warning systems. However, the distributions of observed flow data are imbalanced, resulting in poor prediction accuracy on high flows in terms of both amplitude and timing error. Resampling and ensemble techniques have been shown to improve model performance on imbalanced datasets. However, the efficacy of these
... ficacy of these methods (individually or combined) has not been explicitly evaluated for improving high-flow forecasts. In this research, we systematically evaluate and compare three resampling methods, random undersampling (RUS), random oversampling (ROS), and the synthetic minority oversampling technique for regression (SMOTER), and four ensemble techniques, randomised weights and biases, bagging, adaptive boosting (AdaBoost), and least-squares boosting (LSBoost), on their ability to improve high stage prediction accuracy using ANNs. These methods are implemented both independently and in combined hybrid techniques, where the resampling methods are embedded within the ensemble methods. This systematic approach for embedding resampling methods is a novel contribution. This research presents the first analysis of the effects of combining these methods on high stage prediction accuracy. Data from two Canadian watersheds (the Bow River in Alberta and the Don River in Ontario), representing distinct hydrological systems, are used as the basis for the comparison of the methods. The models are evaluated on overall performance and on typical and high stage subsets. The results of this research indicate that resampling produces marginal improvements to high stage prediction accuracy, whereas ensemble methods produce more substantial improvements, with or without resampling. Many of the techniques used produced an asymmetric trade-off between typical and high stage performance; reduction of high stage error resulted in disproportionately larger error on a typical stage. The methods proposed in this study highlight the diversity-in-learning concept and help support future studies on adapting ensemble algorithms for resampling. This research contains many of the first instances of such methods for flow forecasting and, moreover, their efficacy in addressing the imbalance problem and heteroscedasticity, which are commonly observed in high-flow and flood-forecasting models.