Comparison of Statistical and Machine Learning models for pipe failure modeling in Water Distribution Networks (WDN)
Proceedings of the 4th International Electronic Conference on Water Sciences
Statistical and Machine Learning (ML) models play a critical role in planning and decision support for WDN management. Failure models can provide valuable information for prioritizing system rehabilitation, even in data-scarce settings such as developing countries. Few studies analyze the performance of more than two models, and case studies from developing countries are scarce. A more comprehensive analysis of models' performance and limitations is necessary for adequate prediction of pipe failure. This study compares several statistical and ML models to help practitioners select a suitable pipe failure model according to information availability and network characteristics. Three statistical models (Linear, Poisson, and Evolutionary Polynomial Regressions) were used to predict failures in groups of pipes, and K-means clustering was applied to improve their performance. ML approaches, specifically Gradient Boosted Tree (GBT), Bayes, Support Vector Machines, and Artificial Neural Networks (ANNs), were compared in predicting individual pipe failure rates. The proposed approach was applied to a WDN in Bogotá (Colombia). For the statistical models, cluster-based prediction reduced the prediction error of pipe failures. Among the ML models, all methods except the ANNs showed acceptable performance, and GBT produced the best-performing classifier.
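The cluster-based idea can be illustrated with a minimal sketch. It is not the study's implementation: the pipe-group values are invented, clustering uses a single feature (diameter) with a simple deterministic 1-D K-means, and the per-cluster model is an ordinary linear regression of failure rate on age. The names `PIPE_GROUPS`, `kmeans_1d`, `fit_line`, and `sse` are hypothetical helpers introduced only for this example.

```python
from statistics import mean

# Hypothetical pipe groups: (diameter_mm, age_years, failures_per_km_per_year).
# These values are invented for illustration; they are not from the Bogotá data.
PIPE_GROUPS = [
    (75, 10, 0.30), (75, 20, 0.50), (100, 30, 0.70), (100, 40, 0.90),
    (300, 10, 0.05), (300, 20, 0.10), (400, 30, 0.15), (400, 40, 0.20),
]

def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D k-means (k >= 2) with evenly spaced, deterministic initial centroids."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def sse(xs, ys, a, b):
    """Sum of squared errors of the fitted line on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

diameters = [d for d, _, _ in PIPE_GROUPS]
ages = [a for _, a, _ in PIPE_GROUPS]
rates = [r for _, _, r in PIPE_GROUPS]

# Baseline: one regression of failure rate on age for all pipe groups.
a_all, b_all = fit_line(ages, rates)
global_sse = sse(ages, rates, a_all, b_all)

# Cluster-based: group pipes by diameter, then fit one regression per cluster.
labels = kmeans_1d(diameters, k=2)
cluster_sse = 0.0
for c in sorted(set(labels)):
    xs = [x for x, lab in zip(ages, labels) if lab == c]
    ys = [y for y, lab in zip(rates, labels) if lab == c]
    a_c, b_c = fit_line(xs, ys)
    cluster_sse += sse(xs, ys, a_c, b_c)

print(f"global SSE = {global_sse:.4f}, cluster-based SSE = {cluster_sse:.4f}")
```

On this toy data the small-diameter and large-diameter groups follow different age trends, so a single regression fits neither well. Per-cluster least-squares fits never have a higher in-sample error than the pooled fit, so a fair comparison would use held-out data.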