Analysing the importance of variables for sewer failure prediction

Guilherme Carvalho, Conceição Amado, Rita S. Brito, Sérgio T. Coelho, João P. Leitão
2018 Figshare  
When selecting variables to predict sewer failure, and thereby optimise sewer system maintenance, it is important to identify those that most strongly influence prediction quality, or to define the smallest set of variables sufficient for accurate predictions. In this study, three statistical variable selection algorithms are applied for the first time to identify the most important variables for sewer failure prediction: the mutual information indicator, the out-of-bag samples concept based on the random forest algorithm, and the stepwise search approach. The methods were applied to a real data-set consisting of sewer condition categories and the associated physical characteristics. The mutual information and the stepwise search methods provided good predictions, while those obtained using out-of-bag samples based on random forest were somewhat different, which the authors attribute to that method's lack of robustness to imbalanced class distributions.
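
As an illustration of the first of the three methods, the following is a minimal sketch of ranking candidate variables by their empirical mutual information with a condition class. It assumes all variables are categorical (or already discretised). The abstract does not state which estimator the study used, so this plug-in estimator is an assumption, and the function names, column names, and toy values are hypothetical and are not taken from the study's data-set.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in bits) between two
    equally long sequences of discrete values."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_variables(table, target):
    """Return (variable name, mutual information) pairs, highest first."""
    scores = {name: mutual_information(col, target) for name, col in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one condition class per sewer pipe.
condition = ["good", "good", "poor", "poor", "good", "poor"]
pipes = {
    # Matches the condition class exactly in this toy sample.
    "material": ["pvc", "pvc", "clay", "clay", "pvc", "clay"],
    # Only weakly related to the condition class in this toy sample.
    "diameter_class": ["small", "large", "small", "large", "small", "large"],
}

ranking = rank_variables(pipes, condition)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

In this toy sample `material` ranks above `diameter_class` because it determines the condition class exactly. With real data, continuous variables such as pipe age or length would first need to be binned or handled with a different estimator.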
doi:10.6084/m9.figshare.6668486.v1 fatcat:s3nbxsqaqzgejfyiq2fkm5eufu