Using machine learning for estimating the defect content after an inspection

F. Padberg, T. Ragg, R. Schoknecht
2004 IEEE Transactions on Software Engineering  
We view the problem of estimating the defect content of a document after an inspection as a machine learning problem: the goal is to learn from empirical data the relationship between certain observable features of an inspection (such as the total number of different defects detected) and the number of defects actually contained in the document. We show that some features can carry significant non-linear information about the defect content. Therefore, we use a non-linear regression ... neural networks, to solve the learning problem. To select the best among all neural networks trained on a given dataset, one usually reserves part of the dataset for later cross-validation; in contrast, we use a technique which leaves the full dataset for training. This is an advantage when the dataset is small. We validate our approach on a known empirical inspection dataset. For that benchmark, our novel approach clearly outperforms both linear regression and the current standard methods in software engineering for estimating the defect content, such as capture-recapture. The validation also shows that our machine learning approach can be successful even when the empirical inspection dataset is small.
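
For orientation, the capture-recapture baseline named in the abstract can be illustrated with its simplest two-inspector form. This is a minimal sketch, not the authors' setup: the abstract does not say which capture-recapture variants were compared, the function names are hypothetical, and the defect IDs are invented for illustration. It uses the textbook Lincoln-Petersen estimator and Chapman's bias-corrected version of it.

```python
def lincoln_petersen(found_a: set, found_b: set) -> float:
    """Estimate total defects from two inspectors' findings: n_a * n_b / overlap."""
    overlap = len(found_a & found_b)
    if overlap == 0:
        # The estimator is undefined when the inspectors share no defect.
        raise ValueError("no defect was found by both inspectors")
    return len(found_a) * len(found_b) / overlap


def chapman(found_a: set, found_b: set) -> float:
    """Chapman's bias-corrected variant; defined even when the overlap is zero."""
    overlap = len(found_a & found_b)
    return (len(found_a) + 1) * (len(found_b) + 1) / (overlap + 1) - 1


# Invented example data: defect IDs reported by two inspectors.
inspector_a = {1, 2, 3, 4, 5, 6}
inspector_b = {4, 5, 6, 7, 8}

distinct_found = len(inspector_a | inspector_b)              # 8 distinct defects detected
estimated_total = lincoln_petersen(inspector_a, inspector_b) # 6 * 5 / 3 = 10.0
estimated_remaining = estimated_total - distinct_found       # 2.0 defects estimated undetected
```

Estimators of this kind use only the overlap between inspectors, whereas the approach described in the abstract learns the mapping from inspection features to defect content from past inspections.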
doi:10.1109/tse.2004.1265733 fatcat:rkzxqmt575g7plsqp63gt7a6ni