Modeling Software Defects as Anomalies: A Case Study on Promise Repository
Journal of Software
Software defect prediction is a highly studied domain in software engineering research due to its importance in software development. In the literature, various classification methods based on static code attributes have been used to predict defects. However, defective instances are very few compared to non-defective instances, which leads to imbalanced data, and traditional machine learning techniques give poor results on such data. In this paper, an anomaly detection technique for software defect prediction is proposed which is not affected by imbalanced data. The technique uses both univariate and multivariate Gaussian distributions to model non-defective software modules; defective modules are then predicted based on their deviation from the generated model. To evaluate our approach, we implemented the algorithm and tested it on the NASA datasets from the PROMISE repository. We observed an average balance of 63.36% with the univariate model and 69.06% with the multivariate model. Without using any optimization or filtering, this approach yields better results than the industry standard of 60%.
The authors in  applied Poisson regression and binomial regression models to software defect data that is not normally distributed and concluded that over-dispersion of the data is best dealt with by the negative binomial distribution, which they showed performs better than plain Poisson regression. In  the authors also used statistical methods to predict defects, exploring the usefulness of multivariate regression, Classification and Regression Trees (CART), and residual analysis for analyzing software data. They showed that multivariate regression analysis performed much better when the data has only minor skewness, but residual analysis performed best for data with a high amount of heteroscedasticity. Artificial neural networks along with traditional methods are used in  . The authors compared Case-Based Reasoning (CBR), stepwise regression, Artificial Neural Networks (ANN), and rule induction. For continuous target functions, the performance of stepwise regression was better; for discontinuous target functions, the other machine learning techniques performed better. The authors suggested CBR as the more favorable approach in terms of overall performance. In  the authors used three cost-sensitive neural networks for defect prediction.
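As a rough illustration of the Gaussian modeling idea summarized in the abstract, the sketch below fits a univariate (per-feature) Gaussian to non-defective modules only and flags a module as defective when its density falls below a threshold epsilon. The feature vectors, feature meanings, and threshold are illustrative assumptions, not values from the paper, and each feature is assumed to have nonzero variance.

```python
import math

def fit_gaussian(rows):
    # Estimate per-feature mean and variance from non-defective modules only.
    n = len(rows)
    d = len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    var = [sum((r[j] - mu[j]) ** 2 for r in rows) / n for j in range(d)]
    return mu, var

def density(x, mu, var):
    # Product of univariate Gaussian densities (features treated as independent).
    p = 1.0
    for xj, m, v in zip(x, mu, var):
        p *= math.exp(-((xj - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return p

def is_defective(x, mu, var, eps):
    # A module whose density falls below eps deviates from the model of
    # non-defective modules and is flagged as an anomaly (likely defect).
    return density(x, mu, var) < eps

# Toy static-code-attribute vectors (e.g. LOC, cyclomatic complexity):
clean = [[120, 4], [100, 3], [140, 5], [110, 4]]
mu, var = fit_gaussian(clean)
print(is_defective([115, 4], mu, var, 1e-6))   # typical module -> False
print(is_defective([900, 40], mu, var, 1e-6))  # extreme outlier -> True
```

The multivariate variant mentioned in the abstract would replace the per-feature variances with a full covariance matrix, allowing correlated attributes to be modeled jointly.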
A downside of the cost-sensitive learning method is the issue of assigning misclassification costs. In  , the authors evaluated different machine learning and statistical predictor models on real-time defect datasets. It was shown that 1R in combination with instance-based learning gives consistent predictions. The authors used the Consistency-based Subset Evaluation technique. They suggested that size and complexity metrics are not adequate for accurately predicting real-time software defects. An issue with approaches to defect prediction in the literature prior to 2007 was that there were no baseline experiments and no reliable datasets that researchers could use to directly compare their performance ,  . Moreover, to accurately predict attributes such as the number of defects found in complex software projects, a rich set of process factors is needed. A causal model which considered both quantitative and qualitative process factors was developed in  . For validation, a dataset collected from 31 completed software projects in the consumer electronics industry was presented. That dataset has been of interest to other researchers evaluating models with similar aims. An even more elaborate baseline experiment was discussed by Menzies et al.  . The authors proposed the use of the NASA datasets for evaluating the performance of defect predictors, with static code attributes as features for the various learners. The authors also presented their findings on the debate over which of the many similar relevant attributes are best for predictors. They argued that it is not important which attributes (McCabe , Halstead , or LOC) are used; what matters is how they are used. They backed their claim by showing that no single attribute set yields a significant improvement in results over the others. Moreover, the authors showed that such predictors do work well and that mining static code attributes is useful for building defect predictors.
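The "balance" measure reported in the abstract and commonly used in these NASA-dataset baseline experiments combines the probability of detection (pd) and the probability of false alarm (pf). A minimal sketch, assuming the standard definition from the defect-prediction literature (normalized Euclidean distance from the ideal ROC point pd = 1, pf = 0):

```python
import math

def balance(pd_rate, pf_rate):
    # Distance from the ideal ROC point (pd = 1, pf = 0),
    # normalized so a perfect predictor scores 1 and the worst scores 0.
    return 1 - math.sqrt(pf_rate ** 2 + (1 - pd_rate) ** 2) / math.sqrt(2)

print(round(balance(1.0, 0.0), 3))   # perfect predictor -> 1.0
print(round(balance(0.7, 0.25), 3))  # -> 0.724
```

Because it penalizes both missed defects and false alarms, balance is more informative than raw accuracy on imbalanced defect data.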
The authors also showed that static code attributes do in fact work well in predicting defects, countering the claim that static code attributes capture very little about source code. The authors implemented their approach using Naïve Bayes and evaluated their results to affirm their hypothesis. Machine learning and statistical methods all suffer from certain problems; the class imbalance problem is one example. The authors in  proposed a stratification-based resampling strategy for predicting software defect-proneness that minimizes the effect of the imbalance problem. The sampling methods, as proposed by the authors, can be divided into two groups: undersampling and oversampling. The undersampling method removes examples of the class which occurs more frequently, while the oversampling method adds examples of the class which occurs rarely. Both methods move the data toward the desired class distribution. The authors applied both sampling methods to minimize the effect of the class imbalance problem: they randomly undersampled the majority class while oversampling the minority class with the Synthetic Minority Over-sampling Technique (SMOTE)  . Results of their experiment showed an improvement of 23% in average geometric-mean classification accuracy.
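A minimal sketch of the two resampling ideas described above, random undersampling of the majority class and SMOTE-style interpolation for the minority class, assuming plain Euclidean distance and toy feature vectors (not the cited authors' implementation):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    # SMOTE: synthesize new minority examples by interpolating between a
    # minority example and one of its k nearest minority-class neighbours.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (squared Euclidean)
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

def undersample(majority, n_keep, seed=0):
    # Random undersampling: keep only n_keep majority-class examples.
    return random.Random(seed).sample(majority, n_keep)

defective = [[10.0, 2.0], [12.0, 3.0], [11.0, 2.5]]  # rare (minority) class
new_rows = smote(defective, n_new=3)
print(len(new_rows))  # -> 3
```

Each synthetic point lies on a segment between two real minority examples, so the minority region is densified rather than merely duplicated.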