Empirical Analysis of Software Effort Preprocessing Techniques Based on Machine Learning

2021 International Journal of Intelligent Engineering and Systems  
In software effort estimation, missing data is a significant issue that causes information loss and bias in data analysis. Most data preprocessing procedures simply reuse approaches built for numerical data, which is a problem when missing data and irrelevant features are associated with categorical variables. The purpose of this paper is to evaluate and compare the performance of the proposed technique against k-nearest neighbor imputation (kNNI) … random forest imputation, and multiple imputation by chained equations, in terms of error and accuracy on the ISBSG dataset when missing data is present. The study builds on five machine learning approaches, with hyperparameters tuned by grid search. The results show that the three imputation methods perform almost identically. However, the combination of kNNI, genetic feature selection, and classification and regression tree (CART) yielded better results than the other method combinations, with MAE of 0.015, RMSE of 0.037, and R-squared of 0.804.
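The abstract names k-nearest neighbor imputation and reports MAE, RMSE, and R-squared without spelling out how either works. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. Assumptions: numeric features are already scaled to [0, 1]; the distance between two records is the mean over features observed in both, using absolute difference for numeric features and a 0/1 mismatch for categorical ones; numeric gaps take the donors' mean and categorical gaps the majority category. The helper names (`knn_impute`, `mae`, `rmse`, `r_squared`) and the toy records are hypothetical and not drawn from the ISBSG dataset. Random forest imputation, chained-equation imputation, genetic feature selection, CART, and grid search are not shown.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Union

# A cell is a number (numeric feature), a string (categorical feature), or None (missing).
Cell = Optional[Union[float, str]]


def _distance(a: Sequence[Cell], b: Sequence[Cell], numeric: Sequence[bool]) -> float:
    """Mean per-feature distance over the features observed in BOTH rows.

    Numeric features: absolute difference (ASSUMES values are pre-scaled to [0, 1]).
    Categorical features: 0 if the categories match, 1 otherwise.
    """
    total, shared = 0.0, 0
    for x, y, is_num in zip(a, b, numeric):
        if x is None or y is None:
            continue  # skip features missing in either row
        total += abs(x - y) if is_num else float(x != y)
        shared += 1
    return total / shared if shared else math.inf


def knn_impute(rows: Sequence[Sequence[Cell]], numeric: Sequence[bool], k: int = 3) -> List[List[Cell]]:
    """Fill each missing cell from its k nearest donor rows (rows that observe that feature).

    Numeric features get the donors' mean; categorical features get the majority category.
    Donor values are read from the original rows, so imputed cells never feed later imputations.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other, numeric), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )[:k]
            values = [rows[idx][j] for _, idx in donors]
            if not values:
                continue  # no row observes this feature; leave the gap
            if numeric[j]:
                filled[i][j] = sum(values) / len(values)
            else:
                filled[i][j] = Counter(values).most_common(1)[0][0]
    return filled


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy records: [scaled size, language, scaled effort]; not ISBSG data.
    rows = [
        [0.10, "Java", 0.20],
        [0.20, "Java", 0.25],
        [0.90, "COBOL", 0.80],
        [0.15, None, 0.22],
        [0.85, "COBOL", None],
    ]
    filled = knn_impute(rows, numeric=[True, False, True], k=2)
    print(filled[3][1], filled[4][2])
    print(mae([3.0, 5.0], [2.5, 5.0]), rmse([3.0, 5.0], [2.5, 5.0]))
```

With `k=2`, the missing language in the fourth record is filled with "Java" because both of its nearest donors are Java projects, and the missing effort in the fifth record becomes the mean of its two nearest donors' efforts (0.80 and 0.22). Filling categorical gaps by majority vote rather than by averaging is what lets one routine cover the categorical variables the abstract identifies as the difficult case.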
doi:10.22266/ijies2021.1231.49