Bug or Not? Bug Report Classification Using N-Gram IDF [article]

Pannavat Terdchanakul, Hideaki Hata, Passakorn Phannachitta, Kenichi Matsumoto
2017 arXiv pre-print
Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts; these key terms can be used as the features to ... bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. On a publicly available dataset, our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks.
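The idea of scoring phrases of several lengths by document frequency and using them as classifier features can be illustrated with a minimal sketch. It assumes plain IDF, log(|D| / df), over word n-grams as a simplified stand-in; it is not the authors' N-gram IDF scheme. The function names, the `max_n` parameter, and the toy reports are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def word_ngrams(text, max_n=3):
    """Return the set of word n-grams (lengths 1..max_n) found in a text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def ngram_idf_scores(docs, max_n=3):
    """Plain IDF, log(|D| / df), for every word n-gram in the corpus.

    Assumption: this is ordinary IDF applied to n-grams, a stand-in
    for the N-gram IDF weighting named in the abstract.
    """
    df = Counter()
    for doc in docs:
        df.update(word_ngrams(doc, max_n))  # each n-gram counted once per doc
    total = len(docs)
    return {gram: math.log(total / count) for gram, count in df.items()}


def to_features(doc, vocabulary, max_n=3):
    """Binary feature vector: 1 if the vocabulary term occurs in the doc."""
    grams = word_ngrams(doc, max_n)
    return [1 if term in grams else 0 for term in vocabulary]


# Toy reports, invented for illustration only.
reports = [
    "null pointer exception when saving file",
    "null pointer exception on startup",
    "please add dark mode option",
]
scores = ngram_idf_scores(reports)
vocabulary = sorted(scores)
features = [to_features(r, vocabulary) for r in reports]
```

Each row of `features` could then be passed to a classifier such as logistic regression or random forest, the two model families the abstract names.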
arXiv:1709.05763v1 fatcat:elxholvf2vg43ckqvfg72bfvmq