Automating traceability link recovery through classification

Chris Mills
Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017), 2017
Traceability Link Recovery (TLR) is an important software engineering task in which a stakeholder establishes links between related items in two sets of software artifacts. Most existing approaches leverage Information Retrieval (IR) techniques and formulate TLR as a retrieval problem, in which pairs of similar artifacts are retrieved and presented to a user. These approaches still require significant human effort, because a stakeholder must manually inspect the list of recommendations and decide which ones are true links and which are false. In this work, we aim to automate TLR by re-imagining it as a binary classification problem. More specifically, our machine learning classification approach automatically classifies each link in the set of all potential links as either valid or invalid, thereby avoiding the substantial human effort required by existing techniques.
CCS CONCEPTS • Software and its engineering → Traceability;
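The abstract does not say which features or classifier the approach uses, so the following is only a minimal sketch of the contrast it draws, under stated assumptions. It enumerates every (source, target) pair as a potential link, scores each pair by TF-IDF cosine similarity, and then shows two views: an IR-style ranked list that a person would still have to inspect, and a classification view that labels every pair valid or invalid. The single similarity feature, the naive whitespace tokenizer, and the one-feature decision stump (a learned similarity threshold) are assumptions standing in for the paper's actual feature set and classifier. All artifact texts, identifiers (`R1`, `Auth.java`, ...) and helper names (`candidate_links`, `fit_stump`, `classify`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Assumption: a single TF-IDF cosine feature and a decision stump stand in
# for the paper's (unspecified) features and classifier.


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately naive assumption)."""
    return text.lower().split()


def tfidf(docs):
    """Return one sparse TF-IDF vector (term -> weight) per token list, smoothed idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def candidate_links(sources, targets):
    """Hypothetical helper: score every (source_id, target_id) pair, i.e. all potential links."""
    ids_s, ids_t = list(sources), list(targets)
    vecs = tfidf([tokenize(sources[i]) for i in ids_s] + [tokenize(targets[j]) for j in ids_t])
    vs, vt = vecs[:len(ids_s)], vecs[len(ids_s):]
    return {
        (s, t): cosine(vs[a], vt[b])
        for (a, s), (b, t) in product(enumerate(ids_s), enumerate(ids_t))
    }


def rank(scores):
    """IR-style view: a ranked list that a stakeholder must still inspect by hand."""
    return sorted(scores, key=scores.get, reverse=True)


def fit_stump(features, labels):
    """Learn the similarity threshold that best separates labelled valid/invalid links."""
    values = sorted(set(features))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

    def accuracy(th):
        return sum((f > th) == y for f, y in zip(features, labels))

    return max(candidates, key=accuracy)


def classify(scores, threshold):
    """Classification view: every potential link is labelled valid (True) or invalid (False)."""
    return {pair: score > threshold for pair, score in scores.items()}


# Hypothetical labelled training project (requirements -> source files).
train_src = {"R1": "user logs in with a password", "R2": "export the monthly report as pdf"}
train_tgt = {"Auth.java": "password check user login", "Pdf.java": "pdf report export writer"}
train_oracle = {("R1", "Auth.java"), ("R2", "Pdf.java")}

train_scores = candidate_links(train_src, train_tgt)
pairs = list(train_scores)
threshold = fit_stump([train_scores[p] for p in pairs], [p in train_oracle for p in pairs])

# Hypothetical unseen project: every candidate link gets a label, no manual vetting.
test_src = {"R3": "user can reset the password", "R4": "print an invoice"}
test_tgt = {"Reset.java": "reset password token user", "Invoice.java": "invoice print layout"}
predicted = classify(candidate_links(test_src, test_tgt), threshold)
```

Here `predicted` maps each of the four candidate pairs in the unseen project to a valid/invalid label, whereas `rank(...)` would only order them. A real classifier would typically combine many features per pair rather than a single similarity score; the stump is used here only because its decision can be followed by hand.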
doi:10.1145/3106237.3121280 dblp:conf/sigsoft/Mills17 fatcat:fqop2kfnrjcy3mpnoyxlq644yu