Automatically mining software-based, semantically-similar words from comment-code mappings

Matthew J. Howard, Samir Gupta, Lori Pollock, K. Vijay-Shanker
2013 10th Working Conference on Mining Software Repositories (MSR)
Many software development and maintenance tools involve matching between natural language words in different software artifacts (e.g., traceability) or between queries submitted by a user and software artifacts (e.g., code search). Because different people likely created the queries and various artifacts, the effectiveness of these tools is often improved by expanding queries and adding related words to textual artifact representations. Synonyms, as well as other word relations that indicate semantic similarity, are particularly useful for overcoming the mismatch in vocabularies. However, experience shows that many words are semantically similar in computer science situations, but not in typical natural language documents. In this paper, we present an automatic technique to mine semantically similar words, particularly in the software context. We leverage the role of leading comments for methods and programmer conventions in writing them. Our evaluation shows that, in high proportions, the mined comment-code word mappings that do not already occur in WordNet are indeed viewed as semantically similar word pairs in computer science.
doi:10.1109/msr.2013.6624052 dblp:conf/msr/HowardGPV13 fatcat:jrlfjahxabektctmynl42gxj2a