Feature location in source code: a taxonomy and survey

Bogdan Dit, Meghan Revelle, Malcom Gethers, Denys Poshyvanyk
2011 Journal of Software: Evolution and Process  
Feature location is the activity of identifying an initial location in the source code that implements functionality in a software system. Many feature location techniques have been introduced that automate some or all of this process, and a comprehensive overview of this large body of work would be beneficial to researchers and practitioners. This paper presents a systematic literature survey of feature location
[...]. Eighty-nine articles from 25 venues have been reviewed and classified within the taxonomy in order to organize and structure existing work in the field of feature location. The paper also discusses open issues and defines future directions in the field of feature location.

... process we were able to improve the quality of our taxonomy and attribute set as well as improve their descriptions.

Analysis

Following the classification of the research papers, our final step comprises an analysis of the results, answers to the research questions, and an outline of future directions for researchers and practitioners investigating feature location techniques. To complete this step, we analyzed the trends in our resulting taxonomy and observed interesting co-occurrences of various attributes across feature location techniques. We also investigated characteristics that rarely apply to the set of techniques considered, as well as characteristics that are currently emerging in the research literature.

DIMENSIONS OF THE SURVEY

The goal of this survey is to provide researchers and practitioners with a structured overview of existing research in the area of feature location. From a methodical inspection of the research literature we extracted a number of key dimensions². These dimensions objectively describe different techniques and offer structure to the surveyed literature. The dimensions are as follows:

• The type of analysis: What underlying analyses are used to support feature location?
• The type of user input: What does a developer have to provide as an input to the feature location technique?
• Data sources: What derivative artifacts have to be provided as an input for the feature location technique?
• Output: What type of results are produced, and how are they presented to the user?
• Programming language support: On which programming languages was this technique instantiated?
• The evaluation of the approach: How was this feature location technique evaluated?
• Systems evaluated: What are the systems that were used in the evaluation?

The order in which these dimensions are presented does not imply any explicit priority or importance. Each dimension has a number of distinct attributes associated with it. For a given dimension, a feature location technique may be associated with multiple attributes. These dimensions and their attributes were derived by examining an initial set of articles of interest. They were then refined and generalized to succinctly characterize the properties that make feature location techniques unique, and they can be used to evaluate and compare those techniques. The goal of the taxonomy's dimensions and attributes is to allow researchers and practitioners to easily locate the feature location techniques that are most suited to their needs. The dimensions and their associated attributes that are used in the taxonomy of the surveyed articles are listed in Table 1. These dimensions and attributes are discussed in the remainder of this section. The attributes are highlighted in italics.

² Some of these dimensions were discussed at the working session on Information Retrieval Approaches in Software Evolution at the 22nd IEEE International Conference on Software Maintenance (ICSM'06).
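To make the dimension/attribute structure concrete, the following is a minimal sketch of how such a classification could be stored and queried. It is not taken from the paper: the dimension keys paraphrase the seven dimensions listed above, while the technique names (technique_a, technique_b, technique_c), the attribute values, and the helper functions select and cooccurrences are hypothetical placeholders chosen for illustration, not entries from Table 1.

```python
from collections import Counter
from itertools import combinations

# Dimension keys paraphrase the survey's seven dimensions. Everything else
# (technique names, attribute values) is an illustrative placeholder and
# does not reproduce the paper's Table 1.
DIMENSIONS = (
    "analysis", "user_input", "data_sources", "output",
    "language_support", "evaluation", "systems_evaluated",
)

# technique -> dimension -> set of attributes.
# A technique may carry several attributes within one dimension.
classification = {
    "technique_a": {
        "analysis": {"static", "textual"},
        "user_input": {"query"},
        "language_support": {"Java"},
    },
    "technique_b": {
        "analysis": {"dynamic"},
        "user_input": {"scenario"},
        "language_support": {"C++", "Java"},
    },
    "technique_c": {
        "analysis": {"dynamic", "textual"},
        "user_input": {"query", "scenario"},
        "language_support": {"Java"},
    },
}


def select(classification, **required):
    """Return the sorted names of techniques that have every required
    attribute in each named dimension (attributes are passed as sets)."""
    for dim in required:
        if dim not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
    return sorted(
        name
        for name, dims in classification.items()
        if all(attrs <= dims.get(dim, set()) for dim, attrs in required.items())
    )


def cooccurrences(classification, dimension):
    """Count how often each pair of attributes appears together on the
    same technique within one dimension."""
    counts = Counter()
    for dims in classification.values():
        for pair in combinations(sorted(dims.get(dimension, set())), 2):
            counts[pair] += 1
    return counts
```

Under these assumptions, a call such as select(classification, analysis={"textual"}, language_support={"Java"}) narrows the catalogue to techniques matching a reader's needs, and cooccurrences(classification, "analysis") gives a simple pairwise tally of the kind the analysis step describes when it looks for attributes that occur together.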
doi:10.1002/smr.567 fatcat:tz5smvpgtje6vohqitkhocd52e