Why reinventing the wheels? An empirical study on library reuse and re-implementation
Empirical Software Engineering
Nowadays, with the rapid growth of open source software (OSS), library reuse becomes more and more popular since a large amount of third-party libraries are available to download and reuse. A deeper understanding on why developers reuse a library (i.e., replacing self-implemented code with an external library) or re-implement a library (i.e., replacing an imported external library with self-implemented code) could help researchers better understand the factors that developers are concerned with
... when reusing code. This understanding can then be used to improve existing libraries and API recommendation tools for researchers and practitioners by using the developers concerns identified in this study as design criteria. In this work, we investigated the reasons behind library reuse and re-implementation. To achieve this goal, we first crawled data from two popular sources, F-Droid and GitHub. Then, potential instances of library reuse and re-implementation were found automatically based on certain heuristics. Next, for each instance, we further manually identified whether it is valid or not. For library re-implementation, we obtained 82 instances which are distributed in 75 repositories. We then conducted two types of surveys (i.e., individual survey to corresponding developers of the validated instances and another open survey) for library reuse and re-implementation. For library reuse individual survey, we received 36 responses out of 139 contacted developers. For re-implementation individual survey, we received 13 responses out of 71 contacted developers. In addition, we received 56 responses from the open survey. Finally, we perform qualitative and quantitative analysis on the survey responses and commit logs of the validated instances. The results suggest that library reuse occurs mainly because developers were initially unaware of the library or the library had not been introduced. Re-implementation occurs mainly because the used library method is only a small part of the library, the library dependencies are too complicated, or the library method is deprecated. Finally, based on all findings obtained from analyzing the surveys and commit messages, we provided a few suggestions to improve the current library recommendation systems: tailored recommendation according to users' preferences, detection of external code that is similar to a part of the users' code (to avoid duplication or Communicated by: Maurizio Morisio Bowen Xu and Le An are contributed equally. Empirical Software Engineering re-implementation), grouping similar recommendations for developers to compare and select the one they prefer, and disrecommendation of poor-quality libraries.