Linear predictive coding and cepstrum coefficients for mining time variant information from software repositories

Giuliano Antoniol, Vincenzo Fabio Rollo, Gabriele Venturi
Proceedings of the 2005 International Workshop on Mining Software Repositories (MSR '05), 2005
This paper presents an approach to recover time variant information from software repositories. It is widely accepted that software evolves due to factors such as defect removal, market opportunities, or the addition of new features. Software evolution details are stored in software repositories, which often contain the change history. However, there is a lack of approaches, technologies, and methods to efficiently extract and represent time-dependent information. Disciplines such as signal and image processing or speech recognition adopt frequency domain representations to mitigate differences among signals that evolve over time. Inspired by time-frequency duality, this paper proposes the use of Linear Predictive Coding (LPC) and cepstrum coefficients to model time varying software artifact histories. LPC and cepstrum coefficients yield very compact representations and can be computed with linear complexity. These representations can be used to highlight components and artifacts that evolved in the same way or with very similar evolution patterns. To assess the proposed approach, we applied LPC and cepstral analysis to 211 Linux kernel releases (i.e., from 1.0 to 1.3.100) to identify files with very similar size histories. The approach, the preliminary results, and the lessons learned are presented in this paper.
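The idea of summarizing a file's size history with LPC and cepstrum coefficients can be illustrated with a minimal sketch. The abstract does not state the model order, the number of cepstral coefficients, the normalization, or the distance used, so the choices below are assumptions and not the authors' method. The sketch uses the autocorrelation method with Levinson-Durbin recursion (convention x[t] ≈ Σ a_k·x[t−k]), the standard LPC-to-cepstrum recursion (gain term c_0 dropped), and a Euclidean distance between cepstra. The helper names (`lpc`, `lpc_to_cepstrum`, `cepstral_distance`, `most_similar_pairs`) and the example file names are hypothetical. For a fixed order p, the cost is O(n·p), which is linear in the history length n.

```python
import math
from typing import Dict, List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of the mean-removed series."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def lpc(x: Sequence[float], order: int) -> List[float]:
    """LPC coefficients a[1..order] via Levinson-Durbin (x[t] ~ sum a_k x[t-k])."""
    if len(x) <= order:
        raise ValueError("history must be longer than the LPC order")
    r = autocorrelation(x, order)
    if r[0] == 0.0:
        return [0.0] * order  # constant history: nothing to predict
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error for the order-i model
    return a[1:]


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (log gain) is ignored here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def cepstral_distance(x: Sequence[float], y: Sequence[float],
                      order: int = 4, n_ceps: int = 8) -> float:
    """Euclidean distance between the cepstra of two histories."""
    cx = lpc_to_cepstrum(lpc(x, order), n_ceps)
    cy = lpc_to_cepstrum(lpc(y, order), n_ceps)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(cx, cy)))


def most_similar_pairs(histories: Dict[str, Sequence[float]], order: int = 4,
                       n_ceps: int = 8) -> List[Tuple[float, str, str]]:
    """All file pairs sorted by cepstral distance (most similar first)."""
    ceps = {name: lpc_to_cepstrum(lpc(h, order), n_ceps)
            for name, h in histories.items()}
    names = sorted(ceps)
    pairs = []
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            d = math.sqrt(sum((s - t) ** 2 for s, t in zip(ceps[u], ceps[v])))
            pairs.append((d, u, v))
    pairs.sort()
    return pairs


if __name__ == "__main__":
    # Hypothetical per-release file sizes, not data from the paper.
    trend = [100, 120, 150, 160, 200, 210, 260, 270, 300, 330]
    histories = {
        "file_a": trend,
        "file_b": [2 * v for v in trend],  # same shape, different scale
        "file_c": [100, 300] * 5,          # oscillating history
    }
    for d, u, v in most_similar_pairs(histories):
        print(f"{u} ~ {v}: {d:.4f}")
```

Because autocorrelation-based LPC is invariant to rescaling the series, `file_a` and `file_b` have identical cepstra and rank as the most similar pair, while the oscillating `file_c` lies far from both. Whether the original study normalized histories in this way is not stated in the abstract.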
doi:10.1145/1083142.1083156 dblp:conf/msr/AntoniolRV05 fatcat:idqwlvpinbcfrcsfrkno5b5kgy