DebCheck: Efficient Checking for Open Source Code Clones in Software Systems

James R. Cordy, Chanchal K. Roy
2011 IEEE 19th International Conference on Program Comprehension
The problem of finding code cloned from open source code in software systems is of interest both to the open source community (e.g., for GPL and other open source license enforcement) and to the industrial community (e.g., to prevent GPL "contamination" of proprietary commercial software systems). The largest collection of open source software in general distribution is the eight-DVD Debian source distribution, and checking for cross-cloning with the Debian source distribution goes a long way toward finding any possible copying from the set of all open source code in the world. The NiCad clone detector is an open source, language-sensitive, robust clone detector that has been shown to yield both high precision and high recall in detecting syntactically meaningful near-miss clones such as functions and blocks. Given a directory of new source code to check, DebCheck uses NiCad in its incremental mode to efficiently check the system for near-miss clones of C functions in the entire Debian source base in a few minutes on a 2 GB home computer. The same technique can be used to check systems for cross-clones with any large source collection.
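The core idea of an incremental cross-clone check can be sketched as follows: the large corpus is processed once into normalized fragments, and each new fragment is then compared only against that stored corpus, never corpus against corpus. This is a minimal illustration, not DebCheck's or NiCad's actual implementation. It assumes a crude whitespace-collapsing normalizer standing in for NiCad's pretty-printing, a line-based longest-common-subsequence (LCS) comparison, and an example dissimilarity threshold of 0.3. All names (`Fragment`, `normalize`, `check_new_code`) and the sample file paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A code fragment (e.g., one C function), identified by source file and start line."""
    source: str
    start_line: int
    lines: tuple[str, ...]


def normalize(code: str) -> tuple[str, ...]:
    """Hypothetical stand-in for pretty-printing: collapse whitespace, drop blank lines."""
    out = []
    for raw in code.splitlines():
        line = " ".join(raw.split())
        if line:
            out.append(line)
    return tuple(out)


def lcs_length(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Length of the longest common subsequence of two line sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def dissimilarity(a: tuple[str, ...], b: tuple[str, ...]) -> float:
    """Fraction of lines in the larger fragment left unmatched by the LCS."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return (longest - lcs_length(a, b)) / longest


def check_new_code(corpus: list[Fragment], new_fragments: list[Fragment],
                   threshold: float = 0.3) -> list[tuple[Fragment, Fragment]]:
    """Compare only the new fragments against the precomputed corpus fragments."""
    hits = []
    for new in new_fragments:
        for old in corpus:
            longest = max(len(new.lines), len(old.lines))
            shortest = min(len(new.lines), len(old.lines))
            # Size filter: the LCS is at most the shorter length, so this is a lower bound.
            if longest and (longest - shortest) / longest > threshold:
                continue
            if dissimilarity(new.lines, old.lines) <= threshold:
                hits.append((new, old))
    return hits


# Corpus side: built once (here, one hypothetical "Debian" function).
corpus = [Fragment("debian/pkg/util.c", 42, normalize(
    "int max(int a, int b)\n{\n  if (a > b) return a;\n  return b;\n}"))]

# New code to check: one near-miss copy and one unrelated function.
new = [
    Fragment("mine/clone.c", 1, normalize(
        "static int max(int a, int b)\n{\n    if (a > b) return a;\n    return b;\n}")),
    Fragment("mine/other.c", 1, normalize('void hello(void)\n{\n  puts("hi");\n}')),
]

for new_frag, old_frag in check_new_code(corpus, new):
    print(f"{new_frag.source}:{new_frag.start_line} resembles "
          f"{old_frag.source}:{old_frag.start_line}")
```

Run as written, this should report only `mine/clone.c`, which differs from the corpus function in one of five lines (dissimilarity 0.2). The pairwise loop is quadratic and serves only to show the incremental split between corpus and new code. A tool working at Debian scale would need far more efficient indexing and a real language-aware normalizer.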
doi:10.1109/icpc.2011.27 dblp:conf/iwpc/CordyR11 fatcat:qrrhfqqarzg7rjlbzj43po4tpu