Detecting table clones and smells in spreadsheets

Wensheng Dou, Shing-Chi Cheung, Chushu Gao, Chang Xu, Liang Xu, Jun Wei
2016 Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2016  
Spreadsheets are widely used by end users for various business tasks, such as data analysis and financial reporting. End users may perform similar tasks by cloning a block of cells (table) in their spreadsheets. The corresponding cells in these cloned tables are supposed to keep the same or similar computational semantics. However, when spreadsheets evolve, thus cloned tables can become inconsistent due to ad-hoc modifications, and as a result suffer from smells. In this paper, we propose
more » ... heck to detect table clones and related smells due to inconsistency among them. We observe that two tables with the same header information at their corresponding cells are likely to be table clones. Inspired by existing fingerprint-based code clone detection techniques, we developed a detection algorithm to detect this kind of table clones. We further detected outliers among corresponding cells as smells in the detected table clones. We implemented our idea into TableCheck, and applied it to real-world spreadsheets from the EUSES corpus. Experimental results show that table clones commonly exist (21.8%), and 25.6% of the spreadsheets with table clones suffer from smells due to inconsistency among these clones. TableCheck detected table clones and their smells with a precision of 92.2% and 85.5%, respectively, while existing techniques detected no more than 35.6% true smells that TableCheck could detect. CCS Concepts •Applied computing → Spreadsheets •Software and its engi-neering→Software testing and debugging •Social and professional topics→Software maintenance.
doi:10.1145/2950290.2950359 dblp:conf/sigsoft/DouCGXXW16 fatcat:mslz3yprvjajresivsj5wv7qpa