Performance prediction for set similarity joins

Christiane Faleiro Sidney, Diego Sarmento Mendes, Leonardo Andrade Ribeiro, Theo Härder
2015 Proceedings of the 30th Annual ACM Symposium on Applied Computing - SAC '15  
Query performance prediction is essential for many important tasks in cloud-based database management including resource provisioning, admission control, and pricing. Recently, there has been some work on building prediction models to estimate execution time of traditional SQL queries. While suitable for typical OLTP/OLAP workloads, these existing approaches are insufficient to model performance of complex data processing activities for deep analytics such as cleaning and integration of data.
These activities are largely based on similarity operations, which are radically different from regular relational operators. In this paper, we consider prediction models for set similarity joins. We exploit knowledge of optimization techniques and design details popularly found in set similarity join algorithms to identify relevant features, which are then used to construct prediction models based on statistical machine learning. An extensive experimental evaluation confirms the accuracy of our approach.
doi:10.1145/2695664.2695694 dblp:conf/sac/SidneyMRH15 fatcat:tu5v2e7lpvhihpzttqby3bs7ba