Join size estimation subject to filter conditions

David Vengerov, Andre Cavalheiro Menck, Mohamed Zait, Sunil P. Chakkappen
2015 Proceedings of the VLDB Endowment  
In this paper, we present a new algorithm for estimating the size of an equality join of multiple database tables. The proposed algorithm, Correlated Sampling, constructs a small-space synopsis for each table, which can then be used to provide a quick estimate of the join size of this table with other tables subject to dynamically specified predicate filter conditions, possibly specified over multiple columns (attributes) of each table. This algorithm makes a single pass over the data and is thus suitable for streaming scenarios. We compare this algorithm analytically to two other previously known sampling approaches (independent Bernoulli Sampling and End-Biased Sampling) and to a novel sketch-based approach. We also compare these four algorithms experimentally and show that the results fully correspond to our analytical predictions based on derived expressions for the estimator variances, with Correlated Sampling giving the best estimates in a large range of situations.
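The core idea of correlating samples across tables can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single sampling rate `p` shared by both tables, a BLAKE2-based hash that maps each join-key value to [0, 1), and two tables joined on one key column. The paper's handling of per-table rates and multi-table joins may differ. All helper names (`unit_hash`, `correlated_sample`, `estimate_join_size`) are hypothetical. Each table is scanned once and keeps a row exactly when its join key hashes below `p`. Because every table uses the same hash, a key value is either kept in all synopses or dropped from all of them. Filters are applied to the sampled rows at query time, and the joined count is scaled by `1/p`.

```python
import hashlib
from collections import Counter


def unit_hash(value, seed=0):
    """Deterministically map a join-key value to [0, 1).

    Assumption: BLAKE2b over repr((seed, value)); any shared uniform hash works.
    The top 53 bits are used so the result is exactly representable and < 1.0.
    """
    digest = hashlib.blake2b(repr((seed, value)).encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") >> 11) / 2**53


def correlated_sample(rows, key, p, seed=0):
    """Single pass: keep every row whose join-key hash falls below p.

    All tables share the hash and seed, so a key value is sampled in every
    table or in none (this is what makes the samples correlated).
    """
    return [r for r in rows if unit_hash(r[key], seed) < p]


def join_count(rows_a, rows_b, key_a, key_b):
    """Exact equality-join cardinality of two row lists."""
    counts = Counter(r[key_b] for r in rows_b)
    return sum(counts[r[key_a]] for r in rows_a)


def estimate_join_size(sample_a, sample_b, key_a, key_b, p,
                       pred_a=lambda r: True, pred_b=lambda r: True):
    """Apply filter predicates to the synopses, join them, and scale by 1/p.

    A key value v contributes a_v * b_v (filtered counts) with probability p,
    so dividing by p gives an unbiased estimate of the filtered join size.
    """
    fa = [r for r in sample_a if pred_a(r)]
    fb = [r for r in sample_b if pred_b(r)]
    return join_count(fa, fb, key_a, key_b) / p
```

For contrast, independent Bernoulli sampling keeps each row with probability `p` independently and must scale the sampled join by `1/p**2`. Matching rows that share a key are then not guaranteed to survive together, which tends to inflate variance on sparse joins. With `p = 1.0` the sketch above reduces to the exact filtered join, which makes a convenient sanity check.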
doi:10.14778/2824032.2824051