One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees

Jinglin Peng, Bolin Ding, Jiannan Wang, Kai Zeng, Jingren Zhou
2022 Proceedings of the 2022 International Conference on Management of Data  
Sample-based estimation, which uses a sample to estimate population parameters (e.g., SUM, COUNT, and AVG), has various applications in database systems. A sampler defines how samples are drawn from a population. Various samplers have been proposed (e.g., uniform sampler, stratified sampler, and measure-biased sampler), since no single sampler works well in all cases. To overcome the "one size does not fit all" challenge, we study how to combine multiple samplers to estimate population parameters, and propose SamComb, a novel bandit-based sampler combination framework. Given a set of samplers, a budget, and a population parameter, SamComb automatically decides how much budget to allocate to each sampler so that the combined estimation achieves the highest accuracy. We model this sampler combination problem as a multi-armed bandit (MAB) problem and propose effective approaches to balance the exploration and exploitation trade-off in a principled way. We provide theoretical guarantees for our approaches and conduct extensive experiments on both synthetic and real datasets. The results show that there is a strong need to combine multiple samplers in order to obtain accurate estimations without knowledge of population predicates and distributions, and that SamComb is an effective framework to achieve this goal.
CCS CONCEPTS • Information systems → Database query processing; Online analytical processing engines.
doi:10.1145/3514221.3517900 fatcat:4mdxuqo2yjfcjeh775ywvcxu5m
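
The abstract frames sampler combination as a multi-armed bandit problem: each sampler is an arm, and the budget is split across arms to make the combined estimate as accurate as possible. The following is a minimal illustrative sketch of that framing, not SamComb's actual algorithm, which the abstract does not specify. It assumes an epsilon-greedy rule that favors the sampler with the lowest observed variance, and an inverse-variance weighted average as the combination step. All names (`RunningStats`, `combine_samplers`, `demo`, the `warmup` and `epsilon` parameters) and the synthetic data are hypothetical.

```python
import itertools
import random


class RunningStats:
    """Running mean and sample variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1)


def combine_samplers(samplers, budget, warmup=10, epsilon=0.1, rng=None):
    """Split `budget` draws across samplers with an epsilon-greedy rule.

    Each sampler is a callable returning one unbiased single-draw
    estimate of the target parameter. Returns the combined estimate
    and the number of draws spent on each sampler.
    """
    rng = rng or random.Random()
    k = len(samplers)
    if warmup < 2 or budget < warmup * k:
        raise ValueError("need warmup >= 2 and budget >= warmup * len(samplers)")

    # Exploration phase: a few draws per sampler to seed variance estimates.
    stats = [RunningStats() for _ in samplers]
    for sampler, st in zip(samplers, stats):
        for _ in range(warmup):
            st.add(sampler())

    # Remaining budget: mostly exploit the lowest-variance sampler so far.
    for _ in range(budget - warmup * k):
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = min(range(k), key=lambda i: stats[i].variance)
        stats[arm].add(samplers[arm]())

    # Combine per-sampler means, weighting each by 1 / Var(mean) = n / variance.
    weights = [st.n / max(st.variance, 1e-12) for st in stats]
    estimate = sum(w * st.mean for w, st in zip(weights, stats)) / sum(weights)
    return estimate, [st.n for st in stats]


def demo(seed=0, budget=2000):
    """Estimate SUM over synthetic data with two hypothetical samplers."""
    rng = random.Random(seed)
    values = [rng.uniform(1.0, 100.0) for _ in range(1000)]
    n = len(values)

    # Noisy proxy of each value, used as the sampling measure.
    proxy = [v * rng.uniform(0.8, 1.2) for v in values]
    proxy_total = sum(proxy)
    cum = list(itertools.accumulate(proxy))

    def uniform_sampler():
        # Uniform draw, scaled by population size.
        return n * rng.choice(values)

    def measure_biased_sampler():
        # Draw i with probability proxy[i] / proxy_total, then reweight.
        i = rng.choices(range(n), cum_weights=cum, k=1)[0]
        return values[i] * proxy_total / proxy[i]

    estimate, counts = combine_samplers(
        [uniform_sampler, measure_biased_sampler], budget, rng=rng
    )
    return estimate, sum(values), counts


if __name__ == "__main__":
    est, true_sum, counts = demo()
    print(f"estimate={est:.1f} true={true_sum:.1f} draws per sampler={counts}")
```

In this toy setup the measure-biased sampler has much lower variance than the uniform one, so most of the budget should flow to it. The weights are estimated from the same adaptively collected draws, so this sketch carries none of the theoretical guarantees the paper claims for its own approaches.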