TopCrowd [chapter]

Christian Nieke, Ulrich Güntzer, Wolf-Tilo Balke
2014 Lecture Notes in Computer Science  
Building databases and information systems over data extracted from heterogeneous sources like the Web poses a severe challenge: most data is incomplete and thus difficult to process in structured queries. This is especially true for sophisticated query techniques like Top-k querying where rankings are aggregated over several sources. The intelligent combination of efficient data processing algorithms with crowdsourced database operators promises to alleviate the situation. Yet the scalability of such combined processing is doubtful. We present TopCrowd, a novel crowd-enabled Top-k query processing algorithm that works effectively on incomplete data, while tightly controlling query processing costs in terms of response time and money spent for crowdsourcing. TopCrowd features probabilistic pruning rules for drastically reduced numbers of crowd accesses (up to 95%), while effectively balancing querying costs and result correctness. Extensive experiments show the benefit of our technique.
doi:10.1007/978-3-319-12206-9_10 fatcat:i6clz5fhejahtfhosmyvvtgh4e