Near-Optimal Range Reporting Structures for Categorical Data [chapter]

Kasper Green Larsen, Freek van Walderveen
2013 Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms  
Range reporting on categorical (or colored) data is a well-studied generalization of the classical range reporting problem in which each of the N input points has an associated color (category). A query then asks to report the set of colors of the points in a given rectangular query range, which may be far smaller than the set of all points in the query range. We study two-dimensional categorical range reporting in both the word-RAM and I/O-model. For the I/O-model, we present two alternative
more » ... ta structures for three-sided queries. The first answers queries in op- where K is the number of distinct colors in the output, B is the disk block size, and lg * N is the iterated logarithm of N . Our second data structure uses linear space and answers queries in O(lg B N + lg (h) N + K/B) I/Os for any constant integer h ≥ 1. Here lg (1) N = lg N and lg (h) N = lg(lg (h−1) N ) when h > 1. Both solutions use only comparisons on the coordinates. We also show that the lg B N terms in the query costs can be reduced to optimal lg lg B U when the input points lie on a U × U grid and we allow word-level manipulations of the coordinates. We further reduce the query time to just O(1) if the points are given on an N × N grid. Both solutions also lead to improved data structures for four-sided queries. For the word-RAM, we obtain optimal data structures for three-sided range reporting, as well as improved upper bounds for four-sided range reporting. Finally, we show a tight lower bound on onedimensional categorical range counting using an elegant reduction from (standard) two-dimensional range counting.
doi:10.1137/1.9781611973105.20 dblp:conf/soda/LarsenW13 fatcat:rqie3ekeqbbgjchtoq7bitms5q