Editors: Kirk Pruhs and Christian Sohler; Article No. 65

Thomas Schibler, Subhash Suri
2017 13 Leibniz International Proceedings in Informatics Schloss Dagstuhl-Leibniz-Zentrum für Informatik   unpublished
We study the problem of k-dominance in a set of d-dimensional vectors, prove bounds on the number of maxima (skyline vectors) under both worst-case and average-case models, perform an experimental evaluation using synthetic and real-world data, and explore an application of the k-dominant skyline for extracting a small set of top-ranked vectors in high dimensions, where the full skyline can be unmanageably large.

In multi-criteria optimization and decision-making applications, there is often no single best answer, and a popular approach is to use Pareto optimality. The set of Pareto optimal points, which are the coordinate-wise undominated solutions, is called the skyline. Unfortunately, as the dimension of the data grows, the size of the skyline tends to explode, and most, if not all, of the input vectors can appear on the skyline [4, 7, 8]. (Although in theory all input points can appear on the skyline even in two dimensions, this pathological behavior is rarely observed in low-dimensional real-world data.) A database query for a car or a smartphone, for instance, can easily produce an overwhelming number of incomparable choices. In the National Basketball Association's (NBA) database of 21,961 players in 17 dimensions (scoring attributes), more than 1400 players appear on the skyline (see Figure 2 for real-world datasets with large skylines). The problem is even more pronounced in crowdsourced data such as movie or consumer product ratings, where each input vector is the rating profile of a product by the users and virtually every product can be highly ranked by some user, potentially elevating it to the skyline.

While the classical result of Bentley et al. [4] shows that the expected size of the skyline of n random vectors, whose components are chosen independently, is $O((\log n)^{d-1})$ in d dimensions, the exponential dependence on d renders the skyline useless even in theory except in very low dimensions. The k-dominant skyline (KDS) was introduced recently as a way to tame this curse of dimensionality: by relaxing d-dominance to k-dominance, for k < d, many more points can be eliminated from the skyline, resulting in a smaller, more manageable set of maxima. Formally, given a finite set of points V in $\mathbb{R}^d$, a point u is said to k-dominate another point