Supporting OLAP operations over imperfectly integrated taxonomies

Yan Qi, K. Selçuk Candan, Junichi Tatemura, Songting Chen, Fenglin Liao
2008 Proceedings of the 2008 ACM SIGMOD international conference on Management of data - SIGMOD '08  
OLAP is an important tool in decision support. With the help of domain knowledge, such as hierarchies of attribute values, OLAP helps the user observe the effects of various decisions. One assumption of most OLAP operations is that the available domain knowledge is precise. In particular, they assume that the hierarchy of values over which the user can navigate forms a taxonomy. In this paper, we first note that when multiple heterogeneous data sources are involved in the gathering of the data and the associated domain knowledge, the integrated knowledge base, constructed by combining locally available taxonomies based on concept matchings, may not itself be a taxonomy. Specifically, the existence of intersections among concepts from different sources compromises the tree structure of the integrated taxonomy and prevents effective use of hierarchical navigation techniques, such as drill-down and roll-up. To cope with this, we introduce concept un-classification, where a select few of the concepts are eliminated to ensure that the remaining structure is a navigable taxonomy without concept intersections. Since un-classifying originally classified data is not desirable, we consider ways to minimize un-classification in the process. We introduce a cost model which captures the imprecision caused by the un-classification process, and we formulate the problem of finding an un-classification strategy which eliminates intersections while adding minimal imprecision to the resulting structure. We show that, when performed naively, this task can be very costly, and we therefore propose a bottom-up preprocessing strategy which supports basic navigational analytics operations, such as drill-down and roll-up. Experiments over synthetic and real-life data verified the effectiveness and efficiency of our approach.
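The idea of concept un-classification can be illustrated with a minimal sketch. It assumes, purely for illustration, that each concept is represented by the set of data items it covers, and that two concepts "intersect" when their item sets overlap without one containing the other. The greedy rule below (drop the concept involved in the most intersections) is a stand-in heuristic, not the cost model or the bottom-up strategy of the paper; all names and the sample data are hypothetical.

```python
from itertools import combinations


def intersects(a, b):
    """True when two item sets overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def conflicts(concepts):
    """List every pair of concept names whose item sets intersect."""
    return [(x, y) for x, y in combinations(sorted(concepts), 2)
            if intersects(concepts[x], concepts[y])]


def greedy_unclassify(concepts):
    """Remove concepts until no intersections remain (assumed heuristic).

    Returns the surviving concepts and the list of removed concept names.
    """
    remaining = dict(concepts)
    removed = []
    while True:
        pairs = conflicts(remaining)
        if not pairs:
            return remaining, removed
        # Count how many intersections each concept takes part in.
        counts = {}
        for x, y in pairs:
            counts[x] = counts.get(x, 0) + 1
            counts[y] = counts.get(y, 0) + 1
        # Drop the most conflicted concept; prefer smaller item sets on ties.
        victim = max(sorted(counts),
                     key=lambda c: (counts[c], -len(remaining[c])))
        removed.append(victim)
        del remaining[victim]


# Hypothetical integrated hierarchy: "Gadgets" comes from a second source
# and cuts across "Phones" and "Cameras".
sample = {
    "Electronics": {1, 2, 3, 4},
    "Phones": {1, 2},
    "Cameras": {3, 4},
    "Gadgets": {2, 3},
}
kept, dropped = greedy_unclassify(sample)
```

After the call, the surviving item sets are pairwise either nested or disjoint, which is the property a tree-shaped taxonomy needs for drill-down and roll-up. A real implementation would replace the greedy rule with a cost that measures the imprecision each removal introduces.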
doi:10.1145/1376616.1376703 dblp:conf/sigmod/QiCTCL08 fatcat:wdqbsjis2ndl7auqnjvmuaesde