Discovering OLAP dimensions in semi-structured data

Svetlana Mansmann, Nafees Ur Rehman, Andreas Weiler, Marc H. Scholl
2014 Information Systems  
OLAP cubes are obtained from the input data based on the available attributes and known relationships between them. Transforming the input into a set of measures distributed along a set of uniformly structured dimensions may be unrealistic for applications dealing with semi-structured data. We propose to extend the capabilities of OLAP via content-driven discovery of measures and dimensions in the original dataset. New elements are discovered by means of data mining and other techniques and are therefore expected to be of limited temporal validity. In this work we focus on the challenge of generating, maintaining, and querying such discovered elements of the cube. We demonstrate the power of our approach by providing OLAP over the public stream of user-generated content provided by Twitter. We were able to enrich the original set with dynamic characteristics such as user activity, popularity, and messaging behavior, as well as to classify messages by topic, impact, origin, method of generation, etc. Knowledge discovery techniques coupled with human expertise enable structural enrichment of the original data beyond the scope of existing methods for obtaining multidimensional models from relational or semi-structured data.
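The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each tweet carries a user, a timestamp, and a retweet count, and that "user activity" is discretized by hypothetical tweet-count thresholds. A discovered dimension member is stored together with a validity interval, reflecting its limited temporal validity, and a measure is rolled up along that dimension only where the member is valid. All names (`Tweet`, `DiscoveredMember`, `discover_activity_dimension`, `roll_up_retweets`, `thresholds`) are illustrative assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal tweet record (illustrative, not Twitter's schema).
    tweet_id: int
    user: str
    created_at: datetime
    retweets: int  # used as the measure


@dataclass(frozen=True)
class DiscoveredMember:
    # A dimension value mined from the data, valid only in [valid_from, valid_to).
    user: str
    activity_level: str
    valid_from: datetime
    valid_to: datetime


def discover_activity_dimension(tweets, window_start, window_end, thresholds=(2, 5)):
    """Classify each user by tweet count inside the window (hypothetical thresholds)."""
    counts = Counter(
        t.user for t in tweets if window_start <= t.created_at < window_end
    )
    low, high = thresholds
    members = {}
    for user, n in counts.items():
        if n >= high:
            level = "high"
        elif n >= low:
            level = "medium"
        else:
            level = "low"
        members[user] = DiscoveredMember(user, level, window_start, window_end)
    return members


def roll_up_retweets(tweets, members):
    """Sum the retweet measure per activity level, only within each member's validity."""
    totals = defaultdict(int)
    for t in tweets:
        m = members.get(t.user)
        if m is not None and m.valid_from <= t.created_at < m.valid_to:
            totals[m.activity_level] += t.retweets
    return dict(totals)


# Usage with made-up sample data.
start = datetime(2014, 1, 1)
end = start + timedelta(days=7)
tweets = [Tweet(i, "alice", start + timedelta(hours=i), 1) for i in range(5)]
tweets += [Tweet(10, "bob", start + timedelta(days=1), 4),
           Tweet(11, "bob", start + timedelta(days=2), 6)]
tweets += [Tweet(20, "carol", start + timedelta(days=3), 10),
           Tweet(30, "carol", end + timedelta(days=1), 99)]  # outside the window

members = discover_activity_dimension(tweets, start, end)
totals = roll_up_retweets(tweets, members)
print(totals)  # {'high': 5, 'medium': 10, 'low': 10}
```

Keeping the validity interval on each member, rather than overwriting a user's classification in place, means that a later re-discovery over a new window can add fresh members without invalidating aggregates computed for earlier windows.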
doi:10.1016/