Aalborg Universitet: Aspects of Data Modeling and Query Processing for Complex Multidimensional Data. Project Title: The Clinical Data Warehouse

Torben Bach Pedersen
unpublished
This thesis is about data modeling and query processing for complex multidimensional data. Multidimensional data has become the subject of much attention in both academia and industry in recent years, fueled by the popularity of data warehousing and On-Line Analytical Processing (OLAP) applications. One application area where complex multidimensional data is common is within medical informatics, an area that may benefit significantly from the functionality offered by data warehousing and OLAP.
However, the special nature of clinical applications imposes new and different requirements on data warehousing technologies, beyond those posed by conventional data warehouse applications. This thesis presents a number of exciting new research challenges that clinical applications pose to the database research community. These include the need for complex-data modeling features, advanced temporal support, advanced classification structures, continuously valued data, dimensionally reduced data, and the integration of complex data. OLAP systems typically employ multidimensional data models to structure their data. This thesis identifies eleven modeling requirements for multidimensional data models. These requirements are derived from a realistic assessment of complex data found in real-world applications. A survey of twelve multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, do not have built-in mechanisms for handling change and time, lack support for imprecision, and are unable to insert data with varying granularities. Additionally, most of the models do not support irregular dimension hierarchies and aggregation semantics. This thesis defines an extended multidimensional data model and algebraic query language that address all eleven requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely because of imprecise data, techniques are proposed that take the imprecision into account in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The presented data model and query evaluation techniques can be implemented using relational database technology.
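As a rough illustration of how granularities can capture imprecise data, the sketch below records each fact at whatever level of a dimension hierarchy is known, then checks which facts cannot be grouped at a requested level. It is not the thesis's model or algebra: the two-level diagnosis hierarchy, the fact list, and the names `roll_up`, `count_by`, and `finest_precise_level` are all hypothetical, chosen only to show the idea of offering an alternative query at a coarser, unaffected granularity.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: specific diagnosis -> diagnosis family.
PARENT = {
    "type 1 diabetes": "diabetes",
    "type 2 diabetes": "diabetes",
    "asthma": "lung disease",
}
LEVEL = {"specific": 0, "family": 1}  # smaller number = finer granularity

# Hypothetical facts: (patient id, diagnosis value at the recorded granularity).
FACTS = [
    ("p1", "type 1 diabetes"),
    ("p2", "type 2 diabetes"),
    ("p3", "diabetes"),  # imprecise: only the family is known
    ("p4", "asthma"),
]


def level_of(value):
    """Values with a parent are specific diagnoses; the rest are families."""
    return "specific" if value in PARENT else "family"


def roll_up(value, target_level):
    """Map a value to target_level, or None if it is recorded more coarsely."""
    own_level = level_of(value)
    if LEVEL[own_level] > LEVEL[target_level]:
        return None  # too coarse to place in a group at this level
    return value if own_level == target_level else PARENT[value]


def count_by(level):
    """Count facts per group at `level`; also list facts that cannot be grouped."""
    counts = defaultdict(int)
    imprecise = []
    for fact_id, value in FACTS:
        group = roll_up(value, level)
        if group is None:
            imprecise.append(fact_id)
        else:
            counts[group] += 1
    return dict(counts), imprecise


def finest_precise_level():
    """Finest level at which every fact can be grouped (the alternative query)."""
    for level in sorted(LEVEL, key=LEVEL.get):
        if not count_by(level)[1]:
            return level
    return None
```

Here `count_by("specific")` leaves patient `p3` ungrouped, while `finest_precise_level()` suggests the family level, where all four facts can be counted precisely.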
The approach is also capable of exploiting multidimensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead. Pre-aggregation, the prior materialization of aggregate queries for later use, is an essential technique for ensuring adequate response time during data analysis. Full pre-aggregation, where all combinations of aggregates are materialized, is infeasible. Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only selected combinations of aggregates and then reusing these to compute other aggregates efficiently. However, this reuse of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. This severely limits the scope of the practical pre-aggregation approach. This thesis significantly extends the scope of practical pre-aggregation to
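The dependence of aggregate reuse on hierarchy constraints can be seen in a small sketch. It is a generic illustration, not the technique developed in the thesis: the counts, the hierarchy mappings, and the function name `aggregate` are hypothetical. When every child has exactly one parent, a materialized family-level aggregate can be reused to compute the grand total; when a child has two parents, reusing the family-level aggregate double-counts.

```python
from collections import Counter


def aggregate(counts, parents):
    """Roll counts up one level: each child adds its count to every parent."""
    rolled = Counter()
    for child, n in counts.items():
        for parent in parents[child]:
            rolled[parent] += n
    return dict(rolled)


# Hypothetical base data: number of patients per low-level diagnosis.
base = {"d1": 10, "d2": 5, "d3": 7}
true_total = sum(base.values())  # 22

# Strict hierarchy: every diagnosis belongs to exactly one family.
strict = {"d1": ["A"], "d2": ["A"], "d3": ["B"]}
# Non-strict hierarchy: d2 belongs to two families.
non_strict = {"d1": ["A"], "d2": ["A", "B"], "d3": ["B"]}
# Both families roll up to a single top value.
to_all = {"A": ["ALL"], "B": ["ALL"]}

# Materialize the family level, then reuse it to compute the total.
strict_families = aggregate(base, strict)          # {"A": 15, "B": 7}
strict_total = aggregate(strict_families, to_all)  # {"ALL": 22}, correct

loose_families = aggregate(base, non_strict)       # {"A": 15, "B": 12}
loose_total = aggregate(loose_families, to_all)    # {"ALL": 27}, d2 counted twice
```

In the strict case the reused aggregate matches `true_total`; in the non-strict case it overshoots by the five patients of `d2`, which is the kind of constraint violation that rules out naive reuse.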
fatcat:kjbc75rv7vcjhidlc4g65lzbui