Caching multidimensional queries using chunks

Prasad M. Deshpande, Karthikeyan Ramasamy, Amit Shukla, Jeffrey F. Naughton
1998 Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98  
Caching has been proposed (and implemented) by OLAP systems in order to reduce response times for multidimensional queries. Previous work on such caching has considered table level caching and query level caching. Table level caching is suitable for static schemes. On the other hand, query level caching can be used in dynamic schemes, but is too coarse for "large" query results. Query level caching has the further drawback for small query results in that it is only effective when a new query is subsumed by a cached previous query. In this paper, we propose caching small regions of the multidimensional space called "chunks". Chunk-based caching allows fine granularity caching, and also allows queries to partially reuse the results of previous queries with which they overlap. To facilitate the computation of chunks required by a query but not found in the cache, we propose a new organization for relational tables, which we call a "chunked file." Our experiments show that for workloads that exhibit query locality, chunked caching combined with the chunked file organization performs better than query level caching. An unexpected benefit of the chunked file organization is that, due to its multidimensional clustering properties, it can significantly improve the performance of queries that "miss" the cache entirely as compared to traditional file organizations.

Introduction

OLAP systems are becoming increasingly significant in modern-day business in order to increase competitiveness. A typical characteristic of data sets in these systems is their multidimensional nature. However, traditional relational systems are not designed to provide the necessary performance for these types of data. Hence such systems are built using a three-tier architecture. The first tier provides an easy-to-use graphical tool that allows the user to build requests. The middle tier provides a multidimensional view of the data stored in the final tier, which may be an RDBMS. Queries that occur in OLAP systems are interactive and demand quick response times in spite of being complex. Commonly occurring queries place restrictions on dimension tables, which translate into restrictions on the fact table, followed by an aggregation. Various techniques can be used at different stages of the lifetime of a query to speed up its execution.
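As a rough illustration of the chunk-based caching idea described in the abstract, the sketch below maps a range query to the chunks it overlaps, reuses chunks already in the cache, and computes only the missing ones. It is a minimal sketch under stated assumptions, not the authors' implementation: the uniform chunk widths, the names `chunks_for_query` and `ChunkCache`, and the `compute_chunk` backend callback are all hypothetical, and cache replacement and aggregation levels are left out.

```python
from itertools import product


def chunks_for_query(ranges, chunk_sizes):
    """Return the coordinates of every chunk a range query overlaps.

    ranges: one inclusive (lo, hi) pair of ordinal values per dimension.
    chunk_sizes: chunk width per dimension (assumed uniform partitioning).
    """
    per_dim = [
        range(lo // size, hi // size + 1)
        for (lo, hi), size in zip(ranges, chunk_sizes)
    ]
    return list(product(*per_dim))


class ChunkCache:
    """Toy chunk cache with no eviction policy (an assumption of this sketch)."""

    def __init__(self, compute_chunk):
        # Hypothetical callback standing in for the backend that
        # computes the contents of one chunk on a cache miss.
        self._compute_chunk = compute_chunk
        self._store = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, ranges, chunk_sizes):
        """Return {chunk_id: contents} for every chunk the query touches."""
        result = {}
        for chunk_id in chunks_for_query(ranges, chunk_sizes):
            if chunk_id in self._store:
                self.hits += 1
            else:
                self.misses += 1
                self._store[chunk_id] = self._compute_chunk(chunk_id)
            result[chunk_id] = self._store[chunk_id]
        return result


cache = ChunkCache(lambda chunk_id: f"contents of {chunk_id}")
cache.fetch([(0, 19), (0, 9)], (10, 10))   # touches chunks (0, 0) and (1, 0)
cache.fetch([(10, 29), (0, 9)], (10, 10))  # reuses (1, 0), computes (2, 0)
```

The second query overlaps the first without being subsumed by it, yet it still reuses one cached chunk, which is the partial reuse that query level caching cannot provide. Because whole chunks are returned, a fuller version would also filter out values that fall outside the requested ranges.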
Precomputation and the use of specialized indexing structures have been predominantly used at the RDBMS to speed up such queries. In this paper, we propose caching query results in the middle tier as a feasible approach that complements the other strategies. As a motivating example, consider a database containing sales data describing the dollar sales of products sold in a given region on a given date. Thus the schema could be described as:
doi:10.1145/276304.276328 dblp:conf/sigmod/DeshpandeRSN98 fatcat:ihpqy7y27ng7pf2zmmc3fgwmg4