The case for a wide-table approach to manage sparse relational data sets

Eric Chu, Jennifer Beckmann, Jeffrey Naughton
2007 Proceedings of the 2007 ACM SIGMOD international conference on Management of data - SIGMOD '07  
A "sparse" data set typically has hundreds or even thousands of attributes, but most objects have non-null values for only a small number of these attributes. A popular view about sparse data is that it arises merely as the result of poor schema design. In this paper, we argue that rather than being the result of inept schema design, storing a sparse data set in a single table is the right way to proceed. However, for this to be the case, RDBMSs must provide sparse data management facilities that go beyond the previously studied requirement of storing such data sets efficiently. In particular, an RDBMS must 1) enable users to effectively build ad hoc queries over a very large number of attributes, and 2) support efficient evaluation of these queries over a wide, sparse table. We propose techniques that provide these capabilities, and argue that the single-table approach is a necessary component of self-managing database systems because it frees users from a tedious and potentially ineffective schema-design phase when managing sparse data sets.
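To make the two requirements in the abstract concrete, here is a minimal Python sketch of a sparse data set held as one logical wide table. It is only an illustration under stated assumptions, not the techniques proposed in the paper: the sample rows, the attribute names, and the helpers `find_attributes`, `select`, and `sparsity` are all hypothetical. Each row keeps only its non-null attributes, and a missing attribute is treated as NULL.

```python
from typing import Any, Callable

# Hypothetical sparse rows: each object stores only its non-null attributes.
rows: list[dict[str, Any]] = [
    {"id": 1, "name": "camera", "megapixels": 12, "zoom": 3},
    {"id": 2, "name": "laptop", "ram_gb": 16, "screen_in": 14},
    {"id": 3, "name": "phone", "megapixels": 8, "ram_gb": 4},
]


def find_attributes(rows: list[dict[str, Any]], keyword: str) -> list[str]:
    """Return the sorted attribute names containing `keyword`.

    Stands in for helping a user discover relevant columns when the
    table has too many attributes to browse by hand.
    """
    return sorted({attr for row in rows for attr in row if keyword in attr})


def select(
    rows: list[dict[str, Any]], attr: str, predicate: Callable[[Any], bool]
) -> list[int]:
    """Return ids of rows where `attr` is present and satisfies `predicate`.

    Rows lacking `attr` are skipped, mirroring NULL semantics in SQL.
    """
    return [row["id"] for row in rows if attr in row and predicate(row[attr])]


def sparsity(rows: list[dict[str, Any]]) -> float:
    """Fraction of cells that would be NULL in the equivalent wide table."""
    columns = {attr for row in rows for attr in row}
    total_cells = len(columns) * len(rows)
    filled_cells = sum(len(row) for row in rows)
    return 1 - filled_cells / total_cells
```

A user who does not know the schema could first call `find_attributes(rows, "pixel")` to locate candidate columns and then run `select(rows, "megapixels", lambda v: v >= 10)` against the result. A linear scan like this is the simplest possible evaluation strategy and says nothing about how an RDBMS would evaluate such queries efficiently.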
doi:10.1145/1247480.1247571 dblp:conf/sigmod/ChuBN07 fatcat:cz34idr3dra2djuypm6mkhgarm