Dynamic Space Efficient Hashing

Tobias Maier, Peter Sanders
2017 14 Leibniz International Proceedings in Informatics Schloss Dagstuhl-Leibniz-Zentrum für Informatik   unpublished
We consider space efficient hash tables that can grow and shrink dynamically and are always highly space efficient, i.e., their space consumption is always close to the lower bound even while growing and when taking into account storage that is only needed temporarily. None of the traditionally used hash tables have this property. We show how known approaches like linear probing and bucket cuckoo hashing can be adapted to this scenario by subdividing them into many subtables or using virtual
more » ... ory overcommitting. However, these rather straightforward solutions suffer from slow amortized insertion times due to frequent reallocation in small increments. Our main result is DySECT (Dynamic Space Efficient Cuckoo Table) which avoids these problems. DySECT consists of many subtables which grow by doubling their size. The resulting inhomogeneity in subtable sizes is equalized by the flexibility available in bucket cuckoo hashing where each element can go to several buckets each of which containing several cells. Experiments indicate that DySECT works well with load factors up to 98%. With up to 2.7 times better performance than the next best solution. 1 Introduction Dictionares represented as hash tables are among the most frequently used data structures and often play a critical role in achieving high performance. Having several compatible implementations, which perform well under different conditions and can be interchanged freely, allows programmers to easily adapt known solutions to new circumstances. One aspect that has been subject to much investigation is space efficiency [3, 4, 7, 8, 17, 19]. Modern space efficient hash tables work well even when filled to 95% and more. To reach filling degrees like this, the table has to be initialized with the correct final capacity, thereby, requiring programmers to know tight bounds on the maximum number of inserted elements. This is typically not realistic. For example, a frequent application of hash tables aggregates information about data elements by their key. Whenever the exact number of unique keys is not known a priori, we have to overestimate the initial capacity to guarantee good performance. Dynamic space efficient data structures are necessary to guarantee both good performance and low overhead independent of the circumstances.