Capsule Reviews

2005 Computer Journal
Hashing of databases with the use of metric properties of the Hamming space. V. Balakirsky

Hashing of databases is a particular approach to storing a collection of items and retrieving those items whose key values match given key values. The key value of an item determines the address at which that item is stored. Collisions occur when different keys map to the same address. Because storage cost trades off against access time, and because main memories usually have fast access time but a size limited by increasing cost, databases are stored in a secondary memory with slow access. The number of required accesses can be reduced if the values of a hash function are stored in the main memory, the records of the database are stored in an external memory, and a working memory is used for storing pre-computed values of the hash function. This paper aims at solving the following task: 'given a pattern and a fixed size of working memory, form the set of addresses of records that can disagree with the pattern in the number of positions smaller than the given threshold value'. The paper uses the metric properties of the Hamming space for searching procedures. The author shows that the triangle inequality for Hamming distances yields a rejection rule that decides which records are excluded from the subset of records that can be close to the given pattern. This rejection rule is exploited in the given hashing algorithm. An estimate of the performance of the hashing algorithm is given.
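
The rejection rule can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes records are bit strings held as Python integers, and it uses a small set of hypothetical reference vectors (`pivots`) whose precomputed distances to each record stand in for the stored hash values. The function names `hamming`, `build_table` and `candidate_addresses` are likewise illustrative. For any pivot p, the triangle inequality gives |d(x, p) - d(r, p)| <= d(x, r), so a record r can be rejected without being read whenever that difference already reaches the threshold.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def build_table(records: Sequence[int], pivots: Sequence[int]) -> List[List[int]]:
    """Precompute d(record, pivot) for every record (assumed 'hash values')."""
    return [[hamming(r, p) for p in pivots] for r in records]


def candidate_addresses(
    pattern: int,
    pivots: Sequence[int],
    table: Sequence[Sequence[int]],
    threshold: int,
) -> List[int]:
    """Addresses of records that may differ from pattern in < threshold positions.

    A record is rejected as soon as |d(pattern, p) - d(record, p)| >= threshold
    for some pivot p, because the triangle inequality then forces
    d(pattern, record) >= threshold. Surviving addresses are only candidates:
    the records themselves still have to be fetched and compared.
    """
    d_pattern = [hamming(pattern, p) for p in pivots]
    return [
        addr
        for addr, row in enumerate(table)
        if all(abs(dq - dr) < threshold for dq, dr in zip(d_pattern, row))
    ]
```

In this sketch only the table of precomputed distances is consulted during filtering, so the slow memory holding the records is touched only for the surviving addresses. The rule never discards a true match, but it may keep records that turn out to be too far from the pattern.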
doi:10.1093/comjnl/bxh069 fatcat:7mgwvsfdbnc3tjhkyvjhr3w7by