Partial-Match Retrieval Algorithms

Ronald L. Rivest
1976 SIAM Journal on Computing (Print)
We examine the efficiency of hash-coding and tree-search algorithms for retrieving from a file of k-letter words all words which match a partially-specified input query word (for example, retrieving all six-letter English words of the form S**R*H where "*" is a "don't care" character). We precisely characterize those balanced hash-coding algorithms with minimum average number of lists examined. Use of the first few letters of each word as a list index is shown to be one such optimal algorithm.
A new class of combinatorial designs (called associative block designs) provides better hash functions with a greatly reduced worst-case number of lists examined, yet with optimal average behavior maintained. Another efficient variant involves storing each word in several lists. Tree-search algorithms are shown to be approximately as efficient as hash-coding algorithms, on the average. In general, these algorithms require time about $O(n^{(k-s)/k})$ to respond to a query word with s letters specified, given a file of n k-letter words. Previous algorithms either required time $O(s n/k)$ or else used exorbitant amounts of storage.

Keywords: analysis of algorithms

QUOTATIONS
Oh where, oh where, has my little dog gone?
Oh where, oh where can he be?
With his tail cut short, and his ears cut long,
Oh where, Oh where can he be?
[Nursery rhyme]

You must look where it is not, as well as where it is.
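The simplest scheme named in the abstract, using the first few letters of each word as a list index, can be illustrated with a minimal Python sketch. It is not code from the paper: the function names, the `prefix_len` parameter, the uppercase alphabet, and the sample word list are all assumptions made for illustration. It also does not implement the associative block designs or the multiple-list variant. A query examines only the lists whose index is compatible with its specified leading letters, then filters each list against the full pattern.

```python
from collections import defaultdict
from itertools import product
import string

WILDCARD = "*"  # the "don't care" character from the abstract


def build_index(words, prefix_len):
    """Store each word in the list indexed by its first prefix_len letters."""
    index = defaultdict(list)
    for word in words:
        index[word[:prefix_len]].append(word)
    return index


def matches(word, query):
    """True if word agrees with query at every specified position."""
    return len(word) == len(query) and all(
        q == WILDCARD or q == c for c, q in zip(word, query)
    )


def candidate_keys(query, prefix_len, alphabet):
    """Yield every list index compatible with the query's leading letters."""
    choices = [alphabet if q == WILDCARD else q for q in query[:prefix_len]]
    return ("".join(p) for p in product(*choices))


def partial_match(index, query, prefix_len, alphabet=string.ascii_uppercase):
    """Return (sorted matching words, number of lists examined)."""
    results = []
    lists_examined = 0
    for key in candidate_keys(query, prefix_len, alphabet):
        lists_examined += 1
        results.extend(w for w in index.get(key, ()) if matches(w, query))
    return sorted(results), lists_examined


# Hypothetical sample file of six-letter words.
words = ["SEARCH", "SMIRCH", "STARCH", "SCORCH", "SWITCH", "STRING"]
index = build_index(words, prefix_len=2)
found, examined = partial_match(index, "S**R*H", prefix_len=2)
print(found, examined)
```

With a two-letter index and the query S**R*H, the first letter is fixed and the second is unspecified, so the sketch examines one list per letter of the alphabet instead of scanning the whole file. Each additional unspecified letter inside the index prefix multiplies the number of lists examined by the alphabet size, which is the cost the abstract's average-case and worst-case results are about.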
doi:10.1137/0205003 fatcat:m3z52uy4bvctxav4mosnuzo6qa