Symbolic Gray Code as a Perfect Multiattribute Hashing Scheme for Partial Match Queries

C.C. Chang, R.C.T. Lee, M.W. Du
1982 IEEE Transactions on Software Engineering  
In this paper, we show that the symbolic Gray code hashing mechanism is good not only for best matching but also for partial match queries. We propose a new hashing scheme, called bucket-oriented symbolic Gray code, which can be used to produce any arbitrary Cartesian product file, a file organization that has been shown to be good for partial match queries. Many interesting properties of this new multiattribute hashing scheme, including the property that it is a perfect hashing scheme, are discussed and proved.

Index Terms: Bucket-oriented symbolic Gray code, Cartesian product file, multiattribute file organization, partial match query, perfect hashing, symbolic Gray code.

I. THE PARTIAL MATCHING PROBLEM

In this paper, we are concerned with partial match query systems [1], [3], [5], [6], [10], [18]-[21], [23]. We assume that we are dealing with a multiattribute file consisting of a set of multiattribute records. Each record is characterized by attributes A_1, A_2, ..., A_N. A partial match query is a query of the following form: retrieve all records where A_{i_1} = a_{i_1}, ..., A_{i_k} = a_{i_k}, where 0 < k < N.

We shall assume that all of the records are divided into buckets and stored on disk. Each time a partial match query is processed, one or more disk accesses are performed. Since disk access is much more time-consuming than any processing in main memory, we measure the performance of our file system by the number of buckets that must be examined.

Let us consider Tables I and II. In both tables, a query (A_1 = a, A_2 = *) denotes a partial match query that retrieves all of the records with A_1 equal to a, where A_2 can be any value, i.e., a don't-care condition. It can be seen that the average number of buckets to be examined, over all possible partial match queries, is 2 for the file system in Table II and 4 for the file system in Table I.

Thus, our multiattribute file system design problem for partial match queries can be stated as follows: given a set of multiattribute records, arrange the records in such a way that the average number of buckets to be examined, over all possible partial match queries, is minimized. Unfortunately, a solution to the problem stated above is still lacking. In other words, given an arbitrary set of multiattribute records, there is no efficient algorithm to find an
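The performance measure defined in Section I, the average number of buckets examined over all partial match queries, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-attribute file over the values a to d, the two bucket layouts, and the function names are hypothetical, and the layouts are not a reproduction of Tables I and II, which are not included in this record.

```python
from itertools import product

WILDCARD = "*"  # don't-care marker, as in the query (A_1 = a, A_2 = *)


def buckets_examined(bucket_of, query):
    """Return the set of buckets holding at least one record that matches query."""
    return {
        bucket
        for record, bucket in bucket_of.items()
        if all(q == WILDCARD or q == v for q, v in zip(query, record))
    }


def partial_match_queries(domains):
    """Yield every query that specifies k attributes with 0 < k < N."""
    n = len(domains)
    for query in product(*[tuple(d) + (WILDCARD,) for d in domains]):
        k = sum(q != WILDCARD for q in query)
        if 0 < k < n:
            yield query


def average_buckets(bucket_of, domains):
    """Average number of buckets examined over all partial match queries."""
    counts = [
        len(buckets_examined(bucket_of, q))
        for q in partial_match_queries(domains)
    ]
    return sum(counts) / len(counts)


# Hypothetical file: all 16 records over two attributes with four values each.
values = ("a", "b", "c", "d")
domains = (values, values)
records = list(product(values, values))

# Hypothetical layout 1: records sharing an attribute value are spread
# over all four buckets.
diagonal = {
    (x, y): (values.index(x) + values.index(y)) % 4 for x, y in records
}

# Hypothetical layout 2: a Cartesian product file that splits each
# attribute domain into two halves, giving 2 x 2 = 4 buckets.
cartesian = {
    (x, y): (values.index(x) // 2, values.index(y) // 2) for x, y in records
}

print(average_buckets(diagonal, domains))   # 4.0
print(average_buckets(cartesian, domains))  # 2.0
```

Both layouts use four buckets of four records each, yet in this toy file the Cartesian product layout halves the average number of buckets examined, which is the kind of difference the design problem above seeks to minimize.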
doi:10.1109/tse.1982.235253 fatcat:rcackokf3zf3bkczw2wemst3ki