DBM-Tree: Trading height-balancing for performance in metric access methods
Journal of the Brazilian Computer Society
Metric Access Methods (MAM) are employed to accelerate the processing of similarity queries, such as the range and the k-nearest neighbor queries. Current methods, such as the Slim-tree and the M-tree, improve the query performance minimizing the number of disk accesses, keeping a constant height of the structures stored on disks (height-balanced trees). However, the overlapping between their nodes has a very high influence on their performance. This paper presents a new dynamic MAM called the
... BM-tree (Density-Based Metric tree), which can minimize the overlap between high-density nodes by relaxing the height-balancing of the structure. Thus, the height of the tree is larger in denser regions, in order to keep a tradeoff between breadth-searching and depth-searching. An underpinning for cost estimation on tree structures is their height, so we show a non-height dependable cost model that can be applied for DBM-tree. Moreover, an optimization algorithm called Shrink is also presented, which improves the performance of an already built DBM-tree by reorganizing the elements among their nodes. Experiments performed over both synthetic and real world datasets showed that the DBM-tree is, in average, 50% faster than traditional MAM and reduces the number of distance calculations by up to 72% and disk accesses by up to 66%. After performing the Shrink algorithm, the performance improves up to 40% regarding the number of disk accesses for range and k-nearest neighbor queries. In addition, the DBM-tree scales up well, exhibiting linear performance with growing number of elements in the database.