Efficient algorithms to execute complex similarity queries in RDBMS

Adriano S. Arantes, Marcos R. Vieira, Caetano Traina Jr., Agma J. M. Traina
2004 Journal of the Brazilian Computer Society  
Search operations in large sets of complex objects usually rely on similarity-based criteria, due to the lack of other general properties that could be used to compare the objects, such as the total order relationship, or even the equality relationship between pairs of objects, commonly used with data in numeric or short texts domains. Therefore, similarity between objects is the core criterion to compare complex objects. There are two basic operators for similarity queries: Range Query and
more » ... arest Neighbors Query. Much research has been done to develop effective algorithms to implement them as standalone operations. However, algorithms to support these operators as parts of more complex expressions involving their composition were not developed yet. This paper presents two new algorithms specially designed to answer conjunctive and disjunctive operations involving the basic similarity criteria, providing also support for the manipulation of tie lists when the k-Nearest Neighbor query is involved. The new proposed algorithms were compared with the combinations of the basic algorithms, both in the sequential scan and in the Slim-tree metric access methods, measuring the number of disk accesses, the number of distance calculations, and wall-clock time. The experimen-tal results show that the new algorithms have better performance than the composition of the two basic operators to answer complex similarity queries in all measured aspects, being up to 40 times faster than the composition of the basic algorithms. This is an essential point to enable the practical use of similarity operators in Relational Database Management Systems.
doi:10.1590/s0104-65002004000100002 fatcat:jeloa5lyybdwzfdz4ajgk25ina