LCS-TRIM: Dynamic Programming Meets XML Indexing and Querying

Shirish Tatikonda, Srinivasan Parthasarathy, Matthew Goyder
2007 Very Large Data Bases Conference  
In this article, we propose a new approach for querying and indexing a database of trees with specific applications to XML datasets. Our approach relies on representing both the queries and the data using a sequential encoding and then subsequently employing an innovative variant of the longest common subsequence (LCS) matching algorithm to retrieve the desired results. A key innovation here is the use of a series of inter-linked early pruning steps, coupled with a simple index structure that
more » ... able us to reduce the search space and eliminate a large number of false positive matches prior to applying the more expensive LCS matching algorithm. Additionally, we also present mechanisms that enable the user to specify constraints on the retrieved output and show how such constraints can be pushed deep into the retrieval process, leading to improved response times. Mechanisms supporting the retrieval of approximate matches are also supported. When compared with state-of-the-art approaches, the query processing time of our algorithms is shown to be up to two to three orders of magnitude faster on several real datasets on realistic query workloads. Finally, we show that our approach is suitable for emerging multi-core server architectures when retrieving data for more expensive queries.
dblp:conf/vldb/TatikondaPG07 fatcat:q55lb7e45vdm3ijxkwk2gheyza