Efficient storage and query processing of XML data in relational database systems [thesis]

Prakash Sandeep
The popularity of XML has led to a plethora of data that is less structured and does not follow a predefined schema. As a result, there is a growing need for data management systems that support the storage and querying of such data. One avenue is to use existing database technology. The goal of this thesis is to investigate relational storage approaches for XML data. To that end, we present two schema-oblivious relational storage approaches, SUCXENT and SUCXENT++, that provide significant performance improvements over existing schema-oblivious approaches. Of the two, SUCXENT++ yields the larger improvement. We also present algorithms that translate some types of XQuery queries into SQL based on these two approaches. Existing literature indicates that schema-conscious approaches perform better than schema-oblivious approaches. Our experiments indicate that although this may be true for most types of queries, schema-oblivious approaches (specifically SUCXENT++) perform better on recursive queries. In addition, the performance of such approaches is hindered by the inability of the relational query optimizer to generate optimal query plans. We propose optimizations to the XML query-to-SQL translation process that overcome this problem and improve the performance of SUCXENT++ by up to 40 times. We also present a data partitioning strategy that uses the query workload to generate partitions that can be queried instead of the entire data set. This results in a performance improvement of up to 450 times. Finally, we present a novel GUI-based query system for the visual formulation of XQuery queries. GUI-latency-driven prefetching is employed to further optimize query execution, resulting in performance improvements of up to 96%.
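The abstract does not give the actual table layouts of SUCXENT or SUCXENT++. As a rough illustration of what "schema-oblivious" relational storage and XQuery-to-SQL translation mean in general, the Python sketch below shreds any XML document into the same fixed pair of tables, keyed by root-to-leaf path strings, and translates two simple kinds of path expression into single SQL queries. This is a generic path-based scheme, not the thesis's design; the `path` and `leaf` tables, their columns, and the functions `create_schema`, `shred`, `path_query_to_sql`, and `run` are hypothetical names chosen for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical fixed layout: it does not depend on any XML schema."""
    conn.executescript(
        """
        CREATE TABLE path (
            path_id   INTEGER PRIMARY KEY,
            path_text TEXT UNIQUE NOT NULL      -- e.g. '/lib/book/title'
        );
        CREATE TABLE leaf (
            doc_id  INTEGER NOT NULL,
            path_id INTEGER NOT NULL REFERENCES path(path_id),
            ord     INTEGER NOT NULL,           -- document order of the leaf
            value   TEXT
        );
        """
    )


def shred(conn: sqlite3.Connection, doc_id: int, xml_text: str) -> None:
    """Store every leaf element together with its root-to-leaf path.

    Simplification (assumption): attributes and mixed content are ignored.
    """
    root = ET.fromstring(xml_text)
    order = 0

    def visit(elem: ET.Element, prefix: str) -> None:
        nonlocal order
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:
            # Each distinct path is stored once and shared by all its leaves.
            conn.execute("INSERT OR IGNORE INTO path(path_text) VALUES (?)", (path,))
            (path_id,) = conn.execute(
                "SELECT path_id FROM path WHERE path_text = ?", (path,)
            ).fetchone()
            conn.execute(
                "INSERT INTO leaf VALUES (?, ?, ?, ?)",
                (doc_id, path_id, order, (elem.text or "").strip()),
            )
            order += 1
        for child in children:
            visit(child, path)

    visit(root, "")
    conn.commit()


def path_query_to_sql(xpath: str) -> tuple[str, tuple]:
    """Translate '/a/b/c' (child steps only) or '//c' (descendant) to SQL."""
    base = "SELECT l.value FROM leaf AS l JOIN path AS p ON l.path_id = p.path_id "
    if xpath.startswith("//"):
        # Descendant step: any stored path ending in '/c', at any depth.
        # substr with a negative start counts from the right in SQLite.
        suffix = "/" + xpath[2:]
        where = "WHERE substr(p.path_text, -length(?)) = ? "
        params: tuple = (suffix, suffix)
    else:
        where = "WHERE p.path_text = ? "
        params = (xpath,)
    return base + where + "ORDER BY l.doc_id, l.ord", params


def run(conn: sqlite3.Connection, xpath: str) -> list[str]:
    sql, params = path_query_to_sql(xpath)
    return [value for (value,) in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    shred(conn, 1, "<lib><book><title>A</title><author>X</author></book>"
                   "<book><title>B</title><author>Y</author></book></lib>")
    shred(conn, 2, "<part><name>P1</name><part><name>P2</name></part></part>")
    print(run(conn, "/lib/book/title"))  # ['A', 'B']
    print(run(conn, "//name"))           # ['P1', 'P2']
```

In this sketch a descendant query such as `//name` becomes a suffix match on stored path strings, so it needs no recursive SQL however deeply `part` elements nest. That hints at why path-based, schema-oblivious layouts can cope well with recursive queries, although the thesis's own explanation for the SUCXENT++ results may differ.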
doi:10.32657/10356/2438