Durable top-k search in document archives

Leong Hou U, Nikos Mamoulis, Klaus Berberich, Srikanta Bedathur
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements
... n a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.
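
The problem statement in the abstract can be illustrated with a naive baseline. The sketch below is not either of the two techniques proposed in the paper (the NRA adaptation or the shared execution approach). It assumes, for illustration only, that the archive is given as discrete snapshots mapping a timestamp to per-object query scores, and that "consistently in the top-k" means present in the top-k at every snapshot in the interval. The names `durable_top_k` and `scores_by_time` are hypothetical.

```python
from typing import Dict, Optional, Set


def durable_top_k(
    scores_by_time: Dict[int, Dict[str, float]],
    k: int,
    start: int,
    end: int,
) -> Set[str]:
    """Return the objects that are in the top-k at every snapshot in [start, end].

    Assumption: scores_by_time maps a snapshot timestamp to the query score
    of each object version valid at that time (a simplified archive model).
    """
    durable: Optional[Set[str]] = None
    for t in sorted(scores_by_time):
        if not (start <= t <= end):
            continue  # snapshot lies outside the query interval
        scores = scores_by_time[t]
        # Rank by descending score; break ties by object id for determinism.
        ranked = sorted(scores, key=lambda obj: (-scores[obj], obj))
        top_k = set(ranked[:k])
        # An object stays durable only if it appears in every top-k set so far.
        durable = top_k if durable is None else durable & top_k
    return durable if durable is not None else set()


# Hypothetical toy data: three snapshots, three objects.
snapshots = {
    1: {"a": 0.9, "b": 0.8, "c": 0.1},
    2: {"a": 0.7, "b": 0.2, "c": 0.6},
    3: {"a": 0.8, "b": 0.5, "c": 0.4},
}
result = durable_top_k(snapshots, k=2, start=1, end=3)
```

This baseline recomputes a full ranking at every snapshot, which is the kind of repeated work that a shared execution strategy and a dedicated index would aim to avoid.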
doi:10.1145/1807167.1807228 dblp:conf/sigmod/UMBB10 fatcat:nhujkleyzvcmbmv44ow6qt5iie