Agreement Based Source Selection for the Multi-Domain Deep Web Integration

Manishkumar Jha, Raju Balakrishnan, Subbarao Kambhampati
2011 International Conference on Management of Data  
One immediate challenge in searching deep web databases is source selection, i.e., selecting the most relevant web databases for answering a given query. For open collections like the deep web, source selection must be sensitive to the trustworthiness and importance of sources. Recent advances solve these problems for single-topic deep web search by adapting an agreement-based approach (cf. SourceRank [10]). In this paper we introduce a source selection method sensitive to trust and [...] for multi-topic deep web search. We compute multiple quality scores for each source, tailored to different topics and based on topic-specific crawl data. At query time, we classify the query to determine its probability of membership in each topic. These fractional memberships are used as weights on the topic-specific quality scores of the sources in order to select sources for the query. Extensive experiments on more than a thousand sources in multiple topics show 18-85% improvements in result quality over Google Product Search and other existing methods.
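The query-time weighting described in the abstract can be illustrated with a minimal Python sketch. It assumes the combined score of a source is the sum of its topic-specific quality scores, each weighted by the query's membership probability for that topic. The function name, the data layout, and all source names and numbers are hypothetical and chosen only for illustration. The paper's actual classifier, score computation, and any normalization are not reproduced here.

```python
from typing import Dict, List


def rank_sources(
    topic_probs: Dict[str, float],
    topic_scores: Dict[str, Dict[str, float]],
    k: int,
) -> List[str]:
    """Return the k sources with the highest membership-weighted score.

    topic_probs:  query's probability of membership in each topic
                  (assumed to come from some query classifier).
    topic_scores: per-source mapping of topic -> quality score
                  (assumed to be precomputed from topic-specific crawls).
    """
    combined: Dict[str, float] = {}
    for source, scores in topic_scores.items():
        # Weight each topic-specific score by the query's membership
        # probability for that topic; topics the query was not classified
        # into contribute nothing.
        combined[source] = sum(
            topic_probs.get(topic, 0.0) * score
            for topic, score in scores.items()
        )
    # Sort by descending score; break ties by name for a stable order.
    return sorted(combined, key=lambda s: (-combined[s], s))[:k]


# Hypothetical example: a query classified as mostly about cameras.
example_probs = {"camera": 0.75, "book": 0.25}
example_scores = {
    "photo-shop.example": {"camera": 0.8, "book": 0.0},
    "bookstore.example": {"camera": 0.0, "book": 0.8},
    "general.example": {"camera": 0.4, "book": 0.4},
}

if __name__ == "__main__":
    print(rank_sources(example_probs, example_scores, k=2))
```

With these made-up numbers the camera-focused source ranks first and the general-purpose source second, since the query's membership is mostly in the camera topic. A linear combination is used only because it is the simplest reading of "fractional memberships used as weights".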
dblp:conf/comad/JhaBK11 fatcat:75qceeszrzgrtpxqlxzdxyjcwe