A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2007.
The file type is application/pdf.
Sampling, information extraction and summarisation of Hidden Web databases
2006
Data & Knowledge Engineering
Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users' queries. The majority of these documents are generated through Web page templates, which contain information that is often irrelevant to queries. In this paper, we present a system designed to detect and extract query-related information from documents sampled from databases. The proposed system, 2PS, is based on a two-phase framework for the sampling, extraction and
doi:10.1016/j.datak.2006.01.009
fatcat:ftdqsklu6zaglgpkjg2tyc75pq