Automatic identification of informative sections of Web pages
2005
IEEE Transactions on Knowledge and Data Engineering
Web pages, especially dynamically generated ones, contain several items that cannot be classified as the "primary content", e.g., navigation sidebars, advertisements, and copyright notices. Most clients and end users search for the primary content and largely do not seek the non-informative content. A tool that helps an end user or application automatically search and process information from Web pages must separate the "primary content sections" from the other content sections. We call
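The abstract does not describe how the primary content sections are detected. The minimal Python sketch below illustrates one possible way to separate them from navigation bars and footers. It uses a common link-density heuristic, which is a swapped-in technique and not the algorithm proposed in this paper. The `Block` type and the `max_density` and `min_words` thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical page section: its visible text and how much of it is link text."""
    text: str
    link_chars: int  # characters of the text that sit inside <a> elements


def link_density(block: Block) -> float:
    """Fraction of a block's characters that belong to hyperlinks (empty blocks count as 1.0)."""
    total = len(block.text)
    return block.link_chars / total if total else 1.0


def primary_content(blocks, max_density=0.3, min_words=20):
    """Keep blocks that are mostly plain text and reasonably long.

    Both thresholds are illustrative assumptions, not values taken from the paper.
    Navigation sidebars tend to be almost all link text, and copyright notices
    tend to be short, so both are filtered out.
    """
    return [
        b for b in blocks
        if link_density(b) <= max_density and len(b.text.split()) >= min_words
    ]


if __name__ == "__main__":
    nav = Block("Home News Sports Weather Contact", link_chars=32)
    footer = Block("Copyright 2005. All rights reserved.", link_chars=0)
    article = Block(
        "The committee approved the new budget on Tuesday after a long debate "
        "about school funding, road repairs, and the cost of public transit in the region.",
        link_chars=0,
    )
    for block in primary_content([nav, footer, article]):
        print(block.text)
```

In this example, the navigation bar is dropped because all of its text is link text. The footer is dropped because it is too short. Only the article paragraph is kept. Real pages would first need to be segmented into blocks, for example by splitting the DOM tree at block-level elements. That step is omitted here.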
doi:10.1109/tkde.2005.138
fatcat:2iataz2htvfjbkurg2sqpdkmxi