Extracting context to improve accuracy for HTML content extraction
2005
Special interest tracks and posters of the 14th international conference on World Wide Web - WWW '05
Web pages contain clutter (such as ads, unnecessary images, and extraneous links) around the body of an article, which distracts the user from the actual content. Extracting "useful and relevant" content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, reducing noise for information retrieval systems, and generally improving the web browsing experience. In our previous work [16], we developed a framework that employed an easily
doi:10.1145/1062745.1062895
dblp:conf/www/GuptaKS05
fatcat:elkojmakmzchphz6pd2p4tfiru