Title extraction from bodies of HTML documents and its application to web page retrieval
2005
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '05
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, in reality HTML titles are often bogus. It is desirable to conduct automatic extraction of titles from the bodies of HTML documents. This is an issue which does not seem to have been investigated previously. In this paper, we take a supervised machine learning approach to address the problem. We propose a specification
doi:10.1145/1076034.1076079
dblp:conf/sigir/HuXSHSCL05
fatcat:d5o32sdrlfcgln3d23hhozw2ca