An Extraction Method of an Informative DOM Node from a Web Page by Using Layout Information
Transactions of the Japanese society for artificial intelligence
We propose a method for extracting the informative DOM node from a Web page, intended as a preprocessing step for Web content mining. Our proposed method, LM, uses the layout data of DOM nodes generated by a generic Web browser; its learning set consists of hundreds of Web pages together with annotations of the informative DOM nodes of those pages. Our method does not require large-scale crawling of the whole Web site to which the target Web page belongs. We designed LM so that it uses the information of the learning set more …
doi:10.1527/tjsai.25.742
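The abstract does not specify LM's features or its learning algorithm. As a rough illustration of the general idea (score each DOM node by its browser-rendered layout box and pick the best one per page, with the scoring weights learned from annotated pages), here is a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's method: the hypothetical `NodeBox` type (boxes such as a browser's `getBoundingClientRect()` might report), the three layout features, and the use of a structured perceptron as the learner.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class NodeBox:
    """Rendered bounding box of one DOM node, in CSS pixels (illustrative)."""
    node_id: str  # hypothetical identifier, e.g. an XPath
    x: float
    y: float
    width: float
    height: float


def layout_features(node: NodeBox, page_w: float, page_h: float) -> List[float]:
    """Three illustrative layout features; NOT the feature set used by LM."""
    area_ratio = (node.width * node.height) / (page_w * page_h)
    center_x = node.x + node.width / 2
    center_y = node.y + node.height / 2
    h_offset = abs(center_x - page_w / 2) / (page_w / 2)  # 0 = horizontally centered
    v_pos = center_y / page_h                              # 0 = top of the page
    return [area_ratio, h_offset, v_pos]


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, f))


# One annotated page: (nodes, page width, page height, index of informative node)
Page = Tuple[List[NodeBox], float, float, int]


def predict(weights: Sequence[float], nodes: List[NodeBox],
            page_w: float, page_h: float) -> int:
    """Index of the highest-scoring node; the first node wins ties."""
    feats = [layout_features(n, page_w, page_h) for n in nodes]
    return max(range(len(nodes)), key=lambda i: dot(weights, feats[i]))


def train(pages: List[Page], epochs: int = 10) -> List[float]:
    """Structured perceptron: when the predicted node is wrong, move the weights
    toward the annotated node's features and away from the predicted node's."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for nodes, page_w, page_h, gold in pages:
            pred = predict(weights, nodes, page_w, page_h)
            if pred != gold:
                f_gold = layout_features(nodes[gold], page_w, page_h)
                f_pred = layout_features(nodes[pred], page_w, page_h)
                weights = [w + g - p for w, g, p in zip(weights, f_gold, f_pred)]
    return weights


# Toy usage: one annotated 1000x2000 page whose informative node is "main".
nodes = [
    NodeBox("header", 0, 0, 1000, 100),
    NodeBox("main", 100, 150, 600, 1500),
    NodeBox("sidebar", 750, 150, 250, 800),
]
weights = train([(nodes, 1000, 2000, 1)])
print(nodes[predict(weights, nodes, 1000, 2000)].node_id)  # main
```

A learned linear score over layout features keeps each page independent, which loosely mirrors the abstract's point that no site-wide crawl is needed. With a single toy page the learned weights are not meaningful beyond that page; a real learning set of hundreds of annotated pages would be needed.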