A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2008.
The file type is application/pdf.
NET – A System for Extracting Web Data from Flat and Nested Data Records
[chapter]
2005
Lecture Notes in Computer Science
This paper studies the automatic extraction of structured data from Web pages. Each such page may contain several groups of structured data records. Existing automatic methods still have several limitations. In this paper, we propose a more effective method for the task. Given a page, our method first builds a tag tree based on visual information. It then performs a post-order traversal of the tree, matching subtrees along the way using a tree edit distance method and visual cues. After the …
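The abstract's core step, a post-order traversal that matches sibling subtrees by tree similarity, can be illustrated with a minimal sketch. This is not the paper's implementation. As assumptions: it uses Yang's simple tree matching (a restricted, top-down variant of tree edit distance) in place of whatever edit-distance formulation the authors use, it ignores visual cues entirely, it only compares adjacent siblings, and the names `Node`, `simple_tree_matching`, `similarity`, `find_record_groups` and the `threshold` of 0.8 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tag-tree node: an HTML tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Number of node pairs in the maximum top-down matching of a and b
    (simple tree matching, used here as a stand-in for tree edit distance)."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best matching between the first i children of a and first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 for matching the two roots themselves


def size(t: Node) -> int:
    """Total number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Matched node count normalised by the average subtree size (0.0 to 1.0)."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


def find_record_groups(root: Node, threshold: float = 0.8) -> List[Tuple[str, List[int]]]:
    """Post-order traversal: after visiting a node's children, collect runs of
    adjacent sibling subtrees whose pairwise similarity meets the threshold.
    Returns (parent tag, child indices) for each run of two or more siblings."""
    groups: List[Tuple[str, List[int]]] = []

    def visit(node: Node) -> None:
        for child in node.children:  # children first: post-order
            visit(child)
        run: List[int] = [0] if node.children else []
        for k in range(1, len(node.children)):
            if similarity(node.children[k - 1], node.children[k]) >= threshold:
                run.append(k)
            else:
                if len(run) > 1:
                    groups.append((node.tag, run))
                run = [k]
        if len(run) > 1:
            groups.append((node.tag, run))

    visit(root)
    return groups
```

For example, a `body` holding an `h1`, a `table` of three structurally identical `tr` rows, and a `p` yields one group, `("table", [0, 1, 2])`, since only the rows are similar enough to be treated as repeated data records. Handling nested records, as the paper's title indicates, would require more than this adjacent-sibling comparison.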
doi:10.1007/11581062_39
fatcat:n7cv3a5aqfhlrgb3hgz2eit5nu