A new wrapper induction algorithm, WTM, is presented for generating rules that describe the general layout template of web pages. WTM is designed mainly for use in weblog crawling and indexing systems. Most weblogs are maintained by content management systems and share a similar layout structure across all of their pages. In addition, they provide RSS feeds that describe the latest entries. These entries also appear on the weblog homepage in HTML format. WTM is built upon these two observations. It uses RSS feed ...
dblp:conf/pacis/ZhangZP06 fatcat:lspboso6ijfapnseltsffdzehu
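The two observations above (entries are listed in the RSS feed, and the same entries appear in the homepage HTML) can be illustrated with a small sketch. This is not the WTM algorithm itself, whose details are cut off in the abstract; it is only one plausible way to turn the observations into a rule, under stated assumptions. The function names (rss_titles, infer_entry_path), the PathRecorder class, and the idea of voting on a slash-separated tag path are all hypothetical. The sketch assumes an RSS 2.0 feed and assumes that each entry title appears verbatim as a single text node on the homepage.

```python
from collections import Counter
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Tags that never have a closing tag; they are not pushed on the path stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


def rss_titles(rss_xml):
    """Return the titles of all <item> elements in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="").strip()
            for item in root.iter("item")]


class PathRecorder(HTMLParser):
    """Record the tag path (e.g. 'html/body/div/h2/a') of every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, outermost first
        self.text_paths = []   # (text, path) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_paths.append((text, "/".join(self.stack)))


def infer_entry_path(rss_xml, homepage_html):
    """Hypothetical rule inference: return the tag path under which most
    RSS entry titles are found in the homepage, or None if none match."""
    titles = set(rss_titles(rss_xml))
    recorder = PathRecorder()
    recorder.feed(homepage_html)
    recorder.close()
    votes = Counter(path for text, path in recorder.text_paths
                    if text in titles)
    return votes.most_common(1)[0][0] if votes else None
```

Calling infer_entry_path with a feed and the matching homepage returns a single tag path that could then serve as a crude extraction rule for other pages of the same weblog. A real wrapper would likely need to handle attributes, partial title matches, and entry bodies as well; those are left out here.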