Automatic information extraction from web pages

Budi Rahardjo, Roland H. C. Yap
Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '01), 2001
Many web pages have an implicit structure. In this paper, we show that data can feasibly be extracted from web pages automatically by using approximate matching techniques. The approach can be applied to generate wrappers automatically, to detect and display differences between web pages, and to monitor web pages for changes, among other uses.
doi:10.1145/383952.384071 dblp:conf/sigir/YapR01 fatcat:62jtag6phjaupc3t37a7a6uaoq
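The abstract does not state which approximate matching technique the authors use. As an illustrative sketch only, the code below assumes token-level Levenshtein (edit-distance) alignment: a known "template" page and a new page with the same layout are split into tag and text tokens, aligned with the standard dynamic-programming edit distance, and the text tokens that fail to match the template are reported as extracted data. The function names `tokenize`, `align`, and `extract_fields` are hypothetical and are not taken from the paper.

```python
import re


def tokenize(html: str) -> list[str]:
    # Split markup into tag tokens ("<td>") and text runs; drop whitespace-only text.
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", html) if t.strip()]


def align(a: list[str], b: list[str]):
    """Levenshtein alignment of two token sequences (assumed technique).

    Returns (edit distance, list of (op, old_token, new_token)) where op is
    one of "keep", "sub", "del", "ins".
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a token of a
                          d[i][j - 1] + 1,         # insert a token of b
                          d[i - 1][j - 1] + cost)  # keep or substitute
    # Backtrack from the bottom-right cell to recover one optimal edit script.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if a[i - 1] == b[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:
                ops.append(("keep" if cost == 0 else "sub", a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", a[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, b[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


def extract_fields(template: str, page: str) -> list[str]:
    # Text tokens in `page` that were substituted or inserted relative to the
    # template are treated as the page's data; markup tokens are skipped.
    _, ops = align(tokenize(template), tokenize(page))
    return [new for op, _, new in ops
            if op in ("sub", "ins") and not new.startswith("<")]


if __name__ == "__main__":
    template = "<tr><td>Name</td><td>Alice</td><td>Price</td><td>10</td></tr>"
    page = "<tr><td>Name</td><td>Bob</td><td>Price</td><td>12</td></tr>"
    print(extract_fields(template, page))  # prints ['Bob', '12']
```

The same alignment also serves the change-monitoring use mentioned in the abstract: running `align` on two snapshots of one page and reporting the non-"keep" operations yields a token-level difference. Plain edit distance is quadratic in the number of tokens, so a practical system would likely need a faster or more structure-aware matcher; that trade-off is not addressed by this sketch.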