HTML2RSS

Tomoyuki Nanno, Manabu Okumura
2006 Proceedings of the 15th international conference on World Wide Web - WWW '06  
We present a system to automatically generate RSS feeds from HTML documents that consist of time-series items with date expressions, e.g., archives of weblogs, BBSs, chats, mailing lists, site update descriptions, and event announcements. Our system extracts date expressions, performs structure analysis of an HTML document, and detects or generates titles from the document.
doi:10.1145/1135777.1136022 dblp:conf/www/NannoO06a fatcat:dnj7w5gepfhdnkq7mxjase7yfq
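
The three steps named in the abstract (date extraction, structure analysis, title detection or generation) can be illustrated with a minimal Python sketch. This is not the authors' method: it assumes that only numeric dates such as 2006-05-23 are present, that a new item begins at every text node containing a date, and that a title can be taken from the first text following the date. The names DATE_RE, TextChunks, extract_items, make_title, and to_rss are hypothetical and chosen for illustration only.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: only numeric dates such as 2006-05-23 or 2006/5/23 are recognized.
DATE_RE = re.compile(r"\b(\d{4})[-/](\d{1,2})[-/](\d{1,2})\b")


class TextChunks(HTMLParser):
    """Collects the non-empty text nodes of an HTML document in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_items(html):
    """Start a new item at each text node that contains a valid date.

    Assumption: this flat segmentation stands in for real structure analysis.
    """
    parser = TextChunks()
    parser.feed(html)
    parser.close()
    items = []
    for chunk in parser.chunks:
        match = DATE_RE.search(chunk)
        if match:
            year, month, day = map(int, match.groups())
            try:
                date = datetime(year, month, day, tzinfo=timezone.utc)
            except ValueError:
                date = None  # e.g. 2006-13-40: treat the chunk as plain text
            if date is not None:
                rest = (chunk[:match.start()] + chunk[match.end():]).strip()
                items.append({"date": date, "body": [rest] if rest else []})
                continue
        if items:
            items[-1]["body"].append(chunk)  # text before the first date is dropped
    return items


def make_title(item, limit=40):
    """Use the first body text as the title, truncated; fall back to the date."""
    if not item["body"]:
        return item["date"].strftime("%Y-%m-%d")
    first = item["body"][0]
    return first if len(first) <= limit else first[:limit].rstrip() + "..."


def to_rss(html, channel_title, link):
    """Serialize the extracted items as an RSS 2.0 document string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Generated from " + link
    for item in extract_items(html):
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = make_title(item)
        ET.SubElement(node, "pubDate").text = format_datetime(item["date"])
        ET.SubElement(node, "description").text = " ".join(item["body"])
    return ET.tostring(rss, encoding="unicode")
```

Calling to_rss(page_html, "Updates", "http://example.com/") on a page whose entries each begin with a date would return one RSS item per entry. A real system would need to handle far more date formats and use the document's tag structure, which this sketch deliberately ignores.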