A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2019.
The file type is application/pdf.
Intelligent and Adaptive Crawling of Web Applications for Web Archiving
[chapter]
2013
Lecture Notes in Computer Science
Web sites are dynamic in nature, with content and structure changing over time. Many pages on the Web are produced by content management systems (CMSs) such as WordPress, vBulletin, or phpBB. Tools currently used by Web archivists to preserve the content of the Web blindly crawl and store Web pages, disregarding the CMS the site is based on (leading to suboptimal crawling strategies) and whatever structured content is contained in Web pages (resulting in page-level archives whose content is hard
doi:10.1007/978-3-642-39200-9_26
fatcat:5wpkayn6izdg7iuvnhple7jpni
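The abstract contrasts blind crawling with crawling that takes into account the CMS a site is built on. As a minimal illustration of the first step such a crawler needs, the sketch below guesses the CMS from a page's `<meta name="generator">` tag. This is an assumption-laden example, not the authors' method: the `CMS_KEYWORDS` table, `GeneratorMetaParser`, and `detect_cms` are hypothetical names, and relying on the generator tag alone is a simplification, since many sites omit or alter it.

```python
from html.parser import HTMLParser

# Hypothetical keyword table (illustrative only): lowercase substrings that
# may appear in a page's <meta name="generator"> content for each CMS.
CMS_KEYWORDS = {
    "wordpress": "WordPress",
    "vbulletin": "vBulletin",
    "phpbb": "phpBB",
}


class GeneratorMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def detect_cms(html):
    """Return a CMS name guessed from generator meta tags, or None."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    for content in parser.generators:
        lowered = content.lower()
        for keyword, cms in CMS_KEYWORDS.items():
            if keyword in lowered:
                return cms
    return None


if __name__ == "__main__":
    page = '<html><head><meta name="generator" content="WordPress 5.2" /></head></html>'
    print(detect_cms(page))  # WordPress
```

A crawler built along these lines could use the returned name to select a site-specific crawling strategy and fall back to generic crawling when `detect_cms` returns `None`.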