Using the web infrastructure to preserve web pages

Michael L. Nelson, Frank McCown, Joan A. Smith, Martin Klein
2007 International Journal on Digital Libraries  
To date, most of the focus in digital preservation has been on replicating copies of the resources to be preserved from the "living web" and placing them in an archive for controlled curation. Once inside an archive, the resources are subject to careful processes of refreshing (making additional copies on new media) and migrating (converting them to new formats and applications). For small numbers of resources of known value, this is a practical and worthwhile approach to digital ... However, because of the infrastructure costs (storage, networks, machines) and, more importantly, the human management costs, this approach is unsuitable for web-scale preservation. As a result, difficult decisions must be made about what is saved and what is not. We provide an overview of our ongoing research projects that use the "web infrastructure" to provide preservation capabilities for web pages, and we examine the overlap these approaches have with the field of information retrieval. The common characteristic of the projects is that they creatively employ the web infrastructure to provide shallow but broad preservation capability for all web pages. These approaches are not intended to replace conventional archiving approaches; rather, they focus on providing at least some form of archival capability for the mass of web pages.
doi:10.1007/s00799-007-0012-y fatcat:5ufnwywctfbrrheo65zxqrz22q