
Finding pages on the unarchived Web

Hugo C. Huurdeman, Anat Ben-David, Jaap Kamps, Thaer Samar, Arjen P. de Vries
2014 IEEE/ACM Joint Conference on Digital Libraries  
Our main findings are threefold. First, the crawled Web contains evidence of a remarkable number of unarchived pages and websites, potentially dramatically increasing the coverage of the Web archive.  ...  Third, the succinct representation is generally rich enough to uniquely identify pages on the unarchived Web: in a known-item search setting we can retrieve these pages within the first ranks on average  ...  Acknowledgments Part of this paper is based on an initial report on uncovering and characterizing unarchived pages, published as [25] .  ... 
doi:10.1109/jcdl.2014.6970188 dblp:conf/jcdl/HuurdemanBKSV14 fatcat:rya7otftlvdqhlqp7rtb7kp3u4

Lost but not forgotten: finding pages on the unarchived web

Hugo C. Huurdeman, Jaap Kamps, Thaer Samar, Arjen P. de Vries, Anat Ben-David, Richard A. Rogers
2015 International Journal on Digital Libraries  
Third, the succinct representation is generally rich enough to uniquely identify pages on the unarchived web: in a known-item search setting we can retrieve unarchived web pages within the first ranks  ...  Our main findings are the following: First, the crawled web contains evidence of a remarkable number of unarchived pages and websites, potentially dramatically increasing the coverage of a Web archive.  ...  The link extraction and analysis work was carried out on the Dutch national e-infrastructure with the support of SURF Foundation.  ... 
doi:10.1007/s00799-015-0153-3 fatcat:f5yhxhrdxjduznbnxamlcjvacm

Uncovering the unarchived web

Thaer Samar, Hugo C. Huurdeman, Anat Ben-David, Jaap Kamps, Arjen de Vries
2014 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval - SIGIR '14  
to their mentions on pages that were included in the archived web collection.  ...  We illustrate empirically that the size of the aura can be substantial: in 2012, the Dutch Web archive contained 12.3M unique pages, while we uncover references to 11.9M additional (unarchived) pages.  ...  On the one hand, every web archive is incomplete, since, depending on the settings of the crawler, many pages it encounters are excluded from archiving.  ... 
doi:10.1145/2600428.2609544 dblp:conf/sigir/SamarHBKV14 fatcat:2ug2mk4txjhcvfbanmdrrpgh7a

Who and what links to the Internet Archive

Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, Michael L. Nelson
2014 International Journal on Digital Libraries  
We find that users request English pages the most, followed by the European languages. Most human users come to web archives because they do not find the requested pages on the live web.  ...  About 65% of the requested archived pages no longer exist on the live web.  ...  Acknowledgment This work was supported in part by the NSF (IIS 1009392) and the Library of Congress.  ... 
doi:10.1007/s00799-014-0111-5 fatcat:rn2ux7gyenaglk5kpqfj5sag6a

Who and What Links to the Internet Archive [chapter]

Yasmin Alnoamany, Ahmed Alsum, Michele C. Weigle, Michael L. Nelson
2013 Lecture Notes in Computer Science  
We find that users request English pages the most, followed by the European languages. Most human users come to web archives because they do not find the requested pages on the live web.  ...  About 65% of the requested archived pages no longer exist on the live web.  ...  Acknowledgment This work was supported in part by the NSF (IIS 1009392) and the Library of Congress.  ... 
doi:10.1007/978-3-642-40501-3_35 fatcat:76xjfgs7onglrbdfgkxfqhd7gq

Adaptive Search Support for Information Seeking Stages

Hugo C. Huurdeman
2015 Bulletin of IEEE Technical Committee on Digital Libraries  
In addition to the wealth of information available on the live Web, historical Web content is currently available in Web archives, containing snapshots of the Web that once was.  ...  The understanding on both the theoretical and practical level are used to design and evaluate multistage search systems, firstly in a general Web search setting, and secondly in a Web archive search setting  ...  Acknowledgments We gratefully acknowledge the feedback received at the doctoral consortium of the 2014 ACM/IEEE Digital Libraries conference, and are thankful for the received travel support grant.  ... 
dblp:journals/tcdl/Huurdeman15 fatcat:hopnyxjnhjclrecgopmwdh6bsa

Guest editors' introduction to the special issue on the digital libraries conference 2014

Martin Klein, Andreas Rauber
2015 International Journal on Digital Libraries  
The paper "Lost but Not Forgotten: Finding Pages on the Unarchived Web" by Hugo C. Huurdeman, Jaap Kamps, Thaer Samar, Arjen P. de Vries, Anat Ben-David, and Richard A.  ...  Not all embedded resources are equally important as their impact on the web page may vary.  ... 
doi:10.1007/s00799-015-0161-3 fatcat:e6sihqmmqveoljeoze7p5srpjy

Adapting the Hypercube Model to Archive Deferred Representations and Their Descendants [article]

Justin F. Brunelle and Michele C. Weigle and Michael L. Nelson
2016 arXiv pre-print
Web pages are increasingly interactive, resulting in pages that are increasingly difficult to archive.  ...  It is difficult to archive all of the resources in deferred representations and the result is archives with web pages that are either incomplete or that erroneously load embedded resources from the live  ...  Mesbah et al. performed several experiments regarding crawling and indexing representations of web pages that rely on JavaScript [34, 31] focusing mainly on search engine indexing and automatic testing  ... 
arXiv:1601.05142v1 fatcat:alixiq6e3rd4tgeo6t35aq6aba

Book Review. Web 2.0 Tools and Strategies for Archives and Local History Collections

Gabrielle Prefontaine
2011 Partnership: The Canadian Journal of Library and Information Practice and Research  
Kate Theimer's blog (www.archivesnext.com) forms the core of a nexus of resources and initiatives, including the Archivists-on-Twitter Daily and the Best Archives on the Web Awards.  ...  the impact and success of creating Facebook pages; blogging; twittering; podcasting and Youtubing.  ... 
doi:10.21083/partnership.v6i1.1470 fatcat:qiqrepvfirfajmqfcpp3unselm

Counter-archiving Facebook

Anat Ben-David
2020 European Journal of Communication  
Following recent debates on data colonialism, it argues that Facebook dialectically assumes a role of a new archon of public records, while being unarchivable by design.  ...  The article concludes by discussing the shifting boundaries between the archivist, the activist and the scholar, as the imperative of research methods after datafication.  ...  Conclusion In the previous sections, I contextualized Facebook's unarchivability on one hand, and its tight control on the types of data it makes public on the other, in wider discussions on the critical  ... 
doi:10.1177/0267323120922069 fatcat:vkbn5daxzvb63gfe42w6c2l2xm

Filling in the Blanks: Archiving Dynamically Generated Content

Justin F. Brunelle
2012 Bulletin of IEEE Technical Committee on Digital Libraries  
Pages can be customized based on user preferences, user interaction, and other local events.  ...  Web 2.0 technologies are improving the average Web user's browsing experience by providing richer browsing.  ...  Specifically, the ability to capture previously unarchiveable content and to prevent archived resources from reaching into the live Web for content should be measured.  ... 
dblp:journals/tcdl/Brunelle12 fatcat:j7doqvfmobhydlz2fc3ghoxszi

Temporal Anchor Text as Proxy for Real User Queries

Thaer Samar, Arjen P. de Vries
2015 International Conference on Theory and Practice of Digital Libraries  
Web archives preserve the fast changing web. While we can archive the web pages, the popularity of queries in the past has usually not been preserved.  ...  Our approach is to rank anchor text based on their popularity in the archive at specific time. Then, we check the importance of the top ranked anchor text in the public Web at the same time.  ...  However, the Web is dynamic and data can be easily lost on the Web. Ntoulas et al. [28] found that 80% of Web pages are not available after one year.  ... 
dblp:conf/ercimdl/SamarV15 fatcat:obejw4w7q5a3xcapvnok7liyoq

Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web [article]

Hany M. SalahEldeen, Michael L. Nelson
2013 arXiv pre-print
To mitigate the loss of resources on the live web, we propose the use of a "tweet signature".  ...  We also discovered that resources have disappeared from the archives themselves (7.89%) as well as reappeared on the live web after being declared missing (6.54%).  ...  Acknowledgments This work was supported in part by the Library of Congress and NSF IIS-1009392.  ... 
arXiv:1309.2648v1 fatcat:rfy63jhqnrdoram5wohsjannfa

Resurrecting My Revolution [chapter]

Hany M. Salaheldeen, Michael L. Nelson
2013 Lecture Notes in Computer Science  
To mitigate the loss of resources on the live web, we propose the use of a "tweet signature".  ...  We also discovered that resources have disappeared from the archives themselves (7.89%) as well as reappeared on the live web after being declared missing (6.54%).  ...  Acknowledgments This work was supported in part by the Library of Congress and NSF IIS-1009392.  ... 
doi:10.1007/978-3-642-40501-3_34 fatcat:6jrndyss7vgavlxsa2vpkae4sq

Requirements for software deployment languages and schema [chapter]

Richard S. Hall, Dennis Heimbigner, Alexander L. Wolf
1998 Lecture Notes in Computer Science  
The contribution of this paper is to define and explore the requirements for software deployment languages and schema.  ...  The result of these efforts has led to the creation of software description languages and schema, but they do not address deployment issues in a complete, systematic fashion.  ...  For an example, consider a software system that maintains an index on an evolving collection of Web pages.  ... 
doi:10.1007/bfb0053890 fatcat:7l7avvhqnbfmtcirpmiskrtc44