
How to Search the Internet Archive Without Indexing It [chapter]

Nattiya Kanhabua, Philipp Kemkes, Wolfgang Nejdl, Tu Ngoc Nguyen, Felipe Reis, Nam Khanh Tran
2016 Lecture Notes in Computer Science  
In addition, we link retrieved results to the WayBack Machine; thus allowing keyword search on the Internet Archive without processing and indexing its raw archived content.  ...  In this paper, we propose an entity-oriented search system to support retrieval and analytics on the Internet Archive. We use Bing to retrieve a ranked list of results from the current web.  ...  CONCLUSION In this paper, we proposed a web archive search prototype that for the first time supports entity-oriented queries on the Internet Archive.  ... 
doi:10.1007/978-3-319-43997-6_12 fatcat:ryv5ur3plvbdzoka5ufmwejhay

A survey of web archive search architectures

Miguel Costa, Daniel Gomes, Francisco Couto, Mário Silva
2013 Proceedings of the 22nd International Conference on World Wide Web - WWW '13 Companion  
This survey provides an overview of web archive search architectures designed for time-travel search, i.e. full-text search on the web within a user-specified time interval.  ...  Web archives already hold more than 282 billion documents and users demand full-text search to explore this historical information.  ...  ACKNOWLEDGMENTS This work could not be done without the support of FCCN and its Portuguese Web Archive team. We thank FCT for its LASIGE and INESC-ID multi-annual support.  ... 
doi:10.1145/2487788.2488116 dblp:conf/www/CostaGCS13 fatcat:ntyatj3cyze3dmaxoxeokmgkzq

How Much of the Web Is Archived? [article]

Scott G. Ainsworth and Ahmed AlSum and Hany SalahEldeen and Michele C. Weigle and Michael L. Nelson
2013 arXiv   pre-print
While individual archives can be measured in terms of number of URIs, number of copies per URI, and intersection with other archives, to date there has been no answer to the question "How much of the Web  ...  We study the question by approximating the Web using sample URIs from DMOZ, Delicious, Bitly, and search engine indexes; and, counting the number of copies of the sample URIs exist in various public web  ...  We would like to thank Herbert Van de Sompel and Robert Sanderson from Los Alamos National Laboratory, and Kris Carpenter Negulescu and Bradley Tofel from Internet Archive for their positive comments and  ... 
arXiv:1212.6177v2 fatcat:gnffrucuvvdhvk6jdzexyxvcxe

Internet Reviews

Sara Amato
2019 College & research libraries news  
Internet Reviews Sara Amato, editor Christian Science Monitor. Access: http://www  ...  Two notable newspapers without archives at their Web site are the New York Times and USA Today.  ...  By far the most important added feature is a searchable archive of fulltext articles. Unfortunately, there is no mention of how far back the archive runs.  ... 
doi:10.5860/crln.58.6.423 fatcat:qdbsnxtstrff5ldllthlz5pnwe

Bringing our Internet Archive collection back home: A case study from the University of Mary Washington

Katherine Perdue
2016 Code4Lib Journal  
However, individual items uploaded to the Internet Archive are hard to treat as a collection. Full text searching can only be done within an item.  ...  The Internet Archive is a great boon to smaller libraries that may not have the resources to host their own digital materials.  ...  Additionally, Google uses sampling techniques that make it difficult to know how accurate the data is.  ... 
doaj:fd4531ea40a14567875bf6b754522616 fatcat:ezktuels6rhjrl36wflmwoacbu

How much of the web is archived?

Scott G. Ainsworth, Ahmed Alsum, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson
2011 Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries - JCDL '11  
After experiencing this web time travel, the inevitable question that comes to mind is "How much of the Web is archived?"  ...  This question is studied by approximating the Web via sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and measuring number of archive copies available in various public web archives  ...  We thank Herbert Van de Sompel and Robert Sanderson of Los Alamos National Laboratory and Kris Carpenter Negulescu and Bradley Tofel of the Internet Archive for their explanations and positive comments  ... 
doi:10.1145/1998076.1998100 dblp:conf/jcdl/AinsworthASWN11 fatcat:z5amqwwcb5f2pjlp32wmqi7gny

A fair history of the Web? Examining country balance in the Internet Archive

Mike Thelwall, Liwen Vaughan
2004 Library & Information Science Research  
The Archive's goal is to index the whole Web without making any judgments about which pages are worth saving.  ...  The potential importance of the Archive for longitudinal and historical Web research leads to the need to evaluate its coverage.  ...  The Internet Archive was queried for each chosen site to see whether at least one page was indexed in it. At the same time the earliest date for the site to appear in the Archive was identified.  ... 
doi:10.1016/j.lisr.2003.12.009 fatcat:rur7mtb7kbc27posduma46uxcy

A Comparison of Internet Resource Discovery Approaches

Michael F. Schwartz, Alan Emtage, Brewster Kahle, B. Clifford Neuman
1992 Computing Systems  
Understanding these relationships is important, because they address the degree to which the systems can be made to interoperate seamlessly, without the need for users to learn the details of each system  ...  In the past several years, the number and variety of resources available on the Internet have increased dramatically.  ...  Acknowledgements Schwartz was supported for this work in part by the National Science Foundation under grants DCR-8420944 and NCR-9105372, a grant from Sun Microsystems' Collaborative Research Program,  ... 
dblp:journals/csys/SchwartzEKN92 fatcat:gv524r5y5bbrhovnzojshq35hq

The Mysterious Disappearance of the White House Speech Archive

Richard Wiggins
1996 First Monday  
For a small portion of this year, an archive of White House speeches and its index disappeared.  ...  The White House has been one of the leading government agencies in the United States in using the Internet to publish and distribute information.  ...  We wanted to demonstrate how the speech archive works. The only problem is: the speech archive, and the index, had vanished!  ... 
doi:10.5210/fm.v1i2.472 fatcat:kwdyhgt2nfbyxkpbdc6xd6m45i

A framework for describing web repositories

Frank McCown, Michael L. Nelson
2009 Proceedings of the 2009 joint international conference on Digital libraries - JCDL '09  
In prior work we have demonstrated that search engine caches and archiving projects like the Internet Archive's Wayback Machine can be used to "lazily preserve" websites and reconstruct them when they  ...  We use the term "web repositories" for collections of automatically refreshed and migrated content, and collectively we refer to these repositories as the "web infrastructure".  ...  The Internet Archive strives to maintain an accurate snapshot of the Web as it existed when crawled. Therefore they archive each resource in the same format in which it was crawled.  ... 
doi:10.1145/1555400.1555456 dblp:conf/jcdl/McCownN09a fatcat:nfeba62pxnaethyhteopv3kpo4

Scalable Internet resource discovery: research problems and approaches

C. Mic Bowman, Peter B. Danzig, Udi Manber, Michael F. Schwartz
1994 Communications of the ACM  
These tools have become quite popular, and are helping to redefine how people think about wide-area network applications.  ...  Over the past several years, a number of information discovery and access tools have been introduced in the Internet, including Archie, Gopher, Netfind, and WAIS.  ...  We thank Alan Emtage for providing us with the Archie logs that led to some of the results in Section 3.1. Panos Tsirigotis implemented the software needed for analyzing this data.  ... 
doi:10.1145/179606.179704 fatcat:3xetbufqgrdtbdqathbpthorre

Full-Text and URL Search Over Web Archives [chapter]

Miguel Costa
2021 The Past Web  
Without the possibility of exploring and exploiting the archived contents, web archives are useless.  ...  However, the value of web archives depends on their users being able to search and access the information they require in efficient and effective ways.  ...  These APIs specify how to search and access web archive collections automatically. Examples include the Internet Archive API, the API and the Memento Time Travel API.  ... 
doi:10.1007/978-3-030-63291-5_7 fatcat:ufpkd72mjbdhxickdyp2ugdjny

Internet resource discovery at the University of Colorado

M.F. Schwartz
1993 Computer  
An important problem in this environment is how to discover resources of interest, such as documents, network services, and people.  ...  In this paper we discuss a number of aspects of the resource discovery problem, and summarize results from efforts to address these problems carried out in the Networked Resource Discovery Project at the  ...  Acknowledgements This material is based upon work supported in part by the National Science Foundation under grants DCR-8420944, NCR-9105372, and NCR-9204853, a grant from Sun Microsystems' Collaborative  ... 
doi:10.1109/2.231273 fatcat:3jaw7qpqyrcedbqilvtpwrhtny

The Integration of Internet Resources into a Library's Special Subject Services – the Example of the History Guide of the State and University Library of Goettingen

Wilfried Enderle
2000 Liber Quarterly: The Journal of European Research Libraries  
This is in my view the real challenge librarians are faced with in integrating Internet resources into the functions and services of a research library.  ...  So, my main argument as a librarian is that we have to keep in mind what a research library basically has to do, that is to secure direct and permanent access to primary information items.  ...  I refer to this example not because I think it really convincing trying to archive the whole Internet. From a scholarly point of view it is not worth archiving.  ... 
doi:10.18352/lq.7606 fatcat:nodmmmtt65edfe4mehtskume5i

A theory of digital objects

Jannis Kallinikos, Aleksi Aaltonen, Attila Marton
2010 First Monday  
There is no way to discuss any particular technology without confronting the primary functional task it addresses (as we did here with the Internet Archive and Web document search).  ...  “Proving Web history: How to use the Internet Archive,” Journal of Internet Law, volume 9, number 8, pp. 3–9. Nadine Höchstötter and Dirk Lewandowski, 2009.  ... 
doi:10.5210/fm.v15i6.3033 fatcat:3jaoqyt4nzcubcl2el2bvdbghy