22,099 Hits in 4.1 sec

Efficient, automatic web resource harvesting

Michael L. Nelson, Joan A. Smith, Ignacio Garcia del Campo
2006 Proceedings of the eighth ACM international workshop on Web information and data management - WIDM '06  
There are two problems associated with conventional web crawling techniques: a crawler cannot know if all resources at a non-trivial web site have been discovered and crawled ("the counting problem") and  ...  We present the Apache module "mod_oai", which can be used to address the counting problem by listing all valid URIs at a web server and efficiently discovering updates and additions on subsequent crawls  ...  CONCLUSIONS: mod_oai is a demonstration of combining complex object metadata formats with OAI-PMH for efficient web resource harvesting.  ... 
doi:10.1145/1183550.1183560 dblp:conf/widm/NelsonSC06 fatcat:k2b5z36gsncitp4cwvtpt6vrma

Integration of Non-OAI Resources for Federated Searching in DLIST, an Eprints Repository

Anita Coleman, Paul Bracke, S. Karthik
2004 D-Lib Magazine  
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is designed to facilitate searches across OAI-compliant databases.  ...  Software such as Arc allow service providers to offer federated searching of multiple, OAI-compliant resources. The majority of web-accessible information resources, however, are not OAI-compliant.  ...  It must be possible to schedule harvesting of non-OAI resources regularly and automatically. 3. The process developed must be transferable.  ... 
doi:10.1045/july2004-coleman fatcat:6ezk2nsikfh35mjjdfvquo5ab4

Deep Web Interface Completely Harvested and Reranked by Crawler

Amruta Pandit, Prof. Manisha Naoghare
2016 International Journal of Innovative Research in Computer and Communication Engineering  
For harvesting deep web interface problem proposed framework is used and the Parsing process takes place.  ...  Here experimental result on a set of representative domain show the accuracy of this proposed crawler framework which can efficiently retrieves web interface from large scale sites.  ...  However, there are large amount of web resources and the dynamic nature of deep web, achieving wide coverage and high efficiency is a challenging issue.  ... 
doi:10.15680/ijircce.2016.0410005 fatcat:3fk5vk4g6jg3hb2fpncmbkxi6e

Evaluation of Semi-Automatic Metadata Generation Tools: A Survey of the Current State of the Art

Jung-ran Park, Andrew Brenza
2015 Information Technology and Libraries  
One promising approach to managing the ever-increasing amount of information is with semi-automatic metadata generation tools.  ...  The greatest area of difficulty lies in the fact that the piecemeal development of most semi-automatic generation tools only addresses part of the issue of semi-automatic metadata generation, providing  ...  The harvested metadata can be stored in a central repository for future resource retrieval.  ... 
doi:10.6017/ital.v34i3.5889 fatcat:ee6cptjz2vh4vc3er32zksjgji

Computer supported workflow for cataloging and management in digital libraries

Weimao Ke, Javed Mostafa, Gayathri S. Athreya
2009 Proceedings of the American Society for Information Science and Technology  
Features of the current implementation include automatic bioinformatics resource harvesting, basic information retrieval operations, resource cataloging and management, visualization and learning tools  ...  It computerizes a cataloging workflow by integrating backend services and facilitates interleaving between automatic cataloging & human curation activities.  ...  Digital libraries of this kind should not only pull together resources but also provide an efficient way of cataloging and managing these resources.  ... 
doi:10.1002/meet.2008.1450450304 fatcat:zptdgesatvbf5l65z7v3sdztii

Katsir: A Framework for Harvesting Digital Libraries on the Web

Uri Hanani, Ariel J. Frank
2000 European Conference on Information Systems  
Digital libraries, on the other hand, provide better services for focused discovery of relevant Web resources.  ...  The 'Katsir/Harvest' project laid the ground for our understanding that a new paradigm should be developed - the Harvested Digital Library (HDL).  ...  Summarizer-Broker: • Intelligent information extraction from Web resources, thus giving more meaningful summaries. • A semi-automatic construction of HDL metadata structures such as a topics-tree and a  ... 
dblp:conf/ecis/HananiF00 fatcat:viqci7l2kjgntaukzcmhfa6fge

Review of web-based intelligent building system

2016 International Journal of Latest Trends in Engineering and Technology  
Web Resources may contain links to other resources and to build a distributed web between Internet endpoints, resulting in highly scalable and flexible architecture.  ...  Using configuration software Web access application the building control system provides complete automatic adjustment, automatic monitoring, automatic alarm function and self-diagnosis, etc.  ...  The most importantly, protect valuable natural resources by reducing electrical energy consumption.  ... 
doi:10.21172/1.81.003 fatcat:mrwvmlh2nbbqrgwpras342k3nq

Towards an Open Learning Infrastructure for Open Educational Resources: Abundance as a Platform for Innovation [chapter]

Erik Duval, Katrien Verbert, Joris Klerkx
2011 Lecture Notes in Computer Science  
By removing friction between people and resources, we can leverage the long tail of learning resources, so that the abundance of learning resources will act as a platform for innovation.  ...  This paper explains how we have contributed to the development of an open learning infrastructure that manages and makes available Open Educational Resources.  ...  Typically, for those targets, it does not make sense to harvest all of their resources.  ... 
doi:10.1007/978-3-642-19391-0_11 fatcat:pgtvwxupwvgujgjsasj7azfy5m


R. Uebbing, C. Xie, B. Beshah, J. Welter
2012 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
But the data volume alone is not the only barrier for an efficient production work flow.  ...  Ideally, users will interact with the system to retrieve a specific project status and summaries while the work flow processes are triggered automatically by modeling their dependencies.  ...  The different harvesters in return use metadata harvesters registered in the system to provided metadata from diverse sources for each resource being harvested into the catalogue.  ... 
doi:10.5194/isprsarchives-xxxix-b4-255-2012 fatcat:wmv7ubepb5fm3nywr5xc6tugcu

Implementing a Registry Federation for Materials Science Data Discovery

Raymond L. Plante, Chandler A. Becker, Andrea Medina-Smith, Kevin Brady, Alden Dima, Benjamin Long, Laura M. Bartolo, James A. Warren, Robert J. Hanisch
2021 Data Science Journal  
As a result of a number of national initiatives, we are seeing rapid growth in the data important to materials science that are available over the web.  ...  A resource registry collects high-level metadata descriptions of resources such as data repositories, archives, websites, and services that are useful for data-driven research.  ...  Figure 3: Resource Registration Form. (left) A portion of the form used to create a resource description and which is automatically generated from the schema.  ... 
doi:10.5334/dsj-2021-015 pmid:34795758 pmcid:PMC8596377 fatcat:yemivdjc6jbsvla3qc2sgmhagi

Towards Semantic Web-Based Information Retrieval to solve Information Overload in an Applied Gaming Ecosystem

Philippe Tamla
2019 Bulletin of IEEE Technical Committee on Digital Libraries  
Others have used Semantic Web Technologies to enrich web resources with additional facts and meaning, but have let the system (and not the end-user) decide about the relevance of the search result failing  ...  To solve IO, existing contributors have used complicated mechanisms that index the Web and harvest large piles of documents to detect relevant materials.  ...  Automated knowledge extraction can help tackle IO because it can retrieve useful knowledge (in form of named entities) from naturally written texts automatically, which can be used for efficiently searching  ... 
dblp:journals/tcdl/Tamla19 fatcat:ktxot2xryzcjzccty26uhjki6a

Managing Open Educational Resources on the Web of Data

Gilbert Paquette, Alexis Miara
2014 International Journal of Advanced Computer Science and Applications  
First, within MOOCs, all (or at least most) resources must be open and available on the Web through URIs, including the MOOCs themselves.  ...  In the last few years, the international work on Massive Open On-line Courses (MOOCS) underlined new needs for open educational resources (OER) management within the context of the Web of Data.  ...  Most of the time, however, the metadata records will be harvested automatically by either an OAI-PMH Harvester or a HTML Spider.  ... 
doi:10.14569/ijacsa.2014.050806 fatcat:ak24qyjhfngujp2rqv3bn47sfe

A Streaming Real-Time Web Observatory Architecture for Monitoring the Health of Social Machines

Ramine Tinati, Xin Wang, Ian Brown, Thanassis Tiropanis, Wendy Hall
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15 Companion  
In this paper we describe the architecture used in the Southampton Web Observatory to harvest, process, and serve real-time Web streams.  ...  Over the past years, streaming Web services have become popular, with many of the top Web platforms now offering near real-time streams of user and machine activity.  ...  Metadata describing the listed resources and projects are published. This way, descriptions of resources can be harvested and listed in other Web Observatories or Web-based resources.  ... 
doi:10.1145/2740908.2743977 dblp:conf/www/TinatiWBTH15 fatcat:bkctdj663vfpplxqrh7sly2xiy

Enhance Crawler: A Dual-Stage Crawler for Efficiently Harvesting Deep Web Interfaces

Sujata R., Shubhangi S.
2017 International Journal of Computer Applications  
We propose an effective hidden web harvesting framework, namely Smart Crawler, for achieving both wide coverage and high efficiency for a Form focused crawler.  ...  Due to very high amount of web resources and dynamic nature of web achieving the wide coverage and high efficiency is become challenging issue. So for achieving this we proposed "Enhance crawler".  ... 
doi:10.5120/ijca2017914483 fatcat:ihw4b434djag3m6uwbgi73fiua

Bolegweb Platform – Contribution to the Web Communities

T. Kliment, V. Cetl, M. Tuchyňa, M. Kliment, G. Bordogna
2016 Agris on-line Papers in Economics and Informatics  
Therefore, a crosswalk needs to be implemented to bridge the OGC resources discovered on mainstream web with those documented by metadata in an SDI to enrich its information extent.  ...  The paper reports a global wide and user friendly platform of OGC resources available on the web with the main goal to ensure and enhance the use of GI within a multidisciplinary context and to bridge  ...  However, for a non GIS expert it is not easy to understand how to search, find and efficiently use GI resources provided by OGC services.  ... 
doi:10.7160/aol.2016.080408 fatcat:respjz5ldzfvviiyri43zuquee