Web Harvesting [chapter]

Serguei Mankovskii, Maarten van Steen, Minos Garofalakis, Alan Fekete, Christian S. Jensen, Richard T. Snodgrass, Alex Wun, Vanja Josifovski, Andrei Broder, Dennis Fetterly, Marc Najork, Robert Baumgartner (+55 others)
2009 Encyclopedia of Database Systems  
Synonyms: web data extraction, web information extraction, web mining. Definition: Web harvesting describes the process of gathering and integrating data from various heterogeneous web sources.  ...  The term harvesting implies that, while passing over a large body of available information, the process gathers only such information that lies in the domain of interest and is, as such, relevant.  ...  The important challenges for web harvesting, in contrast, lie in extracting and integrating the data.  ... 
doi:10.1007/978-0-387-39940-9_1172 fatcat:g57kzd22ozc2jndgawknid65nm

Crawler for Efficiently Harvesting Web

K Praveen Kumar
2017 International Journal of Communication Technology for Social Networking Services  
achieves higher harvest rates than different crawlers. tree organization to appreciate wider coverage for an internet web site.  ...  achieves higher harvest rates than utterly totally different crawlers.  ...  Propose a good harvesting framework for deep-web interfaces, specifically smartcrawler [3].  ... 
doi:10.21742/ijctsns.2017.5.1.02 fatcat:7big4ws6yba4dfwjbk4kdubdbq

Harvesting maps on the web

Aman Goel, Matthew Michelson, Craig A. Knoblock
2010 International Journal on Document Analysis and Recognition  
Yet, finding a collection of diverse, high quality maps is a significant challenge because there is a dearth of content specific metadata available to identify them from among other images on the Web.  ...  In this work, we tackle the problem of building that corpus by harvesting maps from the Web.  ...  Improvements over our previous work: We have done some preliminary work on harvesting maps from the Web [4].  ... 
doi:10.1007/s10032-010-0136-2 fatcat:gbes3ej4qbhuhptg64oyhtww7e

Efficient, automatic web resource harvesting

Michael L. Nelson, Joan A. Smith, Ignacio Garcia del Campo
2006 Proceedings of the eighth ACM international workshop on Web information and data management - WIDM '06  
(DIDL) into the web server itself.  ...  We introduce an approach that solves these two problems by implementing support for both the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and MPEG-21 Digital Item Declaration Language  ...  However, web harvesting is different. We define U as the set of all possible URLs for a particular web server, and F as the set of files that the web server can see.  ... 
doi:10.1145/1183550.1183560 dblp:conf/widm/NelsonSC06 fatcat:k2b5z36gsncitp4cwvtpt6vrma

Efficient web harvesting strategies for monitoring deep web content

Mohammad Khelghati, Djoerd Hiemstra, Maurice van Keulen
2016 Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services - iiWAS '16  
In focused web harvesting, all documents matching a given entity are harvested by querying a web search engine.  ...  changed content on the web in the domain of focused web harvesting.  ... 
doi:10.1145/3011141.3011198 dblp:conf/iiwas/KhelghatiHK16 fatcat:7m73i2cy5ncwzbqxzbevn47wbu

Web Harvesting: A Technique for Fast Retrieval of Information from Web

Meenakshi Srivastava, Dr. S.K. Singh
2016 International Journal Of Engineering And Computer Science  
Web harvesting is also known as Web scraping. In this article we have explored the field of Web harvesting and emphasized its use for fast and effective retrieval of information from web  ...  Making this search process better and fast has always been the area of interest for researchers involved in web mining. The process of searching the web can be improved by Web harvesting.  ...  [I] INTRODUCTION Web Harvesting stands in name -Web that is Internet which is itself a whole world of information. Harvesting belongs to agriculture harvesting.  ... 
doi:10.18535/ijecs/v5i5.55 fatcat:zt2j2625fnd3le2hvmto3h3lj4

Harvesting Image Databases from the Web

F Schroff, A Criminisi, A Zisserman
2011 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Candidate images are obtained by a text based web search querying on the object identifier (the word penguin). The web pages and the images they contain are downloaded.  ...  A multi-modal approach employing both text, meta data and visual features is used to gather many, high-quality images from the web.  ...  Conclusion This paper has proposed an automatic algorithm for harvesting the web and gathering hundreds of images of a given query class.  ... 
doi:10.1109/tpami.2010.133 pmid:21330688 fatcat:5bsag2wfmbhgrlilytd5zqxjbq

Harvesting Image Databases from the Web

F. Schroff, A. Criminisi, A. Zisserman
2007 2007 IEEE 11th International Conference on Computer Vision  
Candidate images are obtained by a text based web search querying on the object identifier (the word penguin). The web pages and the images they contain are downloaded.  ...  A multi-modal approach employing both text, meta data and visual features is used to gather many, high-quality images from the web.  ...  Conclusion This paper has proposed an automatic algorithm for harvesting the web and gathering hundreds of images of a given query class.  ... 
doi:10.1109/iccv.2007.4409099 dblp:conf/iccv/SchroffCZ07 fatcat:boc6ufvmxjamjhefnexm5k6i4e

PathCrawler: Automatic harvesting web infra-structure

Cesar Marcondes, M.Y. Sanadidi, Mario Gerla, Ramon S. Schwartz, Raphael O. Santos, Magnos Martinello
2008 NOMS 2008 - 2008 IEEE Network Operations and Management Symposium  
describe estimation algorithms and the software architecture of an efficient network management suite to automatically mine path capacity and minimum delays from a venture point to a set of observed web  ...  EXPERIMENTS FOR HARVESTING WEB INFRA-STRUCTURE In this section, we describe an extensive "web infra-structure harvesting" campaign using the developed PathCrawler.  ...  A web server supports either persistent connections or not persistent.  ... 
doi:10.1109/noms.2008.4575153 dblp:conf/noms/MarcondesSGSSM08 fatcat:hbycrxgpezdqfapnh6ugs22krm

Harvesting models from web 2.0 databases

Oscar Díaz, Gorka Puente, Javier Luis Cánovas Izquierdo, Jesús García Molina
2011 Journal of Software and Systems Modeling  
However, MDE first requires obtaining models from the wiki/blog/website database (a.k.a. model harvesting). This can be achieved through SQL scripts embedded in a program.  ...  Microformats are often referred to as the grassroot approach to Semantic Web where RDFa is being proposed [10].  ...  Harvesting Models Out of Databases: Model harvesting out of a database (hereafter, just "model harvesting") requires to express how model elements can be obtained from an existing database.  ... 
doi:10.1007/s10270-011-0194-z fatcat:bdoxtw626nc7jebemkb7lfhnde

Water harvesting during orb web recycling

Brent D. Opell
2021 The journal of arachnology  
Estimates of net water gain range from a high of 0.88% of body mass (0.19 µl water harvested) in L. venusta to a low of 0.45% of body mass (3.01 µl water harvested) in A. marmoreus.  ...  Table 1. Values used to estimate water harvested during orb web ingestion. Spider mass and capture thread length are from Opell (1999) and droplets per mm are from Opell & Hendricks (2009).  ... 
doi:10.1636/joa-s-19-066 fatcat:v57ujazgvbaydcvozhffvyxwbm

Harvesting SSL Certificate Data to Identify Web-Fraud [article]

Mishari Al Mishari, Emiliano De Cristofaro, Karim El Defrawy, Gene Tsudik
2012 arXiv   pre-print
Web-fraud is one of the most unpleasant features of today's Internet. Two well-known examples of fraudulent activities on the web are phishing and typosquatting.  ...  This paper presents a novel technique to detect web-fraud domains that utilize HTTPS. To this end, we conduct the first comprehensive study of SSL certificates.  ...  We probe each domain for web existence (by sending an HTTP request) and for HTTPS existence (by sending an HTTPS request). If a domain responds to HTTPS, we harvest its SSL certificate.  ... 
arXiv:0909.3688v4 fatcat:3cj37ibtsffgfjbk7ugn3wuxqy

IKHarvester - Informal eLearning with Semantic Web Harvesting

Jacek Jankowski, Adam Westerski, Sebastian Ryszard Kruk, Tadhg Nagle, Jaroslaw Dobrzanski
2008 2008 IEEE International Conference on Semantic Computing  
Finally, semantic web harvesting technology as a solution is explored in the form of the knowledge acquisition tool called IKHarvester.  ...  IKHarvester 5 is a web service that provides two core features: harvesting Social Semantic Information Sources, and providing it for the eLearning frameworks.  ...  Figure 1. Process of harvesting data from informal knowledge repositories.  ... 
doi:10.1109/icsc.2008.47 dblp:conf/semco/JankowskiWKND08 fatcat:4pdgl6b4ezcfdodgxi2hkgahpy

Towards complete coverage in focused web harvesting

Mohammadreza Khelghati, Djoerd Hiemstra, Maurice van Keulen
2015 Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services - iiWAS '15  
With the goal of harvesting all information about a given entity, in this paper, we try to harvest all matching documents for a given query submitted on a search engine.  ...  These limitations are also applied in deep web sources, for instance in social networks like Twitter.  ...  LITERATURE STUDY Deep Web harvesting In this work, we are interested in the methods that are applied to access deep web data either for sampling or harvesting websites.  ... 
doi:10.1145/2837185.2837208 dblp:conf/iiwas/KhelghatiHK15 fatcat:35omkmlxzzazfm7ksqmnukrhei

A Simple Mechanism for Focused Web-harvesting [article]

Z. Akbar, L.T. Handoko
2008 arXiv   pre-print
The web-harvesting has been implemented and extended by not only specifying the targeted URLs, but also predefining human-edited harvesting parameters to improve the speed and accuracy.  ...  The focused web-harvesting is deployed to realize an automated and comprehensive index databases as an alternative way for virtual topical data integration.  ...  The architecture is inspired and combination of focused web-crawling and web-harvesting.  ... 
arXiv:0809.0723v1 fatcat:3vj3m36tyrb5ljtieiqohp2ldi