Guide focused crawler efficiently and effectively using on-line topical importance estimation

Ziyu Guan, Can Wang, Chun Chen, Jiajun Bu, Junfeng Wang
2008 Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08  
We propose a new frontier prioritizing algorithm, namely, the OTIE (On-line Topical Importance Estimation) algorithm, which efficiently and effectively combines link-based and content-based analysis to  ...  Focused crawling is a critical technique for topical resource discovery on the Web.  ...  We propose a new frontier prioritizing algorithm inspired from the On-line Page Importance Computation (OPIC) algorithm [1], namely On-line Topical Importance Estimation (OTIE).  ... 
doi:10.1145/1390334.1390488 dblp:conf/sigir/GuanWCBW08 fatcat:kmdn3q6iojholguw4vac2zxo2i

On-line topical importance estimation: an effective focused crawling algorithm combining link and content analysis

Can Wang, Zi-yu Guan, Chun Chen, Jia-jun Bu, Jun-feng Wang, Huai-zhong Lin
2009 Journal of Zhejiang University: Science A  
Traditional focused crawlers mainly rely on content analysis. Link-based techniques are not effectively exploited despite their usefulness.  ...  In this paper, we propose a new frontier prioritizing algorithm, namely the on-line topical importance estimation (OTIE) algorithm.  ...  Because our aim is not to estimate topical PageRank scores of pages, but to effectively guide focused crawlers, this is not a big problem.  ... 
doi:10.1631/jzus.a0820481 fatcat:gpbva4slrjb4rksndd442xmok4

Status Locality on the Web: Implications for Building Focused Collections

Gautam Pant, Padmini Srinivasan
2013 Information systems research  
Analogous to topical locality, status locality may also be exploited by web crawlers. Collections built by such crawlers include pages that are both topically relevant and also important.  ...  In contrast, topical web crawlers depend on local information based on previously downloaded pages.  ...  Estimating Topicality The crawler estimates topicality with an SVM-based text classifier that has been used effectively to guide topical crawlers (Pant and Srinivasan 2005).  ... 
doi:10.1287/isre.1120.0457 fatcat:o3xmqkuvi5beflfyjuxiufwuqe

iCrawl

Gerhard Gossen, Elena Demidova, Thomas Risse
2015 Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries - JCDL '15  
Especially Social Media provide a rich source of fresh content, which is not used by state-of-the-art focused crawlers.  ...  The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content for guiding the crawler.  ...  Acknowledgements This work was partially funded by the ERC under ALEXANDRIA (ERC 339233) and BMBF (Project "GlycoRec").  ... 
doi:10.1145/2756406.2756925 dblp:conf/jcdl/GossenDR15 fatcat:t6fsh5cvnzb6flpi3ked2hhwf4

An Improved Technique for Web Page Classification in Respect of Domain Specific Search

Vivek Chandra, Nidhi Saxena
2014 International Journal of Computer Applications  
A domain specific crawler, as diverse from a general web search engine, focuses on a specific segment of web content. They are also called vertical or topical search engines.  ...  The results show that Hi-SVM is a better choice for guiding a topical crawler when compared to Support Vector Machine and Neural Network.  ...  Retrieving of relevant information from the web, efficiently and effectively has therefore become a challenge.  ... 
doi:10.5120/17801-8615 fatcat:kraqmhkigfeuzlll3hfgxfoffi

Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries

2015 International Journal of Science and Research (IJSR)  
The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents.  ...  We deal with this problem by using the contents of pages to focus the crawl on a topic; by prioritizing promising links within the topic; and by also following links that may not lead to immediate benefit  ...  Limitation of FFC An experimental evaluation of the FFC [3] showed that FFC is more efficient and retrieves up to an order of magnitude more searchable forms than a crawler that focuses only on topic  ... 
doi:10.21275/v4i12.nov152532 fatcat:bvyuvcuhxndjhhyycp3aljsrue

An adaptive crawler for locating hidden-Web entry points

Luciano Barbosa, Juliana Freire
2007 Proceedings of the 16th international conference on World Wide Web - WWW '07  
We deal with this problem by using the contents of pages to focus the crawl on a topic; by prioritizing promising links within the topic; and by also following links that may not lead to immediate benefit  ...  forms as crawlers that use a fixed focus strategy.  ...  This work is partially supported by the National Science Foundation (under grants IIS-0513692, CNS-0524096, IIS-0534628) and a University of Utah Seed Grant.  ... 
doi:10.1145/1242572.1242632 dblp:conf/www/BarbosaF07a fatcat:jjwhm5ppojbnbfetxuuvmqae6m

Focused Crawling Using Temporal Difference-Learning [chapter]

Alexandros Grigoriadis, Georgios Paliouras
2004 Lecture Notes in Computer Science  
This paper deals with the problem of constructing an intelligent Focused Crawler, i.e. a system that is able to retrieve documents of a specific topic from the Web.  ...  The crawler must contain a component which assigns visiting priorities to the links, by estimating the probability of leading to a relevant page in the future.  ...  Some of these methods take advantage of the "Topical Locality" of the Web (the property of pages with similar topic being connected with hyperlinks [2]) and use it to guide the focused crawler [3].  ... 
doi:10.1007/978-3-540-24674-9_16 fatcat:i7buxh7e3jc73kkuopawoeenvi

Focused crawling: a new approach to topic-specific Web resource discovery

Soumen Chakrabarti, Martin van den Berg, Byron Dom
1999 Computer Networks  
Our anecdotes suggest that focused crawling is very effective for building high-quality collections of Web documents on specific topics, using modest desktop hardware.  ...  We report on extensive focused-crawling experiments using several topics at different levels of specificity.  ...  interface; and Sunita Sarawagi, Amit Somani and Kiran Mehta for advice with disk data structures.  ... 
doi:10.1016/s1389-1286(99)00052-3 fatcat:hyu7mthfjrd2dowwhx3qxtdhoa

Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning

Mukesh Kumar, Renu Vig
2013 Journal of Emerging Technologies in Web Intelligence  
A focused crawler traverses the Web to collect documents related to a particular topic, and can be used to build topic specific collection of documents for use in digital libraries and domain specific  ...  Focused crawler help the search indexer to index all documents present on the World Wide Web related to a specific domain which in turn provides search engine's users complete and fresher most information  ...  It makes difficult to discover topic relevant information that can be used in specialized portals and on-line search. To tackle this issue the focused web crawlers are emerging.  ... 
doi:10.4304/jetwi.5.1.70-77 fatcat:fiucrzwbc5fsdgt7v5eqky7k5i

Improving the performance of focused web crawlers

Sotiris Batsakis, Euripides G.M. Petrakis, Evangelos Milios
2009 Data & Knowledge Engineering  
Several variants of state-of-the-art crawlers relying on web page content and link information for estimating the relevance of web pages to a given topic are proposed.  ...  Furthermore, the new HMM crawler improved the performance of the original HMM crawler and also outperforms classic focused crawlers in searching for specialized topics.  ...  Applications of focused crawlers also include guiding intelligent agents on the Web for locating specialized information.  ... 
doi:10.1016/j.datak.2009.04.002 fatcat:fwfnsada6fbildpsxhug42umoi

Topical web crawlers

Filippo Menczer, Gautam Pant, Padmini Srinivasan
2004 ACM Transactions on Internet Technology  
In particular we focus on the tradeoff between exploration and exploitation of the cues available to a crawler, and on adaptive crawlers that use machine learning techniques to guide their search.  ...  The context available to such crawlers can guide the navigation of links with the goal of efficiently locating highly relevant target pages.  ...  for constructive criticism and suggestions, and to Alberto Segre and Dave Eichmann for computing support.  ... 
doi:10.1145/1031114.1031117 fatcat:lyczl4qopfbi3mjg7budmpbqay

Profile-Based Focused Crawling for Social Media-Sharing Websites

Zhiyong Zhang, Olfa Nasraoui
2009 EURASIP Journal on Image and Video Processing  
In order to efficiently and effectively extract data for the focused crawling, a path string-based page classification method is first developed for identifying list pages, detail pages, and profile pages  ...  Our experiments prove the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio, compared to breadth-first and online page importance computation (OPIC)  ...  In the second stage, which is the actual crawling stage, we use the information acquired from the first stage to guide our focused crawler.  ... 
doi:10.1155/2009/856037 fatcat:r3667lc5nzhzlbpfneqwehfbfa

A focused crawler combinatory link and content model based on T-Graph principles

Ali Seyfi, Ahmed Patel
2016 Computer Standards & Interfaces  
The two significant tasks of a focused Web crawler are finding relevant topic-specific documents on the Web and analytically prioritizing them for later effective and reliable download.  ...  link, based on which the topical focus of an unvisited page can be predicted and elicited with a high accuracy.  ...  A focused crawler, on the other hand, explores the documents about a specific (set of) topic(s) and guides the searching process based on both the content and link structure of the Web.  ... 
doi:10.1016/j.csi.2015.07.001 fatcat:plfwvmmtnfaktowcqtslnv4l4e

Design and implementation of contextual information portals

Jay Chen, Russell Power, Lakshminarayanan Subramanian, Jonathan Ledlie
2011 Proceedings of the 20th international conference companion on World wide web - WWW '11  
We combine an efficient classifier with a focused crawler to gather the web pages for the portal for any given topic.  ...  Using several secondary school course syllabi, we demonstrate the effectiveness of our system for constructing CIPs for use as an education resource.  ...  ACKNOWLEDGEMENTS The authors would like to thank Trishank Karthik for his help with the early implementation of the focused crawler and Mangala Kanthamani for her help in our preliminary study.  ... 
doi:10.1145/1963192.1963359 dblp:conf/www/ChenPSL11 fatcat:7gr4ez5cnzh4neamt6ppmejjru