Guide focused crawler efficiently and effectively using on-line topical importance estimation
2008
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08
We propose a new frontier prioritizing algorithm, namely, the OTIE (On-line Topical Importance Estimation) algorithm, which efficiently and effectively combines link-based and content-based analysis to ...
Focused crawling is a critical technique for topical resource discovery on the Web. ...
We propose a new frontier prioritizing algorithm inspired from the On-line Page Importance Computation (OPIC) algorithm [1], namely On-line Topical Importance Estimation (OTIE). ...
doi:10.1145/1390334.1390488
dblp:conf/sigir/GuanWCBW08
fatcat:kmdn3q6iojholguw4vac2zxo2i
On-line topical importance estimation: an effective focused crawling algorithm combining link and content analysis
2009
Journal of Zhejiang University: Science A
Traditional focused crawlers mainly rely on content analysis. Link-based techniques are not effectively exploited despite their usefulness. ...
In this paper, we propose a new frontier prioritizing algorithm, namely the on-line topical importance estimation (OTIE) algorithm. ...
Because our aim is not to estimate topical PageRank scores of pages, but to effectively guide focused crawlers, this is not a big problem. ...
doi:10.1631/jzus.a0820481
fatcat:gpbva4slrjb4rksndd442xmok4
Status Locality on the Web: Implications for Building Focused Collections
2013
Information systems research
Analogous to topical locality, status locality may also be exploited by web crawlers. Collections built by such crawlers include pages that are both topically relevant and also important. ...
In contrast, topical web crawlers depend on local information based on previously downloaded pages. ...
Estimating Topicality: The crawler estimates topicality with an SVM-based text classifier that has been used effectively to guide topical crawlers (Pant and Srinivasan 2005). ...
doi:10.1287/isre.1120.0457
fatcat:o3xmqkuvi5beflfyjuxiufwuqe
Especially Social Media provide a rich source of fresh content, which is not used by state-of-the-art focused crawlers. ...
The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content for guiding the crawler. ...
Acknowledgements: This work was partially funded by the ERC under ALEXANDRIA (ERC 339233) and BMBF (Project "GlycoRec"). ...
doi:10.1145/2756406.2756925
dblp:conf/jcdl/GossenDR15
fatcat:t6fsh5cvnzb6flpi3ked2hhwf4
An Improved Technique for Web Page Classification in Respect of Domain Specific Search
2014
International Journal of Computer Applications
A domain specific crawler, as diverse from a general web search engine, focuses on a specific segment of web content. They are also called vertical or topical search engines. ...
The results show that Hi-SVM is a better choice for guiding a topical crawler when compared to Support Vector Machine and Neural Network. ...
Retrieving of relevant information from the web, efficiently and effectively has therefore become a challenge. ...
doi:10.5120/17801-8615
fatcat:kraqmhkigfeuzlll3hfgxfoffi
Focused and Adaptive Crawling for Topic Specific and Hidden Web Entries
2015
International Journal of Science and Research (IJSR)
The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. ...
We deal with this problem by using the contents of pages to focus the crawl on a topic; by prioritizing promising links within the topic; and by also following links that may not lead to immediate benefit ...
Limitation of FFC: An experimental evaluation of the FFC [3] showed that FFC is more efficient and retrieves up to an order of magnitude more searchable forms than a crawler that focuses only on topic ...
doi:10.21275/v4i12.nov152532
fatcat:bvyuvcuhxndjhhyycp3aljsrue
An adaptive crawler for locating hidden-web entry points
2007
Proceedings of the 16th international conference on World Wide Web - WWW '07
We deal with this problem by using the contents of pages to focus the crawl on a topic; by prioritizing promising links within the topic; and by also following links that may not lead to immediate benefit ...
forms as crawlers that use a fixed focus strategy. ...
This work is partially supported by the National Science Foundation (under grants IIS-0513692, CNS-0524096, IIS-0534628) and a University of Utah Seed Grant. ...
doi:10.1145/1242572.1242632
dblp:conf/www/BarbosaF07a
fatcat:jjwhm5ppojbnbfetxuuvmqae6m
Focused Crawling Using Temporal Difference-Learning
[chapter]
2004
Lecture Notes in Computer Science
This paper deals with the problem of constructing an intelligent Focused Crawler, i.e. a system that is able to retrieve documents of a specific topic from the Web. ...
The crawler must contain a component which assigns visiting priorities to the links, by estimating the probability of leading to a relevant page in the future. ...
Some of these methods take advantage of the "Topical Locality" of the Web (the property of pages with similar topic being connected with hyperlinks [2]) and use it to guide the focused crawler [3]. ...
doi:10.1007/978-3-540-24674-9_16
fatcat:i7buxh7e3jc73kkuopawoeenvi
Focused crawling: a new approach to topic-specific Web resource discovery
1999
Computer Networks
Our anecdotes suggest that focused crawling is very effective for building high-quality collections of Web documents on specific topics, using modest desktop hardware. ...
We report on extensive focused-crawling experiments using several topics at different levels of specificity. ...
interface; and Sunita Sarawagi, Amit Somani and Kiran Mehta for advice with disk data structures. ...
doi:10.1016/s1389-1286(99)00052-3
fatcat:hyu7mthfjrd2dowwhx3qxtdhoa
Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning
2013
Journal of Emerging Technologies in Web Intelligence
A focused crawler traverses the Web to collect documents related to a particular topic, and can be used to build topic specific collection of documents for use in digital libraries and domain specific ...
Focused crawler help the search indexer to index all documents present on the World Wide Web related to a specific domain which in turn provides search engine's users complete and fresher most information ...
It makes difficult to discover topic relevant information that can be used in specialized portals and on-line search. To tackle this issue the focused web crawlers are emerging. ...
doi:10.4304/jetwi.5.1.70-77
fatcat:fiucrzwbc5fsdgt7v5eqky7k5i
Improving the performance of focused web crawlers
2009
Data & Knowledge Engineering
Several variants of state-of-the-art crawlers relying on web page content and link information for estimating the relevance of web pages to a given topic are proposed. ...
Furthermore, the new HMM crawler improved the performance of the original HMM crawler and also outperforms classic focused crawlers in searching for specialized topics. ...
Applications of focused crawlers also include guiding intelligent agents on the Web for locating specialized information. ...
doi:10.1016/j.datak.2009.04.002
fatcat:fwfnsada6fbildpsxhug42umoi
Topical web crawlers
2004
ACM Transactions on Internet Technology
In particular we focus on the tradeoff between exploration and exploitation of the cues available to a crawler, and on adaptive crawlers that use machine learning techniques to guide their search. ...
The context available to such crawlers can guide the navigation of links with the goal of efficiently locating highly relevant target pages. ...
for constructive criticism and suggestions, and to Alberto Segre and Dave Eichmann for computing support. ...
doi:10.1145/1031114.1031117
fatcat:lyczl4qopfbi3mjg7budmpbqay
Profile-Based Focused Crawling for Social Media-Sharing Websites
2009
EURASIP Journal on Image and Video Processing
In order to efficiently and effectively extract data for the focused crawling, a path string-based page classification method is first developed for identifying list pages, detail pages, and profile pages ...
Our experiments prove the robustness of our profile-based focused crawler, as well as a significant improvement in harvest ratio, compared to breadth-first and online page importance computation (OPIC) ...
In the second stage, which is the actual crawling stage, we use the information acquired from the first stage to guide our focused crawler. ...
doi:10.1155/2009/856037
fatcat:r3667lc5nzhzlbpfneqwehfbfa
A focused crawler combinatory link and content model based on T-Graph principles
2016
Computer Standards & Interfaces
The two significant tasks of a focused Web crawler are finding relevant topic-specific documents on the Web and analytically prioritizing them for later effective and reliable download. ...
link, based on which the topical focus of an unvisited page can be predicted and elicited with a high accuracy. ...
A focused crawler, on the other hand, explores the documents about a specific (set of) topic(s) and guides the searching process based on both the content and link structure of the Web. ...
doi:10.1016/j.csi.2015.07.001
fatcat:plfwvmmtnfaktowcqtslnv4l4e
Design and implementation of contextual information portals
2011
Proceedings of the 20th international conference companion on World wide web - WWW '11
We combine an efficient classifier with a focused crawler to gather the web pages for the portal for any given topic. ...
Using several secondary school course syllabi, we demonstrate the effectiveness of our system for constructing CIPs for use as an education resource. ...
Acknowledgements: The authors would like to thank Trishank Karthik for his help with the early implementation of the focused crawler and Mangala Kanthamani for her help in our preliminary study. ...
doi:10.1145/1963192.1963359
dblp:conf/www/ChenPSL11
fatcat:7gr4ez5cnzh4neamt6ppmejjru