Application of Distributed Web Crawlers in Information Management System
2018
Informatica (Ljubljana, Tiskana izd.)
systems. ...
The simulation experiment verified that the system could operate stably in information management system, which offers a reference for the application of distributed web crawlers in information management ...
Conclusion In conclusion, distributed network crawlers based information management system could precisely satisfy the requirements of web crawling, with a high performance and expandability. ...
dblp:journals/informaticaSI/Wen18
fatcat:gyrizkk3gngunmg3f2z7ek43sa
Speeding Up the Web Crawling Process on a Multi-Core Processor Using Virtualization
2013
International Journal on Web Service Computing
So that the crawling process should be a continuous process performed from time-to-time to maintain up-to-date crawled data. ...
This paper develops and investigates the performance of a new approach to speed up the crawling process on a multi-core processor through virtualization. ...
cost-effective high speed crawling system. ...
doi:10.5121/ijwsc.2013.4102
fatcat:h3i4nps4cfgjzm37tzqbdnzxui
news-please
2017
International Symposium of Information Science
Our system allows crawling arbitrary news websites and extracting the major elements of news articles on those websites, i.e., title, lead paragraph, main content, publication date, author, and main image ...
However, large scale collection of news data is cumbersome due to a lack of generic tools for crawling and extracting such data. ...
These systems typically achieve high precision and recall for their extraction task, but require significant initial setup effort in order to customize the extractors to a set of specific news websites ...
doi:10.18452/1447
dblp:conf/isiwi/HamborgMBG17
fatcat:763h7ckq6rf2hlyqp6t46s4pku
news-please: A Generic News Crawler and Extractor
2017
Zenodo
Our system allows crawling arbitrary news websites and extracting the major elements of news articles on those websites, i.e., title, lead paragraph, main content, publication date, author, and main image ...
However, large scale collection of news data is cumbersome due to a lack of generic tools for crawling and extracting such data. ...
Web Crawling. news-please performs two sub-tasks in this phase. (1) The crawler downloads articles' HTML, using the scrapy framework. (2) To find all articles published by the news outlet, the system supports ...
doi:10.5281/zenodo.4120316
fatcat:ubvtewe25zgy5c47kfe3pjkgim
An Extended Model for Effective Migrating Parallel Web Crawling with Domain Specific and Incremental Crawling
2012
International Journal on Web Service Computing
In this paper we propose the architecture for Effective Migrating Parallel Web Crawling approach with domain specific and incremental crawling strategy that makes web crawling system more effective and ...
Domain specific crawling will yield high quality pages. The crawling process will migrate to host or server with specific domain and start downloading pages within specific domain. ...
High quality of pages will be downloaded as crawling processes are performing in breadth first manner. Breadth first crawling improves the quality of downloaded pages. ...
doi:10.5121/ijwsc.2012.3308
fatcat:5p43j3aevnfttk6et4kdu6lzxu
Current Challenges in Web Crawling
[chapter]
2013
Lecture Notes in Computer Science
In this tutorial, we will introduce the audience to five topics: architecture and implementation of high-performance web crawler, collaborative web crawling, crawling the deep Web, crawling multimedia ...
Web crawling, a process of collecting web pages in an automated manner, is the primary and ubiquitous operation used by a large number of web systems and agents starting from a simple program for website ...
Architecture and implementation of high-performance web crawler. ...
doi:10.1007/978-3-642-39200-9_49
fatcat:igaskwpugrdvpapxbwxg5imyge
High-Performance Web Crawling
[chapter]
2002
Massive Computing
Abstract High-performance web crawlers are an important component of many web services. ...
This chapter describes our experience building and operating such a high-performance crawler. ...
High performance. ...
doi:10.1007/978-1-4615-0005-6_2
fatcat:axqtctlvfzfdhpywjmqmi7taye
Capturing Connectivity Graphs of a Large-Scale P2P Overlay Network
2013
2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops
The results show that the crawler is fast and captures highly accurate graph snapshots. ...
Measuring accurate graph snapshots of peer-to-peer (P2P) overlay networks is essential to understand these systems. ...
They also thank Moritz Steiner for his cooperation through performing tests on Blizzard, and thank the G-lab administrators for their help. ...
doi:10.1109/icdcsw.2013.35
dblp:conf/icdcsw/SalahS13
fatcat:me56h6y2c5dx5apfhi75yaqxj4
Unsupervised Parallel Corpus Mining on Web Data
[article]
2020
arXiv
pre-print
With a large amount of parallel data, neural machine translation systems are able to deliver human-level performance for sentence-level translation. ...
On the WMT'16 English-Romanian and Romanian-English benchmarks, our system produces new state-of-the-art results, 39.81 and 38.95 BLEU scores, even compared with supervised approaches. ...
In our experiment, we show that the machine translation system trained with crawled parallel data from our system is able to achieve a similar or even superior performance compared to fully supervised ...
arXiv:2009.08595v1
fatcat:jwfgwptdkzfipbmr6vdhl23zjy
An improved topic relevance algorithm for focused crawling
2011
2011 IEEE International Conference on Systems, Man, and Cybernetics
Third, in real crawling experiments on the prototype system, the crawler using TF-IDF has high performance with the accumulated topic relevance increasing quickly at the beginning of crawling, however ...
Last, the crawler using TFIDF+LSI performs the same crawl task and demonstrates the combination advantage of TF-IDF and LSI. ...
accumulated topic relevance increases steadily and slowly throughout the whole crawling. • Secondly, at the beginning of crawling, the crawler using TF-IDF has high performance with the accumulated topic ...
doi:10.1109/icsmc.2011.6083759
dblp:conf/smc/HaoMYLW11
fatcat:qex6mjeitvcghltaujii3biycq
Using web text to improve keyword spotting in speech
2013
2013 IEEE Workshop on Automatic Speech Recognition and Understanding
In this paper, we investigate the use of online text resources to improve the performance of speech recognition specifically for the task of keyword spotting. ...
By integrating the web text into our systems, we observed significant improvements in keyword spotting accuracy for four out of the five languages. ...
The most gain was obtained in Turkish, where the LimitedLP system has a rather high OOV rate. ...
doi:10.1109/asru.2013.6707768
dblp:conf/asru/GandheQMRLE13
fatcat:e23pdljlzzdkddheqllq3owp5u
SNES: Social-Network-Oriented Public Opinion Monitoring Platform Based on ElasticSearch
2019
Computers Materials & Continua
However, these platforms cannot perform well in scalability, fault tolerance, and real-time performance. ...
A great number of empirical experiments prove that the platform can adapt well to the social network with highly real-time data and has good performance in public opinion monitoring. ...
Kafka (a high throughput distributed publish and subscribe message system) and Spark Streaming (a real-time streaming computing framework). ...
doi:10.32604/cmc.2019.06133
fatcat:pczrzpwrkzen7nz4bblcorebhq
Experimental Study of Military Crawl as a Special Type of Human Quadripedal Automatic Locomotion
2021
Applied Sciences
Progressive and propulsive motions are characterized as normal; additional right–left side motions—with high degree of reciprocity. ...
Eight healthy adults aged 15–31 (four women and four men) were examined by means of a 3D kinematic analysis with Optitrack optical motion-capture system which consists of 12 Flex 13 cameras. ...
Military Crawling A biomechanical analysis of motions used in military crawling was performed. ...
doi:10.3390/app11167666
fatcat:hwnebynfyvfdlppsl74cgtxwqm
A Parametric Layered Approach to Perform Web Page Ranking
2013
International Journal of Computer Applications
The presented work will provide a recommendation-based web page indexing so that effective web crawling will be performed. ...
Web crawling is the foremost step to perform the effective and efficient web content search so that the user will get the specific web pages initially in an indexed form. ...
Author presented an architecture for the system with the performance bottleneck and to drive the high performance based association search over the web [7]. The author has defined the work under the capabilities ...
doi:10.5120/11467-7251
fatcat:p4wl56r6vrcydk3qa4pezsb3da
Ontology Property-based Adaptive Crawler for Linked Data(OPAC)
2013
2013 Fourth International Conference on the Network of the Future (NoF)
Performance evaluation shows that this system can reduce overhead costs by more than 70% while maintaining a high freshness of data. ...
Frequent crawling is required for dynamic data to meet the high freshness requirement of real time applications. Crawling large datasets may cause serious scalability problems. ...
Performance evaluation shows that most of the data can maintain high freshness with much lower overhead. The paper is organized as follows. ...
doi:10.1109/nof.2013.6724500
dblp:conf/nof/AnKLL13
fatcat:g6zjc2dukbb6rhpsnk4yzkc3ui