The Method of Keyword Based Crawler Load Balancing
DEStech Transactions on Computer Science and Engineering
This paper studies the characteristics of different data sources, such as websites and social media, and proposes a load-balancing method for distributed web crawlers that calculates weights for the data crawled from each source. First, a seed-link allocation strategy is proposed, based on an analysis of how the statistical data-update frequency differs across data sources. Then, using this allocation strategy, a data-crawling solution that takes domain names and keywords as its task unit is presented. Finally, …
doi:10.12783/dtcse/ceic2018/24546 fatcat:72flpg3ob5cvbk6qo7qr5jorru
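The abstract does not give the weighting formula or the allocation algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each source's weight is proportional to its observed update frequency (updates per day), that a task unit is a (domain, keyword) pair whose weight is that source's share split evenly across keywords, and that tasks are spread over crawler nodes with a greedy least-loaded heuristic (longest-processing-time first). That heuristic is a swapped-in technique, not necessarily the paper's method. The helper names `make_tasks` and `assign`, the domains, and the frequencies are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical crawl task unit: one domain paired with one keyword."""
    domain: str
    keyword: str
    weight: float  # assumed: proportional to the source's update frequency


def make_tasks(update_freq: dict[str, float], keywords: list[str]) -> list[Task]:
    """Build (domain, keyword) tasks weighted by each source's update frequency.

    Assumption: frequencies are normalized so all source weights sum to 1,
    and each source's share is split evenly across the keywords.
    """
    total = sum(update_freq.values())
    tasks = []
    for domain, freq in update_freq.items():
        share = freq / total
        for kw in keywords:
            tasks.append(Task(domain, kw, share / len(keywords)))
    return tasks


def assign(tasks: list[Task], n_nodes: int) -> list[list[Task]]:
    """Greedy LPT: take tasks heaviest first, give each to the least-loaded node."""
    heap = [(0.0, i) for i in range(n_nodes)]  # (current load, node index)
    heapq.heapify(heap)
    buckets: list[list[Task]] = [[] for _ in range(n_nodes)]
    for task in sorted(tasks, key=lambda t: t.weight, reverse=True):
        load, node = heapq.heappop(heap)
        buckets[node].append(task)
        heapq.heappush(heap, (load + task.weight, node))
    return buckets


if __name__ == "__main__":
    # Hypothetical sources with assumed updates per day.
    freq = {"news.example.com": 48.0, "blog.example.org": 2.0, "social.example.net": 150.0}
    buckets = assign(make_tasks(freq, ["flood", "earthquake"]), n_nodes=3)
    for i, bucket in enumerate(buckets):
        load = sum(t.weight for t in bucket)
        print(i, round(load, 3), [(t.domain, t.keyword) for t in bucket])
```

Placing the heaviest tasks first keeps a frequently updated source (here the social-media domain) from piling onto a node that already carries other heavy work. Lightly updated sources then fill in whichever node has the most spare capacity.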