The Method of Keyword Based Crawler Load Balancing

MO-JI WEI, YAN-QING ZHAO, SHI-WEI ZHU, AI-QIN YANG
2018 DEStech Transactions on Computer Science and Engineering  
This paper examines the features of different data sources, such as websites and social media, and proposes a load balancing method for distributed web crawlers that calculates weights for crawling data from various sources. First, a seed link allocation strategy is proposed, based on an analysis of how the statistical data update frequency differs across data sources. Then, using this allocation strategy, a data crawling solution is given that uses domain names and keywords as its task unit. Finally, a load balancing method is proposed that adjusts the allocation among distributed web crawlers, using the calculated time expenditures of domain names and keywords as weights.
doi:10.12783/dtcse/ceic2018/24546 fatcat:72flpg3ob5cvbk6qo7qr5jorru
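
The abstract does not specify how the weights are computed or how the allocation is adjusted, so the following is only an illustrative sketch, not the authors' algorithm. It assumes each task unit is a (domain name, keyword) pair with a known estimated crawl time in seconds, and it substitutes a standard greedy longest-processing-time heuristic for the paper's adjustment step. The function name `balance_tasks`, the crawler names, and the example domains and costs are all hypothetical.

```python
import heapq
from collections import defaultdict


def balance_tasks(task_costs, crawler_ids):
    """Spread task units over crawlers so total estimated time is roughly even.

    task_costs: dict mapping (domain, keyword) -> estimated crawl time (seconds).
    crawler_ids: list of crawler names.
    Returns (assignment, loads), both keyed by crawler name.
    """
    # Min-heap of (current_load, crawler_id); the least-loaded crawler is on top.
    heap = [(0.0, cid) for cid in crawler_ids]
    heapq.heapify(heap)
    assignment = defaultdict(list)

    # Hypothetical weighting: the time cost itself is the weight.
    # Place the most expensive task units first (greedy LPT heuristic).
    ordered = sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True)
    for task, cost in ordered:
        load, cid = heapq.heappop(heap)
        assignment[cid].append(task)
        heapq.heappush(heap, (load + cost, cid))

    loads = {cid: load for load, cid in heap}
    return dict(assignment), loads


# Made-up example data: four task units, two crawlers.
example_costs = {
    ("news.example.com", "election"): 8.0,
    ("forum.example.org", "election"): 5.0,
    ("social.example.net", "election"): 4.0,
    ("news.example.com", "weather"): 3.0,
}
example_assignment, example_loads = balance_tasks(
    example_costs, ["crawler-a", "crawler-b"]
)
```

In a running system the cost estimates would presumably be refreshed from measured crawl times and the assignment recomputed periodically; that feedback loop is left out of this sketch.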