Designing focused crawler based on improved genetic algorithm

Wei Yan, Li Pan
2018 Tenth International Conference on Advanced Computational Intelligence (ICACI)
With the growth of the internet, the Web has become ever more popular, which increases the need for methods that locate deep-web interfaces more efficiently. A web crawler is a program that browses the World Wide Web automatically. Deep-web databases are typically sparsely distributed and change constantly. To address this problem, prior work proposes two kinds of crawler: generic crawlers and ... crawlers. Focused crawling has drawn considerable attention from researchers over the past decade. A focused crawler searches the internet for pages on a specific term or topic. Vertical search must be precise, and a good search strategy improves accuracy; the Best-First search strategy is therefore commonly used, but it tends to get trapped in local optima. To improve global search, we present a focused crawler based on an improved genetic algorithm, which is a global search algorithm. The fitness function considers both topic correlation and topic importance: topic correlation is analyzed with the vector space model, and topic importance is estimated with an improved PageRank algorithm. The genetic operations are optimized based on users' browsing behavior: the selection operation chooses webpages with greater fitness, the crossover operation sorts links by topic importance, and the mutation operation searches combined keywords with a search engine. Evaluation experiments were conducted to examine the effectiveness of our approach; compared with previous genetic algorithms, the results show that the improved genetic algorithm increases the precision and recall of the focused crawler and enlarges its search scope.
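The abstract names the ingredients of the fitness function and the three genetic operators but gives no formulas. The following Python sketch illustrates them under stated assumptions: topic correlation is cosine similarity over raw term-frequency vectors (a simple vector space model), topic importance is standard PageRank normalized to [0, 1] (the paper's improvements to PageRank are not reproduced), and the two are combined by a hypothetical linear blend with weight `alpha`. The helper names (`fitness`, `select`, `crossover_order_links`, `mutation_queries`) and the toy pages are illustrative only, not the authors' code; the mutation step merely builds combined-keyword query strings and does not contact any search engine.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Raw term-frequency vector; a simplified stand-in for a weighted VSM (e.g. TF-IDF)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Topic correlation: cosine of the angle between two term vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (not the paper's improved variant)."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


def fitness(page_text, topic_vec, importance, alpha=0.7):
    """Hypothetical linear blend of topic correlation and normalized topic importance."""
    correlation = cosine_similarity(term_vector(page_text), topic_vec)
    return alpha * correlation + (1 - alpha) * importance


def select(population, scores, k):
    """Selection: keep the k pages with the greatest fitness."""
    return sorted(population, key=lambda p: scores[p], reverse=True)[:k]


def crossover_order_links(links, importance):
    """Crossover as the abstract describes it: order candidate links by topic importance."""
    return sorted(links, key=lambda u: importance.get(u, 0.0), reverse=True)


def mutation_queries(keywords, size=2):
    """Mutation: build combined-keyword queries (sending them to a search engine is out of scope)."""
    return [" ".join(combo) for combo in combinations(keywords, size)]


# Toy example with hypothetical pages and a hypothetical link graph.
topic = term_vector("genetic algorithm focused crawler")
pages = {
    "a": "focused crawler using a genetic algorithm",
    "b": "cooking recipes for pasta",
    "c": "web crawler design",
}
link_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
pr = pagerank(link_graph)
top = max(pr.values())
importance = {u: r / top for u, r in pr.items()}  # normalize PageRank to [0, 1]
scores = {p: fitness(text, topic, importance[p]) for p, text in pages.items()}
print(select(list(pages), scores, 2))  # ['a', 'c']
```

With `alpha=0.7`, page "a" ranks first because it shares the most topic terms and is the hub of the toy link graph; how the paper actually weights topic correlation against topic importance is not stated in the abstract, so `alpha` is purely an assumption.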
doi:10.1109/icaci.2018.8377476 dblp:conf/icaci/YanP18 fatcat:pheqlkg72veflouju74okm2664