209 Hits in 6.4 sec

Site Design Impact on Robots

Joan A. Smith, Michael L. Nelson
2008 D-Lib Magazine  
Does the design of the site (deep or wide) affect robot behavior? Are there strategies that increase crawler penetration?  ...  Figure 2: Examples of Wide Design and Deep Design. One of our goals was to test crawler willingness to explore the width and depth of a site. What do these terms mean?  ... 
doi:10.1045/march2008-smith fatcat:enxc2zujprcubbpwj5fe4mhnqm

Observed Web Robot Behavior on Decaying Web Subsites

Joan A. Smith, Frank McCown, Michael L. Nelson
2006 D-Lib Magazine  
However, some of the crawling behaviors themselves proved to be interesting and have implications for using a search engine as an interface to a digital library.  ...  We describe the observed crawling patterns of various search engines (including Google, Yahoo and MSN) as they traverse a series of web subsites whose contents decay at predetermined rates.  ...  It is possible that this kind of site structure may help host sites to reduce the processing impact of robot crawls on host servers by Google, Inktomi, and MSN. Bibliography  ... 
doi:10.1045/february2006-smith fatcat:22ryl3t4ara63fk4zaa25plgye

Practice of Robots Exclusion Protocol in Bhutan

2020 Journal of Education and Practice  
Most search engines rely on web robots to collect information from the web.  ...  But not everyone wishes to have their websites and web pages indexed by web crawlers.  ...  web crawlers are not able to crawl and fetch the deep web pages and Ajax pages available on websites with the current Robots Protocol designed and developed by many search engine companies.  ... 
doi:10.7176/jep/11-35-01 fatcat:gx7tdxfgkvdn5ormllrxatr7zy

Avoiding Vulnerabilities and Attacks with a Proactive Strategy for Web Applications

Shahzad Ashraf
2021 Advances in Robotics & Mechanical Engineering  
To address web address mining challenges, a script was designed to mine irrelevant URLs by visiting multiple search engines based on user input.  ...  In the 2nd step, another script was designed to check domains likely to become inactive for security reasons, such as onion sites.  ...  Facing the web mining challenges, a script was designed to mine irrelevant URLs by visiting multiple search engines based on user input.  ... 
doi:10.32474/arme.2021.03.000157 fatcat:qsp5pucl3behtbsukhmxn4njwm

A novel defense mechanism against web crawlers intrusion

Alireza Aghamohammadi, Ali Eydgahi
2013 2013 International Conference on Electronics, Computer and Computation (ICECCO)  
However, there were other researchers who examined the issue of ethical behavior related to web crawlers and autonomous software robots at a deeper level.  ...  such as search engine robots and crawlers to access web pages.  ... 
doi:10.1109/icecco.2013.6718280 fatcat:24i4yfnv65axbi6cvlb2dq3xue

Design and Implementation of Website Information Disclosure Assessment System

Ying-Chiang Cho, Jen-Yi Pan, Francesco Pappalardo
2015 PLoS ONE  
This testing showed the importance of increasing the security and privacy of website information for academic websites.  ...  This system utilizes a series of technologies, such as web crawler algorithms, SQL injection attack detection, and web vulnerability mining, to assess a website's information disclosure.  ...  The static mining module performs deep detection on a single site, such as e-mail leakage, the presence of robots.txt files, an SQL injection, file downloading URLs, or broken links.  ... 
doi:10.1371/journal.pone.0117180 pmid:25768434 pmcid:PMC4358938 fatcat:oxrrxpqfmzddllasjzx46gta2i

Factors affecting website reconstruction from the web infrastructure

Frank McCown, Norou Diawara, Michael L. Nelson
2007 Proceedings of the 2007 conference on Digital libraries - JCDL '07  
We examine several characteristics of the websites over time including birth rate, decay and age of resources.  ...  In this paper we describe an experiment where we crawled and reconstructed 300 randomly selected websites on a weekly basis for 14 weeks.  ...  The ODP indexes a wide variety of websites in over 40 languages, and all search engines have an equal chance of indexing it.  ... 
doi:10.1145/1255175.1255182 dblp:conf/jcdl/McCownDN07 fatcat:kh7uolfbm5cydijsfbjexi2kga

Cloak of Visibility: Detecting When Machines Browse a Different Web

Luca Invernizzi, Kurt Thomas, Alexandros Kapravelos, Oxana Comanescu, Jean-Michel Picod, Elie Bursztein
2016 2016 IEEE Symposium on Security and Privacy (SP)  
This includes a first look at multiple IP blacklists that contain over 50 million addresses tied to the top five search engines and tens of anti-virus and security crawlers.  ...  We apply our system to an unlabeled set of 135,577 search and advertisement URLs keyed on high-risk terms (e.g., luxury products, weight loss supplements) to characterize the prevalence of threats in the  ...  The list, updated twice daily, contained 54,166 unique IP addresses tied to popular search engines and crawlers at the time of our analysis.  ... 
doi:10.1109/sp.2016.50 dblp:conf/sp/InvernizziTKCPB16 fatcat:koc4b2yuvzauvak67dopisnkw4

Using the web infrastructure to preserve web pages

Michael L. Nelson, Frank McCown, Joan A. Smith, Martin Klein
2007 International Journal on Digital Libraries  
We provide an overview of our ongoing research projects that focus on using the "web infrastructure" to provide preservation capabilities for web pages and examine the overlap these approaches have with  ...  To date, most of the focus regarding digital preservation has been on replicating copies of the resources to be preserved from the "living web" and placing them in an archive for controlled curation.  ...  We also thank the anonymous reviewers who made many suggestions to improve the clarity of the text and suggested the overlap of these digital preservation approaches with techniques from information retrieval  ... 
doi:10.1007/s00799-007-0012-y fatcat:5ufnwywctfbrrheo65zxqrz22q

A Multimodal Analytics Platform for Journalists Analyzing Large-Scale, Heterogeneous Multilingual, and Multimedia Content

Stefanos Vrochidis, Anastasia Moumtzidou, Ilias Gialampoukidis, Dimitris Liparas, Gerard Casamayor, Leo Wanner, Nicolaus Heise, Tilman Wagner, Andriy Bilous, Emmanuel Jamin, Boyan Simeonov, Vladimir Alexiev (+3 others)
2018 Frontiers in Robotics and AI  
The textual and multimedia content is semantically integrated and indexed using a common representation, to be accessible through a web-based search engine.  ...  Textual information is automatically summarized and can be translated (on demand) into the language of the journalist.  ...  search engine.  ... 
doi:10.3389/frobt.2018.00123 pmid:33501002 pmcid:PMC7805659 fatcat:lw73va4vrbaq5ir5ztc5caujnu

Intelligent Search Optimization using Artificial Fuzzy Logics [article]

Jai Manral
2015 arXiv   pre-print
Search engines index and categorize web pages according to their contents using crawlers and rank them accordingly.  ...  Search Engine Optimization (SEO) is the technique by which webmasters try to improve the ranking of their websites by optimizing them according to search engines' ranking parameters.  ...  In a related experiment conducted by Brin and Page (1998), it took 9 days to index 26 million pages, at an average of 48.5 pages/second, using 3 crawlers at a time.  ... 
arXiv:1510.00819v1 fatcat:akae3fodz5au7bmvvk44hmduky

Bot recognition in a Web store: An approach based on unsupervised learning

Stefano Rovetta, Grażyna Suchacka, Francesco Masulli
2020 Journal of Network and Computer Applications  
ABSTRACT Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) which pose a threat to the website security, privacy, and performance.  ...  Its efficiency is evaluated through experiments on real e-commerce data, in realistic conditions, and compared to that of supervised learning classifiers (a multi-layer perceptron neural network and a  ...  Acknowledgment This paper is based upon work from COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications (cHiPSet), supported by COST (European Cooperation in Science and  ... 
doi:10.1016/j.jnca.2020.102577 fatcat:lta44xzor5gpnnjqoxac7ka5tm

An Analysis of Chinese Search Engine Filtering [article]

Tao Zhu, Christopher Bronk, Dan S. Wallach
2011 arXiv   pre-print
between the Great Firewall and Chinese search engines.  ...  The imposition of government mandates upon Internet search engine operation is a growing area of interest for both computer science and public policy.  ...  We have chosen to examine not a standalone word. how that will may be impacting search engine operations.  ... 
arXiv:1107.3794v1 fatcat:bvsu2dygpjagni2onc4nwgf5q4

The Search as Learning Spaceship: Toward a Comprehensive Model of Psychological and Technological Facets of Search as Learning

Johannes von Hoyer, Anett Hoppe, Yvonne Kammerer, Christian Otto, Georg Pardi, Markus Rokicki, Ran Yu, Stefan Dietze, Ralph Ewerth, Peter Holtz
2022 Frontiers in Psychology  
Using a Web search engine is one of today's most frequent activities.  ...  At first, we provide an overview of the current state of the art with regard to the five main entities of our model, before we outline areas of future research to improve our understanding of search as  ...  AUTHOR CONTRIBUTIONS JH, PH, AH, YK, CO, GP, MR, and RY wrote sections of the manuscript. JH, PH, RE, and SD contributed to conception and design of the manuscript.  ... 
doi:10.3389/fpsyg.2022.827748 pmid:35369228 pmcid:PMC8964633 fatcat:i5korimbqfaidhsnuz3p5xhzcm

PathMarker: protecting web contents against inside crawlers

Shengye Wan, Yue Li, Kun Sun
2019 Cybersecurity  
We deploy our approach on an online forum website, and the evaluation results show that PathMarker can quickly capture all 6 open-source and in-house crawlers, plus two external crawlers (i.e., Googlebots  ...  In addition to effectively detecting crawlers at the earliest stage, PathMarker can dramatically suppress the scraping efficiency of crawlers before they are detected.  ...  Office of Naval Research under grants N00014-16-1-3214 and N00014-16-1-3216.  ... 
doi:10.1186/s42400-019-0023-1 fatcat:ckt3ah2pwjg4bhqunft3aaz5um
Showing results 1–15 out of 209 results