International Conference on Internet Surveillance and Protection (ICISP'06)
In this paper, we introduce a probabilistic modeling approach to the problem of Web robot detection from Web-server access logs. More specifically, we construct a Bayesian network that automatically classifies access-log sessions as crawler- or human-induced by combining various pieces of evidence proven to characterize crawler and human behavior. Our approach uses machine learning techniques to determine the parameters of the probabilistic model. We apply our method to real…

doi:10.1109/icisp.2006.7
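The combination of evidence described in the abstract can be illustrated with a minimal naive-Bayes sketch. The feature names, prior, and conditional probabilities below are hypothetical placeholders, not the paper's learned parameters or actual feature set; the sketch only shows how independent pieces of session evidence combine into a posterior probability of "crawler" versus "human".

```python
import math

# Hypothetical boolean session features, loosely inspired by evidence
# commonly used in crawler detection; the paper's real features may differ.
FEATURES = ["requested_robots_txt", "no_referrer",
            "head_requests", "low_image_ratio"]

# Illustrative conditional probabilities P(feature = True | class).
P_GIVEN = {
    "crawler": {"requested_robots_txt": 0.9, "no_referrer": 0.8,
                "head_requests": 0.6, "low_image_ratio": 0.85},
    "human":   {"requested_robots_txt": 0.05, "no_referrer": 0.2,
                "head_requests": 0.05, "low_image_ratio": 0.1},
}
PRIOR = {"crawler": 0.3, "human": 0.7}  # assumed class priors

def p_crawler(session):
    """Return P(crawler | evidence) for a dict of boolean features,
    combining the evidence under a naive conditional-independence
    assumption (log-space to avoid underflow)."""
    log_post = {}
    for cls in PRIOR:
        lp = math.log(PRIOR[cls])
        for f in FEATURES:
            p = P_GIVEN[cls][f]
            lp += math.log(p if session[f] else 1.0 - p)
        log_post[cls] = lp
    # Normalize the two unnormalized log-posteriors.
    m = max(log_post.values())
    exp = {c: math.exp(lp - m) for c, lp in log_post.items()}
    return exp["crawler"] / sum(exp.values())

# A session exhibiting several crawler-like traits scores high:
bot_like = {"requested_robots_txt": True, "no_referrer": True,
            "head_requests": False, "low_image_ratio": True}
print(p_crawler(bot_like))
```

In a full Bayesian network the features need not be assumed independent given the class; this flat structure is only the simplest instance of evidence combination, and the parameters here stand in for ones that would be learned from labeled log data.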