Locality Sensitive Outlier Detection: A ranking driven approach

Ye Wang, Srinivasan Parthasarathy, Shirish Tatikonda
2011 IEEE 27th International Conference on Data Engineering
Outlier detection is fundamental to a variety of database and analytic tasks. Recently, distance-based outlier detection has emerged as a viable and scalable alternative to traditional statistical and geometric approaches. In this article, we explore the role of ranking for the efficient discovery of distance-based outliers from large, high-dimensional data sets. Specifically, we develop a lightweight ranking scheme, powered by locality sensitive hashing, that reorders the database points according to their likelihood of being an outlier. We provide theoretical arguments to justify the rationale for the approach and subsequently conduct an extensive empirical study highlighting the effectiveness of our approach over extant solutions. We show that our ranking scheme improves the efficiency of the distance-based outlier discovery process by up to a factor of five. Furthermore, we find that with our approach the top outliers can often be isolated very quickly, typically by scanning less than 3% of the data set.
doi:10.1109/icde.2011.5767852 dblp:conf/icde/WangPT11 fatcat:wxhuqq5wovealk257rtms77ko4
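
The abstract does not spell out the ranking scheme, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes Euclidean distance, Gaussian (p-stable) random projections as the hash family, and a heuristic in which points that fall into sparsely populated hash buckets are ranked as more outlier-like. The function name `lsh_outlier_ranking` and the parameters `n_tables`, `n_projections`, and `bucket_width` are hypothetical choices made for illustration.

```python
import numpy as np

def lsh_outlier_ranking(X, n_tables=10, n_projections=4, bucket_width=1.0, seed=0):
    """Return point indices ordered from most to least outlier-like.

    Assumption (not taken from the paper): a point that lands in sparsely
    populated LSH buckets has few near neighbours, so a smaller average
    bucket population places it earlier in the ranking.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    avg_bucket_size = np.zeros(n)
    for _ in range(n_tables):
        # One hash table: several Gaussian projections, each quantised
        # into intervals of length bucket_width with a random offset.
        A = rng.normal(size=(d, n_projections))
        b = rng.uniform(0.0, bucket_width, size=n_projections)
        keys = np.floor((X @ A + b) / bucket_width).astype(np.int64)
        # Count how many points share each composite bucket key.
        _, inverse, counts = np.unique(
            keys, axis=0, return_inverse=True, return_counts=True
        )
        avg_bucket_size += counts[np.ravel(inverse)]
    avg_bucket_size /= n_tables
    # Smallest average bucket population first.
    return np.argsort(avg_bucket_size, kind="stable")
```

A distance-based detector could then examine candidates in the returned order, so that likely outliers are evaluated first and its pruning threshold tightens early. Whether this matches the efficiency gains reported in the abstract would depend on the data and on the parameter choices.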