Outlier detection for high dimensional data

Charu C. Aggarwal, Philip S. Yu
2001 Proceedings of the 2001 ACM SIGMOD international conference on Management of data - SIGMOD '01  
The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.
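
The claim that proximity loses meaning in high dimensions can be illustrated numerically. The following is a minimal sketch, not the authors' method: it assumes points drawn uniformly from the unit cube and Euclidean distance, and the function name `relative_contrast` is hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance; under these assumptions the ratio shrinks as the dimensionality grows, so every point looks roughly equally far from the query.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the distances from a random query
    point to n_points points drawn uniformly from the unit cube [0, 1]^dim.

    Assumption for illustration only: uniform data and Euclidean distance.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for a few dimensionalities; smaller values mean
    # the nearest and farthest neighbors are harder to tell apart.
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

A small contrast means that a distance threshold can no longer separate "near" from "far" points, which is the difficulty for proximity-based outlier definitions that the abstract describes.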
doi:10.1145/375663.375668 dblp:conf/sigmod/AggarwalY01 fatcat:oalq6xkqabfododas4mnkwiayy