Outlier Detection via Minimum Spanning Tree

Xin Tang, Wei Huang, Xue Li, Shengli Li, Yuewen Liu
2016 Pacific Asia Conference on Information Systems  
In the big data era, data analysis is becoming increasingly important, and its central goal is to obtain valuable information from data records. However, data records often contain outliers. Outliers can cause wrong information to be extracted from the data sets; detecting them can help us correct the extracted rules or obtain them more easily. In this paper, we combine distance-based and clustering-based outlier detection methods and use the theory of minimum spanning trees and the standard normal distribution to define a new outlier detection method. At the same time, our algorithm can find the data records in the data sets that deserve attention. The algorithm works in two phases. In the first phase, we build a minimum spanning tree over all data records and compute the average weight and the standard deviation of its edges. In the second phase, we use the distance of each data
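The first phase described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes numeric records compared by Euclidean distance, builds the tree with Prim's algorithm on the complete graph of records, and uses the population standard deviation. The function names are hypothetical. The second phase is not reproduced because its description is truncated here.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mst_edges(points):
    """Prim's algorithm on the complete graph over `points`, O(n^2) time.

    Returns a list of (i, j, weight) tuples, one per tree edge.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting v to the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the outside vertex with the cheapest connection to the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Update connection costs of the remaining vertices via u.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_weight_stats(points):
    """Phase one: mean and (population) standard deviation of MST edge weights."""
    if len(points) < 2:
        raise ValueError("need at least two records to form a tree edge")
    weights = [w for _, _, w in mst_edges(points)]
    return mean(weights), pstdev(weights)
```

For example, `mst_weight_stats([(0, 0), (1, 0), (2, 0), (10, 0)])` builds tree edges of weight 1, 1, and 8, giving a mean of 10/3 and a standard deviation of about 3.30; the single long edge to (10, 0) lies well above the mean. Prim's O(n^2) form is chosen because every pair of records is a candidate edge, so the graph is dense.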
dblp:conf/pacis/TangHLLL16 fatcat:i3kx7pevrzgzhhg7t6qyd7mahu