IMPROVED OUTLIER DETECTION USING CLASSIC KNN ALGORITHM

K Divya, N Kumaran
International Research Journal of Engineering and Technology   unpublished
Outlier detection identifies items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Identifying instances that diverge from expected behavior is an important task. Existing techniques provide a solution to the problem of anomaly detection in categorical data in a semi-supervised setting. The outlier detection approach is based on distance learning for categorical attributes (DILCA), a distance learning framework ... s introduced. The key intuition of DILCA is that the distance between two values of a categorical attribute can be determined by the way in which they co-occur with the values of other attributes in the dataset. Existing techniques work well for fixed-schema data with low dimensionality. Certain applications require privacy-preserving publishing of transactional data (or basket data), which involves hundreds or even thousands of dimensions, rendering existing methods unusable. This work proposes novel anonymization methods for sparse high-dimensional data. The approach is based on approximate Classic K-Nearest Neighbor search in high-dimensional spaces. These representations facilitate the formation of anonymized groups with low information loss through an efficient linear-time heuristic. Among the proposed techniques, Classic KNN-search yields superior data utility but incurs higher computational overhead. In addition, a dimensionality reduction technique is used. In this work, healthcare data are used.
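The co-occurrence intuition behind DILCA can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `cooccurrence_distance`, the toy records, and the attribute names are hypothetical, and the context attributes are passed in by hand rather than selected automatically as a full distance learning framework would do. The distance between two values of a target attribute is taken here as the root mean squared difference of their conditional frequencies given each context value.

```python
from collections import Counter
from math import sqrt


def cooccurrence_distance(rows, target, context):
    """Pairwise distance between the values of `target`, derived from how
    they co-occur with the values of the `context` attributes.

    Assumption: `context` is chosen by the caller; no attribute selection
    step is performed in this sketch.
    """
    joint = Counter()      # (attr, context_value, target_value) -> count
    ctx_total = Counter()  # (attr, context_value) -> count
    values = set()

    for row in rows:
        x = row[target]
        values.add(x)
        for attr in context:
            y = row[attr]
            joint[(attr, y, x)] += 1
            ctx_total[(attr, y)] += 1

    dist = {}
    for a in values:
        for b in values:
            # Squared difference of P(a | context value) and
            # P(b | context value), summed over all context values.
            sq = sum(
                (joint[(attr, y, a)] / n - joint[(attr, y, b)] / n) ** 2
                for (attr, y), n in ctx_total.items()
            )
            dist[(a, b)] = sqrt(sq / len(ctx_total)) if ctx_total else 0.0
    return dist


# Hypothetical toy records, for illustration only.
rows = [
    {"diagnosis": "flu", "fever": "yes", "cough": "yes"},
    {"diagnosis": "flu", "fever": "yes", "cough": "yes"},
    {"diagnosis": "cold", "fever": "yes", "cough": "yes"},
    {"diagnosis": "cold", "fever": "no", "cough": "yes"},
    {"diagnosis": "fracture", "fever": "no", "cough": "no"},
    {"diagnosis": "fracture", "fever": "no", "cough": "no"},
]

dist = cooccurrence_distance(rows, "diagnosis", ["fever", "cough"])
```

In this toy data, "flu" and "cold" share most of their symptom values, so their distance comes out smaller than the distance between "flu" and "fracture". A distance table of this kind could then feed a nearest-neighbor search over categorical records, although how the abstract's method combines the two steps is not specified in the source.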
fatcat:noyfwhbfnfb4hou64mrawu3yr4