Cluster Based Outlier Detection Algorithm for Healthcare Data

A. Christy, G. Meera Gandhi, S. Vaithyasubramanian
2015 Procedia Computer Science  
Outliers have been studied in a variety of domains, including big data, high-dimensional data, uncertain data, time series data, and biological data. In the majority of the sample datasets available in the repository, at least 10% of the data may be erroneous, missing, or not available. In this paper, we utilize the concept of data preprocessing for outlier reduction. We propose two algorithms, namely distance-based outlier detection and a cluster-based outlier algorithm, for detecting and removing ...s using an outlier score. By cleaning the dataset and clustering based on similarity, we can remove outliers on the key attribute subset rather than on the full set of attributes of the dataset. Experiments were conducted using three built-in healthcare datasets available in R packages, and the results show that the cluster-based outlier detection algorithm provides better accuracy than the distance-based outlier detection algorithm.
doi:10.1016/j.procs.2015.04.058 fatcat:fokovmxv7vfhfpfqmghqnxvp7y
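
The abstract names two scoring approaches without giving their definitions, so the following is only a minimal sketch of what generic distance-based and cluster-based outlier scores can look like, not the authors' algorithms. Everything here is an assumption made for illustration: the function names, the k-nearest-neighbour mean distance used as the distance-based score, the plain k-means with evenly spaced initial centroids used for clustering, and the "score above a multiple of the mean score" rule used to flag outliers.

```python
import math


def distance_scores(points, k=3):
    """Distance-based score: mean distance to the k nearest neighbours (assumed definition)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def kmeans(points, k, iters=20):
    """Plain k-means; initial centroids are evenly spaced input points (assumed, keeps it deterministic)."""
    step = max(1, len(points) // k)
    centroids = list(points[::step][:k])
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
    return centroids


def cluster_scores(points, k=2):
    """Cluster-based score: distance from each point to its nearest cluster centroid (assumed definition)."""
    centroids = kmeans(points, k)
    return [min(math.dist(p, c) for c in centroids) for p in points]


def flag_outliers(scores, factor=2.0):
    """Flag indices whose score exceeds factor times the mean score (hypothetical threshold rule)."""
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s > factor * mean]


if __name__ == "__main__":
    # Toy data: two tight groups plus one point far from both.
    demo_points = [(0, 0), (0, 1), (1, 0), (1, 1),
                   (10, 10), (10, 11), (11, 10), (11, 11),
                   (5, 30)]
    print("distance-based:", flag_outliers(distance_scores(demo_points)))
    print("cluster-based:", flag_outliers(cluster_scores(demo_points)))
```

In both variants the score is computed per record and a threshold decides what to remove, which matches the abstract's description of removing outliers "using an outlier score". Restricting `points` to a subset of key attributes before scoring would correspond to the abstract's remark about working on a key attribute subset rather than all dimensions.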