79,378 Hits in 4.1 sec

Robust Outlier Detection Technique in Data Mining: A Univariate Approach [article]

Singh Vijendra, Pathak Shivani
2014 arXiv   pre-print
This paper describes an approach that uses univariate outlier detection as a pre-processing step to detect outliers and then applies the K-means algorithm to analyse the effects of the outliers on  ...  The main challenges of outlier detection, given the increasing complexity, size and variety of datasets, are how to catch similar outliers as a group and how to evaluate the outliers.  ...  Four categories of unsupervised outlier detection algorithms: (1) In a clustering-based method, like DBSCAN (a density-based algorithm for discovering clusters in large spatial databases) [9], outliers  ... 
arXiv:1406.5074v1 fatcat:bw6z3yuktfaflkxjf7vorjtzaa

Local and Global Outlier Detection Algorithms in Unsupervised Approach: A Review

Ayad Jabbar
2021 Iraqi Journal for Electrical And Electronic Engineering  
A comprehensive and structured overview of a large set of interesting outlier algorithms, which emphasized the outlier detection limitation in the unsupervised approach, can be used as a guideline for  ...  Meanwhile, the second approach determines the outliers without human interaction.  ...  The definition efficiently ranks each object based on distance without requiring a distancing parameter as a pre-defined parameter.  ... 
doi:10.37917/ijeee.17.1.9 fatcat:w7mvxscabzbcrgftxy7vikv22q

Data Reduction for Optimizing Feature Selection in Modeling Intrusion Detection System

Alif Iman, Tohari Ahmad (Institut Teknologi Sepuluh Nopember)
2020 International Journal of Intelligent Engineering and Systems  
There are several methods to perform data reduction, one of which uses outlier detection techniques.  ...  In this research, outlier detection is done by a circle generated from the k-means clustering of all selected features.  ...  Second, the outlier is formed based on the median value of the cluster.  ... 
doi:10.22266/ijies2020.1231.18 fatcat:ypdga4unxba7lf2rry6di2m7va

DBSCAN OPTIMIZATION FOR IMPROVING MARINE TRAJECTORY CLUSTERING AND ANOMALY DETECTION

X. Han, C. Armenakis, M. Jadidi
2020 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
This paper presents a clustering method for route planning and trajectory anomaly detection, which are essential parts of auto-vessel system design and development.  ...  In this paper, we present the development of an enhanced density-based spatial clustering (DBSCAN) method that can be applied on historical or real-time Automatic Identification System (AIS) data, so that  ...  As DBSCAN relies on a density-based notion of clusters, it is considered an effective method for discovering clusters of arbitrary shapes as well as identifying outliers (Ester et al., 1996).  ... 
doi:10.5194/isprs-archives-xliii-b4-2020-455-2020 fatcat:pky5piykwzhz7nuq6iglqt54sy

Outlier Detection Based on Low Density Models

Felix Iglesias Vazquez, Tanja Zseby, Arthur Zimek
2018 2018 IEEE International Conference on Data Mining Workshops (ICDMW)  
Most outlier detection algorithms are based on lazy learning or imply quadratic complexity.  ...  In this paper we propose a new algorithm-called SDO (Sparse Data Observers)-to estimate outlierness based on low density models of data.  ...  Clustering-based methods In clustering-based methods, algorithms first find clusters and subsequently discover and rank outliers based on the clustering solution.  ... 
doi:10.1109/icdmw.2018.00140 dblp:conf/icdm/VazquezZZ18 fatcat:yr4lyiz5hzcgpbpd4xxnt36fpe

Towards an outlier detection model in text data stream

Awab Noori (Universiti Utara Malaysia, Malaysia)
2019 International Journal of Advanced Trends in Computer Science and Engineering  
Therefore, detecting outliers in a text stream is not a trivial task. This paper proposes a conceptual model to detect outliers in the text stream.  ...  This study proposes an outlier detection model for text data streams. Text stream is an important variant of data stream clustering.  ...  Therefore, density-based methods are used for outlier detection in two ways. The first is as independent points that do not fit into any of the clusters.  ... 
doi:10.30534/ijatcse/2019/47862019 fatcat:hn7crfcls5eafi3ck6dxvc6pxi

Diagnosis for Early Stage of Breast Cancer using Outlier Detection Algorithm Combined with Classification Technique

2019 International Journal of Engineering and Advanced Technology  
The proposed method has three stages. First, data objects are grouped into clusters using the k-means clustering algorithm.  ...  In the second stage, the outlier detection (OD) algorithm is used to detect the outliers in the cancer dataset.  ...  This method obtains an accuracy of 98.13% for the BC dataset with outliers and 99.01% for the BC dataset without outliers, based on 10-fold cross-validation.  ... 
doi:10.35940/ijeat.b4514.129219 fatcat:qrumy7lsorhllcnn65lkxemvja

Detection of Outliers in Multivariate Data: A Method Based on Clustering and Robust Estimators [chapter]

Carla M. Santos-Pereira, Ana M. Pires
2002 Compstat  
The final decision on whether all the observations belonging to a given cluster (not previously removed, that is with size greater than 2p + 1) are outliers is based on a table of between clusters Mahalanobis-type  ...  However, ten strange observations (looking more like a "6" than a "0") were detected by all the clustering methods but not by the classical Mahalanobis distance.  ...  MD with RMCD25 MD withx and S Situation Method k p1 p2 p3 p4 p1 p2 p3 p4 k-means 3 nd nd 0.199 0.14 nd nd 0.063 0.  ... 
doi:10.1007/978-3-642-57489-4_41 dblp:conf/compstat/Santos-PereiraP02 fatcat:ps3343eho5afjbtbp73csbb3ju

Variable Selection and Outlier Detection for Automated K-means Clustering

Sung-Soo Kim
2015 Communications for Statistical Applications and Methods  
To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach.  ...  An important problem in cluster analysis is the selection of variables that define cluster structure while eliminating noisy variables that mask cluster structure; in addition, outlier detection is a  ...  To detect potential outliers, we used a hybrid approach that combines a clustering-based approach and a distance-based approach using (robust) Mahalanobis distance.  ... 
doi:10.5351/csam.2015.22.1.055 fatcat:xxufivpthzhhtisapvm4lgy53u

HTsort: Enabling Fast and Accurate Spike Sorting on Multi-Electrode Arrays

Keming Chen, Yangtao Jiang, Zhanxiong Wu, Nenggan Zheng, Haochuan Wang, Hui Hong
2021 Frontiers in Computational Neuroscience  
Second, the clustering method HDBSCAN (hierarchical density-based spatial clustering of applications with noise) is used to classify spikes and detect overlapping events (multiple spikes firing simultaneously  ...  First, the divide-and-conquer method is employed to utilize electrode spatial information to achieve pre-clustering.  ...  Then, the hierarchical density-based spatial clustering method HDBSCAN is used to accomplish the clustering task and outlier detection.  ... 
doi:10.3389/fncom.2021.657151 fatcat:zwoj2prz6nfhtiqvivzuufm3ne

The Design of Pre-Processing Multidimensional Data Based on Component Analysis

Rahmat Widia Sembiring, Jasni Mohamad Zain
2011 Computer and Information Science  
Component analysis can be done by statistical methods, with the aim to separate the various sources of data into a statistical pattern independent.  ...  Pre-processing is required because of lack of data attribute values, noisy data, errors, inconsistencies or outliers and differences in coding.  ...  This work supported in part by a grant from GRS090116.  ... 
doi:10.5539/cis.v4n3p106 fatcat:ts4h2dki25amlkuim6b7jj6jwu

Ranking outlier nodes in subspaces of attributed graphs

E. Muller, P. I. Sanchez, Y. Mulle, K. Bohm
2013 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW)  
Subspace clustering provides a selected subset of nodes and its relevant attributes in which deviation of nodes can be observed.  ...  It includes a ground truth of real outliers labeled in a user experiment.  ...  ACKNOWLEDGMENTS This work is supported by the Young Investigator Group program of KIT as part of the German Excellence Initiative, by a post-doctoral fellowship of the research foundation Flanders (FWO  ... 
doi:10.1109/icdew.2013.6547453 dblp:conf/icde/MullerSMB13 fatcat:uytc7sxwlfgutp7jjvv66zvwoq

Multi-Level Clustering-Based Outlier's Detection (MCOD) Using Self-Organizing Maps

Menglu Li, Rasha Kashef, Ahmed Ibrahim
2020 Big Data and Cognitive Computing  
This paper proposes a multi-level outlier detection algorithm (MCOD) that uses multi-level unsupervised learning to cluster the data and discover outliers.  ...  The performance of existing outlier detection methods is limited by the pattern/behaviour of the dataset; these methods may not perform well without prior knowledge of the dataset.  ...  Distribution-Based Outlier Detection: The distribution-based method, also known as statistical-based outlier detection, assumes that in a normal dataset without outliers, all data follow a stochastic  ... 
doi:10.3390/bdcc4040024 fatcat:ucn2clyevvhhxaokujz7vtggs4

Outlier Detection for 2D Temperature Data

Jani Posio, Kauko Leiviskä, Jari Ruuska, Paavo Ruha
2008 IFAC Proceedings Volumes  
This makes the data pre-processing, and especially the outlier detection, of utmost importance for a reliable process and fault analysis.  ...  robust estimates, and (3) an outlier detection procedure based on wavelet coefficients.  ... 
doi:10.3182/20080706-5-kr-1001.00333 fatcat:mftam4nvzjfcblghz45enhv7ua

An Automated Framework for Enterprise Financial Data Pre-processing and Secure Storage

Sirisha Alamanda, Suresh Pabboju, G. Narasimha
2021 International Journal of Advanced Computer Science and Applications  
count based binary iterations method and finally the secure data storage using regression based key generation.  ...  Thus, this work proposes an automated framework for identification and imputation of the outliers using the iterative clustering method, identification and imputation of the missing values using Differential  ...  However, the generic method of clustering can be sufficient for detection of the groups based on properties, but the detection of outliers or any other anomalies can be highly difficult by the generic  ... 
doi:10.14569/ijacsa.2021.0120790 fatcat:qi2hik5d4fd2piqzqcjqu5rg5e