Annotated Minimum Volume Sets for Nonparametric Anomaly Discovery

Clayton D. Scott, Eric D. Kolaczyk
2007 IEEE/SP 14th Workshop on Statistical Signal Processing
We consider an anomaly detection problem, wherein a combination of typical and anomalous data are observed and it is necessary to identify the anomalies in this particular dataset without recourse to labeled exemplars. We take as our goal to produce an annotated ranking of the observations, indicating the relative priority for each to be examined further as a possible anomaly, while making no assumptions on the distribution of typical data. We propose a framework in which each observation is linked to a corresponding minimum volume set and, implicitly adopting a hypothesis testing perspective, each set is associated with a test. An inherent ordering of these sets yields a natural ranking, while the association of each test with a false discovery rate yields an appropriate annotation. The combination of minimum volume set methods with false discovery rate principles, in the context of data contaminated by anomalies, is new and estimation of the key underlying quantities requires that a number of issues be addressed. We offer some solutions to the relevant estimation problems, and illustrate the proposed methodology on synthetic and computer network traffic data.
Index Terms: minimum volume sets, false discovery rate, nonparametric outlier detection, multiple level set estimation, monotone density estimation
doi:10.1109/ssp.2007.4301254 fatcat:76wezjttizayfa6kcb2wmpx6du