Anomaly detection and diagnosis in grid environments

Lingyun Yang, Chuang Liu, Jennifer M. Schopf, Ian Foster
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
Identifying and diagnosing anomalies in application behavior is critical to delivering reliable application-level performance. In this paper we introduce a strategy to detect anomalies and diagnose the possible reasons behind them. Our approach extends the traditional window-based strategy by using signal-processing techniques to filter out recurring, background fluctuations in resource behavior. In addition, we have developed a diagnosis technique that uses standard monitoring data to ... where related changes in behavior occur at the times of the anomalies. We evaluate our anomaly detection and diagnosis technique by applying it in three contexts and inserting anomalies into the system at random intervals. The experimental results show that our strategy detects up to 96% of anomalies while reducing the false positive rate by up to 90% compared to the traditional window average strategy. In addition, our strategy can diagnose the reason for the anomaly approximately 75% of the time.
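The abstract does not say which signal-processing filter is used or how the traditional window average strategy sets its threshold. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes (1) a DFT-based filter that zeroes the strongest periodic component of a resource trace and (2) a window-average detector that flags a sample when it deviates from the mean of the preceding samples by more than a fixed fraction of that mean. The function names, the window size of 8, the 20% tolerance, and the synthetic load trace are all hypothetical. For simplicity the filter runs over the whole trace at once, whereas an online monitor would apply it to a sliding history.

```python
import cmath
import math
from statistics import mean


def remove_periodic(series, n_components=1):
    """Hypothetical filter: zero the `n_components` strongest non-DC frequency
    bins of `series` and return the reconstructed (filtered) signal.

    Uses a naive O(n^2) DFT to stay self-contained; an FFT would be used in practice.
    """
    n = len(series)
    spectrum = [
        sum(series[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Rank positive-frequency bins 1..n//2 by magnitude; bin 0 (DC) holds the
    # baseline level and is always kept.
    candidates = sorted(range(1, n // 2 + 1), key=lambda k: abs(spectrum[k]), reverse=True)
    for k in candidates[:n_components]:
        spectrum[k] = 0
        spectrum[n - k] = 0  # matching negative-frequency bin keeps the output real
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def window_average_anomalies(series, window=8, tolerance=0.2):
    """Assumed window-average detector: flag index t when series[t] differs from
    the mean of the previous `window` samples by more than `tolerance` * mean."""
    hits = []
    for t in range(window, len(series)):
        avg = mean(series[t - window:t])
        if abs(series[t] - avg) > tolerance * abs(avg):
            hits.append(t)
    return hits


# Hypothetical trace: baseline 50, a recurring cycle with period 16, one injected spike.
n, period, spike_at = 64, 16, 40
load = [50 + 20 * math.sin(2 * math.pi * t / period) for t in range(n)]
load[spike_at] += 30

raw_hits = window_average_anomalies(load)
filtered_hits = window_average_anomalies(remove_periodic(load))
print("raw:", raw_hits)
print("filtered:", filtered_hits)  # prints "filtered: [40]"
```

On this synthetic trace the unfiltered detector also flags samples where the recurring cycle alone swings away from the trailing average (for example index 12). After the periodic component is removed, only the injected spike at index 40 is reported. Removing the background cycle before thresholding is one plausible way to obtain the kind of false positive reduction the abstract describes.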
doi:10.1145/1362622.1362667 dblp:conf/sc/YangLSF07 fatcat:yoqgstukn5bilj5n3mmupb7k3i