Evaluation of Techniques to Detect Significant Network Performance Problems using End-to-End Active Network Measurements

R.L. Cottrell, C. Logg, M. Chhaparia, M. Grigoriev, F. Haro, F. Nazir, M. Sandford
2006 IEEE/IFIP Network Operations and Management Symposium NOMS 2006
Detecting end-to-end faults and performance problems in wide-area production networks is becoming increasingly hard as the complexity of the paths, the diversity of the performance, and the dependency on the network increase. Several monitoring infrastructures have been built to monitor different network metrics and collect monitoring information from thousands of hosts around the globe. Typically there are hundreds to thousands of time-series plots of network metrics that need to be looked at to identify network performance problems or anomalous variations in the traffic. Furthermore, most commercial products rely on comparison with user-configured static thresholds and often require access to SNMP-MIB information, to which a typical end user does not usually have access. In this paper we propose new techniques that detect network performance problems proactively in close to real time and that do not rely on static thresholds or SNMP-MIB information. We describe and compare several algorithms that we have implemented to detect persistent network problems by analyzing anomalous variations in real end-to-end Internet performance measurements. We also provide methods and/or guidance for setting the user-settable parameters. The measurements are based on active probes running on 40 production network paths with bottlenecks varying from 0.5 Mbits/s to 1000 Mbits/s. For well-behaved data (no missed measurements and no very large outliers) with small seasonal changes, most algorithms identify similar events. We compare the algorithms' robustness with respect to false positives and missed events, especially when there are large seasonal effects in the data. Our proposed techniques cover a wide variety of network paths and traffic patterns. We also discuss the algorithms in terms of their intuitiveness, their speed of execution as implemented, and their areas of applicability. Our results are encouraging: we compare and evaluate the accuracy of our detection techniques when applied to step-down/step-up changes, diurnal changes, and congestion effects.
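To make concrete the idea of detecting persistent problems without a static threshold, here is a minimal Python sketch. It is not one of the algorithms evaluated in the paper, which the abstract does not name. It assumes a generic two-buffer scheme in which the threshold is derived from a sliding history of recent "normal" measurements, and an event is raised only after several consecutive anomalous samples. The function name `detect_step_downs` and the parameters `history_len`, `trigger_len`, and `sensitivity` are hypothetical, and the sketch handles only drops in a throughput-like metric.

```python
from statistics import mean, stdev


def detect_step_downs(samples, history_len=20, trigger_len=5, sensitivity=2.0):
    """Return the indices at which a persistent drop is flagged.

    Illustrative assumption, not the paper's method: a sample is anomalous
    if it lies more than `sensitivity` standard deviations below the mean
    of the history buffer; `trigger_len` consecutive anomalous samples
    count as one event.
    """
    events = []
    history = list(samples[:history_len])  # recent "normal" measurements
    trigger = []                           # consecutive anomalous samples

    for i in range(history_len, len(samples)):
        x = samples[i]
        mu = mean(history)
        # Floor sigma so a perfectly flat history does not flag tiny dips.
        sigma = max(stdev(history), 1e-9)

        if x < mu - sensitivity * sigma:
            trigger.append(x)
            if len(trigger) == trigger_len:
                events.append(i)
                # Adopt the new level as the baseline after the event.
                history = list(trigger)
                trigger = []
        else:
            # An isolated outlier is forgotten once normal data returns.
            trigger = []
            history.append(x)
            if len(history) > history_len:
                history.pop(0)

    return events


if __name__ == "__main__":
    baseline = [100, 102, 98, 101, 99] * 4       # 20 well-behaved samples
    outlier_then_drop = [60, 100, 101, 50, 50, 51, 49, 50, 50]
    print(detect_step_downs(baseline + outlier_then_drop))
```

Because the threshold follows the data, the same parameter values can in principle be reused on paths with very different bottleneck capacities. The single outlier in the example is ignored because it is not persistent. A scheme this simple takes no account of diurnal or other seasonal variation, which is where the abstract says the algorithms differ most.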
doi:10.1109/noms.2006.1687541 dblp:conf/noms/CottrellLCGHNS06 fatcat:g73yw6njvbanrkcpetv3atvcl4