Fault Localization via Risk Modeling

R R Kompella, J Yates, A Greenberg, A C Snoeren
2010 IEEE Transactions on Dependable and Secure Computing  
Internet backbone networks are under constant flux in order to keep up with demand and to offer new features. The pace of change in features and technology often outstrips the pace of introduction of the associated fault monitoring capabilities that are built into today's IP protocols and routers. Moreover, some of these new technologies cross networking layers, raising the potential for unanticipated interactions and service disruptions, which the built-in monitoring capabilities in each layer may not detect. In these instances, operators typically employ higher-layer monitoring techniques such as end-to-end liveness probing to detect lower- or cross-layer failures, but lack tools to precisely determine where a detected failure may have occurred. In this paper, we present a simple and effective mechanism to localize these failures. Our method applies spatial correlation to high-level failure notifications to identify the lower-layer root cause. We show that our system works with accuracy exceeding 80% for many failure scenarios, while delivering extremely high precision (greater than 80%). We further report our operational experience using spatial correlation to isolate optical component and MPLS control-plane failures in a tier-1 ISP.
Albert Greenberg is an AT&T Fellow and the Director of Network Measurement and Engineering Research at AT&T Labs-Research. Albert's recent research includes: novel methods for packet and flow measurement and analysis, traffic matrix inference, anomaly detection, configuration management, IP/MPLS control plane monitoring, MPLS/GMPLS control and management, IP traffic and network engineering, IP fault management and troubleshooting, new route control architectures, database and systems applications, and network security.
doi:10.1109/tdsc.2009.37 fatcat:unffkarjrfa3zfv7wtepajs7nu