Detailed diagnosis in enterprise networks

Srikanth Kandula, Ratul Mahajan, Patrick Verkaik, Sharad Agarwal, Jitendra Padhye, Paramvir Bahl
2009 Computer Communication Review
By studying trouble tickets from small enterprise networks, we conclude that their operators need detailed fault diagnosis. That is, the diagnostic system should be able to diagnose not only generic faults (e.g., performance-related) but also application-specific faults (e.g., error codes). It should also identify culprits at a fine granularity such as a process or firewall configuration. We build a system, called NetMedic, that enables detailed diagnosis by harnessing the rich information exposed by modern operating systems and applications. It formulates detailed diagnosis as an inference problem that more faithfully captures the behaviors and interactions of fine-grained network components such as processes. The primary challenge in solving this problem is inferring when a component might be impacting another. Our solution is based on an intuitive technique that uses the joint behavior of two components in the past to estimate the likelihood of them impacting one another in the present. We find that our deployed prototype is effective at diagnosing faults that we inject in a live environment. The faulty component is correctly identified as the most likely culprit in 80% of the cases and is almost always in the list of top five culprits.
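
The abstract's core idea, using the joint past behavior of two components to estimate whether one is impacting the other now, can be illustrated with a minimal sketch. This is not NetMedic's actual algorithm: the function names, the Euclidean distance, the parameter `k`, and the `1 / (1 + d)` scoring are all assumptions made for illustration. The sketch finds the past time bins in which the source component looked most like it does now, then checks whether the destination component's state in those bins resembles its current state.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length state vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def impact_likelihood(src_history, dst_history, src_now, dst_now, k=3):
    """Hypothetical score in (0, 1]; higher means the source is more likely
    to be impacting the destination in the current time bin.

    src_history and dst_history are lists of state vectors, aligned by
    time bin. The scoring rule is an illustrative assumption.
    """
    if len(src_history) != len(dst_history) or not src_history:
        raise ValueError("histories must be non-empty and aligned by time bin")
    # Rank past bins by how closely the source's state matches its state now.
    ranked = sorted(range(len(src_history)),
                    key=lambda t: distance(src_history[t], src_now))
    nearest = ranked[:k]
    # In those bins, how far was the destination from its current state?
    avg = sum(distance(dst_history[t], dst_now) for t in nearest) / len(nearest)
    return 1.0 / (1.0 + avg)


# Made-up history: source CPU load and destination response time (ms).
cpu_history = [[0.10], [0.90], [0.20], [0.95]]
latency_history = [[10.0], [80.0], [12.0], [85.0]]

# The source is busy now; a slow destination matches the joint history,
# a fast destination does not.
consistent = impact_likelihood(cpu_history, latency_history, [0.92], [82.0], k=2)
inconsistent = impact_likelihood(cpu_history, latency_history, [0.92], [11.0], k=2)
print(consistent > inconsistent)
```

A score of this kind could serve as a weight on a dependency edge between two components, with culprits then ranked by the weights on paths leading to the affected component; that ranking step is outside this sketch.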
doi:10.1145/1594977.1592597 fatcat:bfv63flusvhmvm7vsswrvkldgi