Isolating and Tolerating SDN Application Failures with LegoSDN

Balakrishnan Chandrasekaran, Brendan Tschaen, Theophilus Benson
2016 Proceedings of the Symposium on SDN Research - SOSR '16  
Despite software-defined networking's proven benefits, there remains significant reluctance to adopt it. Among the issues that hamper SDN's adoption, two stand out: reliability and fault tolerance. At the heart of these issues is a set of fate-sharing relationships: the first between the SDN control applications (SDN-Apps) and controllers, wherein the crash of the former induces a crash of the latter, thereby affecting the controller's availability; and the second between the SDN-Apps and the network, wherein the failure of the former violates network safety (e.g., network loops) or network availability (e.g., black holes). In this paper, we argue for a redesign of the controller architecture centering around a set of abstractions to eliminate these fate-sharing relationships and thus improve the controller's availability. We present a prototype implementation of a framework, called LegoSDN, that embodies our abstractions, and we demonstrate the benefits of our abstractions by evaluating LegoSDN on an emulated network with five real SDN-Apps. Our evaluations show that LegoSDN can recover failed SDN-Apps 3× faster than controller reboots while simultaneously preventing policy violations.
doi:10.1145/2890955.2890965 dblp:conf/sosr/0002TB16 fatcat:jul46amjdjaejmb66gcta7sllm