Resilient Intrusion Tolerance through Proactive and Reactive Recovery

Paulo Sousa, Alysson Neves Bessani, Miguel Correia, Nuno Ferreira Neves, Paulo Verissimo
2007 13th Pacific Rim International Symposium on Dependable Computing (PRDC 2007)  
Previous works have studied how to use proactive recovery to build intrusion-tolerant replicated systems that are resilient to any number of faults, as long as recoveries are faster than an upper bound on fault production assumed at system deployment time. In this work, we propose a complementary approach that combines proactive recovery with services that allow correct replicas to react and recover replicas that they detect or suspect to be compromised. One key feature of our proactive-reactive recovery approach is that, despite recoveries, it guarantees the availability of the minimum number of system replicas necessary to sustain the system's correct operation. We design a proactive-reactive recovery service based on a hybrid distributed system model and show, as a case study, how this service can effectively be used to augment the resilience of an intrusion-tolerant firewall adequate for the protection of critical infrastructures.

... employed in mission-critical applications such as the SCADA systems used to manage critical infrastructures like the power grid. One approach that promises to satisfy this requirement and that has gained momentum recently is intrusion tolerance [31]. This approach recognizes the difficulty of building a completely reliable and secure system and advocates the use of redundancy to ensure that a system still delivers its service correctly even if some of its components are compromised. A problem with "classical" intrusion-tolerant solutions based on Byzantine fault-tolerant replication algorithms is the assumption that the system operates correctly only if at most f of its n replicas are compromised. Given enough time, a malicious and intelligent adversary can find ways to compromise more than f replicas and collapse the whole system. Recently, some works showed that this problem can be solved (or at least mitigated) if the replicas are rejuvenated periodically, using a technique called proactive recovery [21]. These works propose intrusion-tolerant replicated systems that are resilient to any number of faults [5, 34, 4, 17, 25]. The idea is simple: replicas are periodically rejuvenated to remove the effects of malicious attacks and faults. Rejuvenation procedures may change the cryptographic keys and/or load a clean version of the operating system.
If rejuvenation is performed sufficiently often, an attacker is unable to corrupt enough replicas to break the system. Therefore, using proactive recovery, one can increase the resilience of any intrusion-tolerant replicated system able to tolerate up to f faults/intrusions: an unbounded number of intrusions may occur during its lifetime, as long as no more than f occur between rejuvenations. Both the interval between consecutive rejuvenations and f must be specified at system deployment time according to the expected rate of fault production.

An inherent limitation of proactive recovery is that a malicious replica can execute any action to disturb the system's normal operation (e.g., flood the network with arbitrary packets), and there is little or nothing that a correct replica that detects this abnormal behavior can do to stop or recover the faulty replica. Our observation is that a more complete solution should allow correct replicas that detect or suspect that some replica is faulty to accelerate the recovery of that replica. We call this solution proactive-reactive recovery and claim that it may improve the overall performance of a system under attack by reducing the time a malicious replica has to disturb the system's normal operation, without sacrificing periodic rejuvenation, which ensures that even dormant faults are removed from the system. This work proposes the combination of proactive and reactive recovery in order to increase the ...
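The Python sketch below illustrates one way periodic rejuvenation and reactive recovery could coexist in a single scheduler. It is a hypothetical, simplified model, not the service designed in this work. Its assumptions are as follows. Time advances in discrete ticks, and replicas are rejuvenated in round-robin order every `period` ticks. Each recovery lasts `recovery_ticks` ticks. At most `k` replicas recover concurrently, so at least n - k stay available. A reactive recovery is queued only after f + 1 distinct replicas suspect the same target. The class name and all parameter names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ProactiveReactiveScheduler:
    """Hypothetical proactive-reactive recovery scheduler (illustrative only)."""
    n: int               # total number of replicas
    f: int               # maximum number of faulty replicas tolerated
    k: int               # maximum number of replicas recovering at once (assumed)
    period: int          # ticks between consecutive proactive recoveries (assumed)
    recovery_ticks: int  # duration of one recovery, in ticks (assumed)
    now: int = 0
    next_proactive: int = 0      # next replica in round-robin order
    proactive_due: bool = False  # a proactive recovery is waiting for a free slot
    recovering: dict = field(default_factory=dict)  # replica -> tick its recovery ends
    suspicions: dict = field(default_factory=dict)  # target -> set of accusers
    pending_reactive: list = field(default_factory=list)
    log: list = field(default_factory=list)         # (tick, kind, replica)

    def suspect(self, accuser: int, target: int) -> None:
        """Record that replica `accuser` detects or suspects `target` to be faulty."""
        accusers = self.suspicions.setdefault(target, set())
        accusers.add(accuser)
        # With at most f faulty replicas, f + 1 distinct accusers include at least
        # one correct replica, so faulty replicas alone cannot force a correct
        # replica to recover (an assumed policy, not taken from the paper).
        if (len(accusers) >= self.f + 1
                and target not in self.pending_reactive
                and target not in self.recovering):
            self.pending_reactive.append(target)

    def _start(self, replica: int, kind: str) -> None:
        self.recovering[replica] = self.now + self.recovery_ticks
        self.suspicions.pop(replica, None)
        if replica in self.pending_reactive:
            self.pending_reactive.remove(replica)
        self.log.append((self.now, kind, replica))

    def tick(self) -> None:
        """Advance one time step, starting recoveries where capacity allows."""
        # Replicas whose recovery has finished rejoin the system.
        for r, end in list(self.recovering.items()):
            if end <= self.now:
                del self.recovering[r]
        # Reactive recoveries go first, but only while fewer than k replicas recover.
        while self.pending_reactive and len(self.recovering) < self.k:
            self._start(self.pending_reactive[0], "reactive")
        # Periodic (proactive) round-robin rejuvenation of every replica.
        if self.now % self.period == 0:
            self.proactive_due = True
        if self.proactive_due and len(self.recovering) < self.k:
            r = self.next_proactive
            if r not in self.recovering:
                self._start(r, "proactive")
            # A replica that is already recovering counts as rejuvenated.
            self.next_proactive = (r + 1) % self.n
            self.proactive_due = False
        # Availability: at least n - k replicas are active at every tick.
        assert len(self.recovering) <= self.k
        self.now += 1


sched = ProactiveReactiveScheduler(n=4, f=1, k=1, period=5, recovery_ticks=2)
sched.tick()                        # tick 0: proactive recovery of replica 0
sched.suspect(accuser=1, target=3)  # one accuser is not enough (f + 1 = 2)
sched.suspect(accuser=2, target=3)  # second accuser queues a reactive recovery
for _ in range(6):
    sched.tick()
print(sched.log)  # [(0, 'proactive', 0), (2, 'reactive', 3), (5, 'proactive', 1)]
```

In this sketch, giving reactive recoveries priority shortens the time a detected malicious replica can keep disturbing the system. Meanwhile, the round-robin schedule still rejuvenates every replica, including replicas with dormant faults that nobody suspects. The f + 1 threshold is only one simple way to keep up to f compromised replicas from triggering recoveries of correct ones. The detection and agreement mechanisms of the actual service may differ.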
doi:10.1109/prdc.2007.52 dblp:conf/prdc/SousaBCNV07