Model-based runtime analysis of distributed reactive systems
Australian Software Engineering Conference (ASWEC'06)
Reactive distributed systems have pervaded everyday life and objects, but often lack measures to ensure adequate behaviour in the presence of unforeseen events or even errors at runtime. As interactions and dependencies within distributed systems increase, the problem of detecting failures which depend on the exact situation and environment conditions they occur in grows. As a result, not only the detection of failures is increasingly difficult, but also the differentiation between the symptoms
... of a fault, and the actual fault itself, i. e., the cause of a problem. In this paper, we present a novel and efficient approach for analysing reactive distributed systems at runtime, in that we provide a framework for detecting failures as well as identifying their causes. Our approach is based upon monitoring safety-properties, specified in the linear time temporal logic LTL (respectively, TLTL) to automatically generate monitor components which detect violations of these properties. Based on the results of the monitors, a dedicated diagnosis is then performed in order to identify explanations for the misbehaviour of a system. These may be used to store detailed log files, or to trigger recovery measures. Our framework is built modular, layered, and uses merely a minimal communication overhead-especially when compared to other, similar approaches. Further, we sketch first experimental results from our implementations, and describe how it can be used to build a variety of distributed systems using our techniques.