Operating System Support to Detect Application Hangs

G. Carrozza, M. Cinque, D. Cotroneo, R. Natella
2008 unpublished
On-line failure detection is an essential means to control and assess the dependability of complex and critical software systems. In such context, effective detection strategies are required, in order to minimize the possibility of catastrophic consequences. This objective is however difficult to achieve in complex systems, especially due to the several sources of non-determinism (e.g., multi-threading and distributed interaction) which may lead to software hangs, i.e., the system is active but
more » ... no longer capable of delivering its services. The paper proposes a detection approach to uncover application hangs. It exploits multiple indirect data gathered at the operating system level to monitor the system and to trigger alarms if the observed behavior deviates from the expected one. By means of fault injection experiments conducted on a research prototype, it is shown how the combination of several operating system monitors actually leads to an high quality of detection, at an acceptable overhead.
doi:10.14236/ewic/vecos2008.11 fatcat:eeh5mnornfhbffgp6vfzg2goo4