A Fault Hypothesis for Integrated Architectures

R. Obermaisser, P. Peti
2006 2006 International Workshop on Intelligent Solutions in Embedded Systems  
Integrated architectures in the automotive and avionic domains promise improved resource utilization and enable a better tactic coordination of application subsystems compared to federated systems. In order to support safety-critical application subsystems, an integrated architecture needs to support fault-tolerant strategies that enable the continued operation of the system in the presence of failures. The basis for the implementation and validation of fault-tolerant strategies is a fault hypothesis that identifies the fault containment regions, specifies the failure modes, and provides realistic failure rate assumptions. This paper describes a fault hypothesis for integrated architectures, which takes into account the collocation of multiple software components on shared node computers. We argue in favor of a differentiation of fault containment regions for hardware and software faults. In addition, the fault hypothesis describes the assumptions concerning the respective frequencies of transient and permanent failures in consideration of recent semiconductor trends.
... within EU Framework Programme 6. The DECOS architecture introduces a distributed computer system, where the node computers are interconnected by a time-triggered network. Each node computer is shared among multiple software components in order to overcome the prevalent "1 Function - 1 Electronic Control Unit (ECU)" limitation of present-day electronic systems [7, 8]. This architecture provides a framework with generic architectural services (e.g., predictable exchange of messages, clock synchronization) for integrating multiple application subsystems within a single distributed computer system. A single time-triggered physical network handles the message exchanges between the node computers hosting the application subsystems. Each application subsystem is provided with guaranteed communication resources via so-called virtual networks [9]. Analogously, guaranteed computational resources of the node computers (e.g., CPU time, memory, I/O) are assigned to software components by employing a partition management operating system. In the DECOS architecture, the sharing of the node computers and the common physical network among software components from different application subsystems determines the assumptions concerning fault containment regions in the fault hypothesis. The basic idea is the differentiation of fault containment regions for software and for hardware faults.
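The differentiation of fault containment regions can be illustrated with a small sketch. It assumes, for illustration only, that a hardware fault affects every software component hosted on the faulty node computer, while a software fault stays confined to the faulty component because of partitioning. The component names, node names, and helper functions are hypothetical and do not come from the DECOS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareComponent:
    name: str        # hypothetical component identifier
    subsystem: str   # application subsystem the component belongs to
    node: str        # node computer hosting the component


def hardware_fcr(components, node):
    """Assumed hardware fault containment region: the whole node computer,
    so every component hosted on that node is affected."""
    return {c.name for c in components if c.node == node}


def software_fcr(components, name):
    """Assumed software fault containment region: the single faulty
    component, provided that partitioning contains the error."""
    return {c.name for c in components if c.name == name}


# Hypothetical allocation: two subsystems share node_a.
components = [
    SoftwareComponent("brake_ctrl", "x_by_wire", "node_a"),
    SoftwareComponent("door_lock", "comfort", "node_a"),
    SoftwareComponent("brake_monitor", "x_by_wire", "node_b"),
]

affected_by_hw = hardware_fcr(components, "node_a")
affected_by_sw = software_fcr(components, "door_lock")
```

Under these assumptions, a hardware fault in `node_a` affects both `brake_ctrl` and `door_lock`, whereas a software fault in `door_lock` affects only that component.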
This paper extends the existing fault hypothesis of the Time-Triggered Architecture (TTA) [4], which regards each node computer as an atomic unit in the fault hypothesis. This node-centric view is characteristic of federated systems and does not distinguish between hardware and software faults. In the introduced fault hypothesis, we recognize that the different software components on a node computer are to a high degree independent with respect to software faults. Firstly, the different software components provide different application services. Furthermore, the software components integrated within a node computer typically originate from different vendors, each employing its own design teams, development tools, and development processes. This insight is significant for supporting mixed-criticality systems, in which software components with different criticality levels are collocated on shared node computers. Based on the assumptions in the fault hypothesis, the architecture must support error containment between software components in order to enable modular certification [10] of the complete computer system. Otherwise, all software components would have to be elevated to the highest criticality level of any software module in the system. Note that the fault hypothesis defined in this paper is not restricted to the DECOS architecture, but is also suitable for integrated architectures like the Automotive Open System Architecture (AUTOSAR) [11].
The paper is structured as follows. Section 2 presents a generic system model for integrated system architectures. We describe the general structure of a fault hypothesis for safety-critical distributed real-time systems in Section 3. Based on this general structure of a fault hypothesis, Section 4 provides an instantiation for the integrated DECOS architecture. The paper finishes with a conclusion in Section 5.
System Model of an Integrated Architecture
In integrated architectures, computational resources (e.g., processor time, memory) and communication resources (i.e., network) are shared among multiple software components in order to reduce the number of deployed node computers, associated wiring, and to avoid
doi:10.1109/wises.2006.329115 dblp:conf/wises/ObermaisserP06 fatcat:saxit5brevagdpd2gkzlhwcjgm