Managing space system faults: Coalescing NASA's views

B. Muirhead, L. Fesq
2012 IEEE Aerospace Conference
Managing faults and their resultant failures is a fundamental and critical part of developing and operating aerospace systems. Yet recent studies have shown that the engineering "discipline" required to manage faults is neither widely recognized nor evenly practiced within the NASA community. Attempts in recent years simply to name this discipline have been fraught with controversy among members of the Integrated Systems Health Management (ISHM), Fault Management (FM), Fault Protection (FP), Hazard Analysis (HA), and Aborts communities. Approaches to managing space system faults are typically unique to each organization, with little commonality in the architectures, processes, and practices across the industry. A spectrum of issues and options affects the scope and implementation of how faults are managed within space systems. At one end of this spectrum are activities that manage faults via prevention and containment; these are typically performed either before flight or in non-real-time, such as designing in margins or inspecting airframes for fractures. At the other end of the spectrum lie activities that manage faults after they occur, including detection, isolation, diagnosis, and response. Mission characteristics such as mission length, human vs. robotic operation, availability of communication with a control center, and risk and cost profile drive very different emphases along this spectrum. Human spaceflight missions to low Earth orbit experience almost continuous communication with ground controllers and are designed for round trips. In contrast, deep-space robotic probes are one-way missions that experience long communication delays and outages. These characteristics drive the focus of managing space system faults toward the non-real-time prevention/containment end of the spectrum for the former, and toward the respond-to-faults end of the spectrum for the latter. In fact, automating these capabilities is especially critical for deep-space and planetary missions, where limited communication opportunities may prevent timely intervention by ground control. With ever-increasing complexity in aerospace systems, the task of managing faults becomes both increasingly important and increasingly complex. As NASA reaches toward the goal of sending humans beyond the Earth-Moon system, there is a significant need to better understand the challenges, options, and technologies of managing faults.
Architects and stakeholders need to become more aware of, and conversant in, the issues and design options early in development, and thereby balance and optimize automated vs. human-in-the-loop handling of faults. To achieve long-duration human spaceflight to asteroids and/or Mars, NASA must draw on the experience of the sub-communities that, until now, have taken very different approaches to managing faults. This paper describes the diverse views and approaches that must be coalesced in order to successfully achieve NASA's future space missions.
doi:10.1109/aero.2012.6187264 fatcat:ijaskyonwrdsrplesh4huwu4yq