MOLAR

Christian Engelmann, P. Sadayappan, Stephen L. Scott, David E. Bernholdt, Narasimha R. Gottumukkala, Chokchai Leangsuksun, Jyothish Varma, Chao Wang, Frank Mueller, Aniruddha G. Shet
2006 ACM SIGOPS Operating Systems Review  
MOLAR is a multi-institutional research effort that concentrates on adaptive, reliable, and efficient operating and runtime system (OS/R) solutions for ultra-scale, high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined in FAST-OS (Forum to Address Scalable Technology for Runtime and Operating Systems) and HECRTF (High-End Computing Revitalization Task Force) activities by exploring the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions, and by advancing computer reliability, availability, and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. This paper describes recent research of the MOLAR team in advancing RAS for high-end computing OS/Rs.

OVERVIEW

Current operating systems and runtime systems (OS/Rs) for high-end scientific computing (HEC) cannot meet the various requirements for running large applications efficiently on future ultra-scale computers. Building on the current open-source operating system, Linux, we target HEC applications for the next generation of supercomputers. Undoubtedly, these HEC OS/Rs must scale to the levels predicted by hardware architects for both shared memory and distributed memory platforms. Furthermore, they must enable applications to operate efficiently and reliably on any of these architectures as transparently as possible.

As described in recent reports by FAST-OS [14] (Forum to Address Scalable Technology for Runtime and Operating Systems), HECRTF [24] (High-End Computing Revitalization Task Force), and ScaLeS [41] (Science Case for Large-scale Simulation) activities, system software is a key challenge in exploiting the promise of extreme-scale scientific computing. Conceptually, the MOLAR [15] research has the following goals to address these issues.

• Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions.
doi:10.1145/1131322.1131337 fatcat:ihpzd2m7vffbdeznsh6upvvoey