Application-Driven Requirements for Node Resource Management in Next-Generation Systems

Edgar A. Leon, Balazs Gerofi, Julien Jaeger, Guillaume Mercier, Rolf Riesen, Masamichi Takagi, Brice Goglin
2020 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)
Emerging workloads on supercomputing platforms are pushing the limits of traditional high-performance computing software environments. Multi-physics, coupled simulations, big data processing and machine learning frameworks, and multicomponent workloads pose serious challenges to system and application developers. At the heart of the problem is the lack of cross-stack coordination to enable flexible resource management among multiple runtime components. In this work, we analyze seven real-world applications that represent emerging workloads and illustrate the scope and magnitude of the problem. We then extract several themes from these applications that highlight next-generation requirements for node resource managers. Finally, using these requirements, we propose a general, cross-stack coordination framework and outline its components and functionality.
doi:10.1109/ross51935.2020.00006 fatcat:mp3krqiv55bgrhdtr5skurgg3u