RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility ✱

Sebastien Varrette, Emmanuel Kieffer, Frederic Pinel, Ezhilmathi Krishnasamy, Sarah Peter, Hyacinthe Cartiaux, Xavier Besseron
2021 Practice and Experience in Advanced Research Computing  
High Performance Computing (HPC) is increasingly identified as a strategic asset and enabler to accelerate the research and the business performed in all areas requiring intensive computing and large-scale Big Data analytic capabilities. The efficient exploitation of heterogeneous computing resources featuring different processor architectures and generations, coupled with the eventual presence of GPU accelerators, remains a challenge. The University of Luxembourg operates since 2007 a large
more » ... demic HPC facility which remains one of the reference implementation within the country and offers a cutting-edge research infrastructure to Luxembourg public research. The HPC support team invests a significant amount of time (i.e., several months of effort per year) in providing a software environment optimised for hundreds of users, but the complexity of HPC software was quickly outpacing the capabilities of classical software management tools. Since 2014, our scientific software stack is generated and deployed in an automated and consistent way through the RESIF framework, a wrapper on top of Easybuild and Lmod [5] meant to efficiently handle user software generation. A large code refactoring was performed in 2017 to better handle different software sets and roles across multiple clusters, all piloted through a dedicated control repository. With the advent in 2020 of a new supercomputer featuring a different CPU architecture, and to mitigate the identified limitations of the existing framework, we report in this state-of-practice article RESIF 3.0, the latest iteration of our scientific software management suit now relying on streamline Easybuild. It permitted to reduce by around 90% the number of custom configurations previously enforced by specific Slurm and MPI settings, while sustaining optimised builds coexisting for different * Produces the permission block, and copyright information This work is licensed under a Creative Commons Attribution International 4.0 License.
doi:10.1145/3437359.3465600 fatcat:ylgfuymn4jbpvofjknzvu7nbrm