Scylla: A Mesos Framework for Container Based MPI Jobs [article]

Pankaj Saha, Angel Beltre, Madhusudhan Govindaraju
2019 Figshare  
Open source cloud technologies provide a wide range of support for creating customized compute node clusters to schedule tasks and manage resources. In cloud infrastructures such as Jetstream and Chameleon, which are used for scientific research, users receive complete control of the Virtual Machines (VMs) that are allocated to them. Importantly, users get root access to the VMs. This provides an opportunity for HPC users to experiment with new resource management technologies such as Apache Mesos that have proven scalability, flexibility, and fault tolerance. To ease the development and deployment of HPC tools on the cloud, containerization technology has matured and is gaining interest in the scientific community. In particular, several well-known scientific code bases now have publicly available Docker containers. While Mesos provides support for Docker containers to execute individually, it does not provide support for container inter-communication or orchestration of the containers for a parallel or distributed application. In this paper, we present the design, implementation, and performance analysis of a Mesos framework, Scylla, which integrates Mesos with Docker Swarm to enable orchestration of MPI jobs on a cluster of VMs acquired from the Chameleon cloud. Scylla uses Docker Swarm for communication between containerized tasks (MPI processes) and Apache Mesos for resource pooling and allocation. Scylla allows a policy-driven approach to determine how the containers should be distributed across the nodes depending on the CPU, memory, and network throughput requirements for each application.
doi:10.6084/m9.figshare.8156468 fatcat:abw44hr7xrayhbt3l7bxdyo4qq
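
The policy-driven distribution of containers mentioned in the abstract can be illustrated with a minimal sketch. This is not Scylla's implementation: the `Node` type, the `place_tasks` function, and the policy names "spread" and "pack" are hypothetical, chosen only to show how a policy might map MPI processes onto nodes from their free CPU and memory. Network throughput, which the abstract also lists as a criterion, is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of the resources a node offers."""
    name: str
    cpus: float
    mem_mb: float


def place_tasks(nodes, n_tasks, cpus_per_task, mem_per_task, policy="spread"):
    """Return {node name: number of MPI processes placed on it}.

    "spread" picks the node with the most free CPUs for each task;
    "pack" picks the node with the fewest free CPUs that still fits.
    Both policy names are illustrative assumptions.
    """
    if policy not in ("spread", "pack"):
        raise ValueError(f"unknown policy: {policy}")

    # Track remaining capacity per node as [free CPUs, free memory].
    free = {n.name: [n.cpus, n.mem_mb] for n in nodes}
    placement = {n.name: 0 for n in nodes}

    for _ in range(n_tasks):
        # Only nodes that can still hold one more task are candidates.
        candidates = [
            name for name, (cpus, mem) in free.items()
            if cpus >= cpus_per_task and mem >= mem_per_task
        ]
        if not candidates:
            raise ValueError("not enough resources for all tasks")

        pick = max if policy == "spread" else min
        chosen = pick(candidates, key=lambda name: free[name][0])

        free[chosen][0] -= cpus_per_task
        free[chosen][1] -= mem_per_task
        placement[chosen] += 1

    return placement


if __name__ == "__main__":
    cluster = [Node("vm-a", 4, 8192), Node("vm-b", 4, 8192)]
    print(place_tasks(cluster, 4, 1, 1024, policy="spread"))
    print(place_tasks(cluster, 4, 1, 1024, policy="pack"))
```

Under these assumptions, a spreading policy would suit memory- or CPU-bound jobs that benefit from less contention per node, while a packing policy would suit communication-heavy jobs that benefit from keeping processes on the same host.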