Fluid: Resource-aware Hyperparameter Tuning Engine

Peifeng Yu, Jiachen Liu, Mosharaf Chowdhury
2021 Conference on Machine Learning and Systems  
Current hyperparameter tuning solutions lack complementary execution engines that can efficiently leverage distributed computation; they ignore the possibility of intra- and inter-GPU sharing and consequently suffer from poor resource usage. In this paper, we present Fluid, a generalized hyperparameter tuning execution engine that coordinates between hyperparameter tuning jobs and cluster resources. Fluid schedules evaluation trials in such jobs using a water-filling approach to make the best use of resources at both intra- and inter-GPU granularities, thereby speeding up the tuning process. By abstracting a hyperparameter tuning job as a sequence of TrialGroups, Fluid can boost the performance of diverse hyperparameter tuning solutions. Our experiments show that Fluid can speed up synchronous BOHB by 100%, and BOHB and ASHA by 30%, while achieving similar final accuracy.
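The abstract only names the water-filling idea, so the following is a minimal, hypothetical Python sketch of how such an allocation could work for the trials in one TrialGroup: every trial's GPU share is raised in lockstep, each trial is capped at the parallelism it can actually use, and any leftover capacity keeps flowing to the trials that can still absorb it. The names (Trial, max_parallelism, waterfill) and the use of fractional shares to model intra-GPU sharing are assumptions for illustration, not Fluid's actual API or implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trial:
    """One hyperparameter configuration to evaluate (hypothetical stand-in)."""
    trial_id: str
    max_parallelism: float  # most GPU share this trial can usefully occupy


def waterfill(trials: List[Trial], total_gpus: float) -> Dict[str, float]:
    """Water-filling sketch: raise every trial's GPU share evenly,
    capping each trial at its own max parallelism, until the GPU pool
    is drained. Fractional shares stand in for intra-GPU sharing."""
    alloc = {t.trial_id: 0.0 for t in trials}
    remaining = total_gpus
    active = list(trials)
    while remaining > 1e-9 and active:
        # Raise the "water level": split leftover capacity evenly among
        # trials that still have headroom.
        level = remaining / len(active)
        still_active = []
        for t in active:
            headroom = t.max_parallelism - alloc[t.trial_id]
            grant = min(level, headroom)
            alloc[t.trial_id] += grant
            remaining -= grant
            if alloc[t.trial_id] < t.max_parallelism - 1e-9:
                still_active.append(t)
        if len(still_active) == len(active):
            break  # every trial absorbed a full level; pool is drained
        active = still_active
    return alloc


# Example: 4 GPUs shared by a TrialGroup of three trials.
group = [Trial("a", 1), Trial("b", 2), Trial("c", 4)]
print(waterfill(group, 4))
# "a" caps out at 1 GPU; "b" and "c" split the remaining 3 GPUs (~1.5 each).
```

Under this reading, intra-GPU sharing corresponds to trials receiving fractional shares of a device, while inter-GPU sharing corresponds to a single trial spanning multiple devices when other trials cannot use them.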