I/O-Aware Batch Scheduling for Petascale Computing Systems

Zhou Zhou, Xu Yang, Dongfang Zhao, Paul Rich, Wei Tang, Jia Wang, Zhiling Lan
2015 IEEE International Conference on Cluster Computing
In the Big Data era, the gap between storage performance and applications' I/O requirements keeps widening. I/O congestion caused by concurrent storage accesses from multiple applications is inevitable and severely harms performance. Conventional approaches either optimize each application's access pattern individually or handle I/O requests at the low-level storage layer without any knowledge of the upper-level applications. In this paper, we present a novel I/O-aware batch scheduling framework to coordinate ongoing I/O requests on petascale computing systems. The motivation behind this innovation is that the batch scheduler has a holistic view of both system state and jobs' activities and can control jobs' status on the fly during their execution. We treat a job's I/O requests as periodic subjobs within its lifecycle and transform the I/O congestion issue into a classical scheduling problem. We design two scheduling policies with different scheduling objectives, targeting either user-oriented metrics or system performance. We conduct extensive trace-based simulations using real job traces and I/O traces from a production IBM Blue Gene/Q system. Experimental results demonstrate that our design improves job performance by more than 30% and also improves system performance.
doi:10.1109/cluster.2015.45 dblp:conf/cluster/ZhouYZRTWL15 fatcat:p7orvdwhlvc4ti4grlizgalg6m