Task Scheduling and File Replication for Data-Intensive Jobs with Batch-shared I/O

G. Khanna, N. Vydyanathan, U. Catalyurek, T. Kurc, S. Krishnamoorthy, P. Sadayappan, J. Saltz
2006 15th IEEE International Conference on High Performance Distributed Computing  
This paper addresses the problem of efficiently executing a batch of data-intensive tasks with batch-shared I/O behavior on coupled storage and compute clusters. Two scheduling schemes are proposed: 1) a 0-1 Integer Programming (IP) based approach, which couples task scheduling and data replication, and 2) a bi-level hypergraph partitioning based heuristic approach (BiPartition), which decouples task scheduling and data replication. The experimental results show that: 1) the IP scheme [...] the best batch execution time, but has significant scheduling overhead, which restricts its application to small-scale workloads, and 2) the BiPartition scheme is a better fit for larger workloads and systems: it has very low scheduling overhead and no more than 5-10% degradation in solution quality compared with the IP-based approach.
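To make the idea of coupling task scheduling with data replication concrete, the following is a minimal sketch on a toy instance. It is not the paper's IP formulation: the cost model (each node pays once for every distinct file its tasks need, plus a per-task compute cost) and all names (`best_schedule`, `task_files`, `file_cost`, `task_cost`) are assumptions made for illustration. Exhaustive search stands in for an IP solver, which also hints at why exact approaches become expensive as the batch grows.

```python
from itertools import product

def best_schedule(task_files, file_cost, task_cost, num_nodes):
    """Exhaustively search task-to-node assignments on a toy instance.

    Assumed cost model (illustrative only): a node's time is the sum of
    its tasks' compute costs plus the transfer cost of every distinct
    file those tasks read, so a file shared by co-located tasks is
    staged once. The objective is the largest node time (makespan).
    """
    tasks = sorted(task_files)
    best_makespan, best_assignment = float("inf"), None
    # Each tuple in the product assigns one node index to every task.
    for choice in product(range(num_nodes), repeat=len(tasks)):
        node_time = [0.0] * num_nodes
        node_files = [set() for _ in range(num_nodes)]
        for task, node in zip(tasks, choice):
            node_time[node] += task_cost[task]
            node_files[node] |= task_files[task]
        # Replication cost: each node stages each needed file once.
        for node in range(num_nodes):
            node_time[node] += sum(file_cost[f] for f in node_files[node])
        makespan = max(node_time)
        if makespan < best_makespan:
            best_makespan, best_assignment = makespan, dict(zip(tasks, choice))
    return best_makespan, best_assignment

# Hypothetical batch: t1 and t2 share file "a"; t3 and t4 share file "b".
task_files = {"t1": {"a"}, "t2": {"a"}, "t3": {"b"}, "t4": {"b"}}
file_cost = {"a": 10, "b": 10}
task_cost = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}
makespan, assignment = best_schedule(task_files, file_cost, task_cost, num_nodes=2)
```

On this toy input the search co-locates the tasks that share a file, so each file is staged on only one node. The search visits `num_nodes ** len(tasks)` assignments, so it is usable only for very small batches.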
doi:10.1109/hpdc.2006.1652155 dblp:conf/hpdc/KhannaVCKKSS06 fatcat:fjjimqogvjdyncrbztsmomcuom