Distributed scheduling algorithms to improve the performance of parallel data transfers

Dannie Durand, Ravi Jain, David Tseytlin
1994 SIGARCH Computer Architecture News  
The cost of data transfers, and in particular of I/O operations, is a growing problem in parallel computing. This performance bottleneck is especially severe for data-intensive applications such as multimedia information systems, databases, and Grand Challenge problems. A promising approach to alleviating this bottleneck is to schedule parallel I/O operations explicitly. Although centralized algorithms for batch scheduling of parallel I/O operations have previously been developed, they may not be appropriate for all applications and architectures. We develop a class of decentralized algorithms for scheduling parallel I/O operations, where the objective is to reduce the time required to complete a given set of transfers. These algorithms, based on edge-coloring and matching of bipartite graphs, rely upon simple heuristics to obtain shorter schedules. We present simulation results indicating that the best of our algorithms can produce schedules whose length is within 2-20% of the optimal schedule, a substantial improvement on previous decentralized algorithms. We discuss theoretical and experimental work in progress and possible extensions.

... problems can be solved. Three examples of such applications are multimedia information systems, scientific computations with massive datasets, and databases. Hence, using parallelism to improve the performance of the I/O subsystem is an important emerging research area. In this paper, we present work in progress on distributed scheduling algorithms to improve performance in a class of parallel I/O subsystems that can be modeled by bipartite graphs. These algorithms are based on the graph-theoretic notions of bipartite matching and edge-coloring. In Section 2, we survey recent work on parallel I/O, discuss how our work fits into this context, and review relevant previous work in scheduling. A detailed description of the problem, together with the relevant ideas from graph theory, is given in Section 3. A class of decentralized algorithms to solve this problem is introduced in Section 4. Simulation results and continuing work are presented in Section 5. Future work is discussed in Section 6, and our results are summarized in the conclusion.

Background

A variety of approaches to the I/O bottleneck, from algorithmic to low-level hardware solutions, have been proposed.
These include both methods to improve the rate of I/O delivery to uniprocessor systems by introducing parallelism into the I/O subsystem, and methods of improving the I/O performance of multiprocessors. At the highest level, new theoretical models of parallel I/O systems are being developed [1, 33, 25, 32], allowing the study of many fundamental algorithms in terms of their I/O complexity. At the next level, new language and compiler features are being developed to support I/O parallelism and optimizations, using data layout conversion [12] and compiler hints [29]. Operating-system optimizations include layer integration and integrated buffer management to reduce copying costs, as well as research in file systems [9, 22]. At the lowest level, performance improvements are being achieved in hardware and networking. Fine-grain parallelism at the disk level has been proposed through mechanisms such as disk striping, interleaving, RAID, and RADD [28, 31]. Finally, to support solutions to the I/O problem, new disk architectures must be sufficiently flexible and programmable that new I/O paradigms can be implemented and tested. Kotz and Cormen [21, 11] have studied these requirements.
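The abstract frames parallel I/O scheduling as edge-coloring and matching in a bipartite graph, and it measures schedule length against the optimum. The following is a minimal sketch of that framing under assumptions this excerpt does not state. Every transfer is assumed to take one time unit, and each processor (client) and each disk is assumed to take part in at most one transfer per round. Under these assumptions, one round of a schedule is a matching and a complete schedule is a proper edge coloring of the bipartite multigraph of transfers. By König's edge-coloring theorem, the optimal length then equals the maximum degree of that multigraph, which gives a yardstick for "within x% of optimal". The scheduler `request_grant_schedule` is a hypothetical, generic randomized request/grant matching heuristic chosen for illustration. It is not the authors' algorithm, and the function names and the example batch are likewise illustrative.

```python
import random
from collections import Counter


def max_degree(transfers):
    """Optimal schedule length under the assumed unit-time model.

    Every transfer occupies one client and one disk for one round, so no
    schedule can be shorter than the busiest node's load; by König's
    edge-coloring theorem, bipartite multigraphs always achieve this bound.
    """
    load = Counter()
    for client, disk in transfers:
        # Tag node kinds so a client id never collides with a disk id.
        load[("client", client)] += 1
        load[("disk", disk)] += 1
    return max(load.values(), default=0)


def request_grant_schedule(transfers, rng):
    """Hypothetical decentralized heuristic: one matching per round.

    Returns a list of rounds; each round is a list of (client, disk)
    transfers in which no client and no disk appears twice.
    """
    pending = list(transfers)
    rounds = []
    while pending:
        # Request phase: each client with work left picks one of its own
        # pending transfers at random and asks that disk.
        by_client = {}
        for idx, (client, _disk) in enumerate(pending):
            by_client.setdefault(client, []).append(idx)
        requests = {}  # disk -> indices of pending transfers asking for it
        for idxs in by_client.values():
            idx = rng.choice(idxs)
            requests.setdefault(pending[idx][1], []).append(idx)
        # Grant phase: each disk accepts exactly one request it received,
        # so every round makes progress and the loop terminates.
        granted = {rng.choice(idxs) for idxs in requests.values()}
        rounds.append([pending[i] for i in sorted(granted)])
        pending = [t for i, t in enumerate(pending) if i not in granted]
    return rounds


if __name__ == "__main__":
    # Illustrative batch: clients 0-2, disks "A"-"C"; client 1 has two
    # separate transfers with disk "A" (parallel edges in the multigraph).
    batch = [(0, "A"), (0, "B"), (1, "A"), (1, "A"), (2, "B"), (2, "C")]
    rounds = request_grant_schedule(batch, random.Random(1))
    optimum = max_degree(batch)
    gap = 100.0 * (len(rounds) - optimum) / optimum
    for t, matching in enumerate(rounds):
        print(f"round {t}: {matching}")
    print(f"{len(rounds)} rounds vs optimum {optimum} ({gap:.0f}% above)")
```

Each round is built only from local choices: a client picks among its own transfers, and a disk picks among the requests it receives. That locality is what makes this style of heuristic decentralized. The cost is that some client-disk pairs can sit idle in a round, so the schedule may exceed the maximum-degree bound. The printed percentage is the same kind of above-optimal measure that the abstract reports.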
doi:10.1145/190787.190799