Scheduling dynamic parallelism on accelerators

Filip Blagojevic, Costin Iancu, Katherine Yelick, Matthew Curtis-Maury, Dimitrios S. Nikolopoulos, Benjamin Rose
2009 Proceedings of the 6th ACM conference on Computing frontiers - CF '09  
Resource management on accelerator-based systems is complicated by the disjoint nature of the main CPU and the accelerator, which involves separate memory hierarchies, different degrees of parallelism, and a relatively high cost of communication between them. For applications with irregular parallelism, where work is dynamically created based on other computations, the accelerators may both consume and produce work. To maintain load balance, the accelerators hand work back to the CPU to be scheduled.
In this paper we consider multiple approaches for such scheduling problems and use the Cell BE system to demonstrate the different schedulers and the trade-offs between them. Our evaluation is done with both microbenchmarks and two bioinformatics applications (PBPI and RAxML). Our baseline approach uses a standard Linux scheduler on the CPU, possibly with more than one process per CPU. We then consider the addition of cooperative scheduling to the Linux kernel and a user-level work-stealing approach. The two cooperative approaches decrease SPE idle time by 30% and 70%, respectively, relative to the baseline scheduler. In both cases we believe the changes required to application-level codes, e.g., a program written with MPI processes that use accelerator-based compute nodes, are reasonable; the kernel-level approach provides more generality and ease of implementation, but often less performance than the work-stealing approach.
doi:10.1145/1531743.1531769 dblp:conf/cf/BlagojevicIYCNR09 fatcat:7fyxlqxebff7zc34assd4mjkuy