Using processor affinity in loop scheduling on shared-memory multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempt to achieve the minimum completion time by distributing the workload as evenly as possible, while minimizing the number of synchronization operations required. In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to other known algorithms using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.