Code and data partitioning for fine-grain parallelism

Michael L. Chu, Scott A. Mahlke
2007 SIGPLAN Notices
Introduction

The recent shift to multicore designs for mainstream processors offers the potential to improve the performance of current applications. However, converting this potential into reality is a great challenge. The programmer and/or compiler must parallelize applications to take advantage of multiple cores. Recently, a significant amount of work has focused on areas such as new programming models and ways to exploit data-level parallelism. These methods for coarse-grain parallelization can be extremely powerful in extracting large amounts of parallel work and distributing it across the cores. However, there are still a significant number of single-threaded applications and programs that simply do not exhibit the inherent parallelism for programmers to widely spread their execution across multiple cores. This paper focuses on an alternative compiler-directed method for program parallelization by exploiting fine-grain instruction-level parallelism (ILP).

Current research in interconnection networks has focused on multiple ways to increase the speed and bandwidth of communication between cores [6, 7]. Faster communication of data values between the cores can then allow applications to take advantage of parallelization at the operation and data granularity between the cores. While coarse-grain techniques can parallelize large portions of execution, our fine-grain method can use an additional dimension to further increase performance and exploit the multiple underlying cores.

The challenge for exploiting fine-grain parallelism is: given an application, identify the operations that should execute on each core. This decision must take into account the communication overhead of transferring register values between the cores as well as the layout of data values in the individual caches of each core. Poor decisions could lead to communication across the interconnection network delaying the execution of other operations, cache conflicts
doi:10.1145/1273444.1254798 fatcat:ergokj7amnghppgaj36vqfknji