Dynamic parallelization and mapping of binary executables on hierarchical platforms

Efe Yardimci, Michael Franz
2006 Proceedings of the 3rd conference on Computing frontiers - CF '06  
As performance improvements are being increasingly sought via coarse-grained parallelism, established expectations of continued sequential performance increases are not being met. Current trends in computing point toward platforms seeking performance improvements through various degrees of parallelism, with coarse-grained parallelism features becoming commonplace in even entry-level systems. Yet the broad variety of multiprocessor configurations that will be available that differ in the number of processing elements will make it difficult to statically create a single parallel version of a program that performs well on the whole range of such hardware. As a result, there will soon be a vast number of multiprocessor systems that are significantly under-utilized for lack of software that harnesses their power effectively. This problem is exacerbated by the growing inventory of legacy programs in binary executable form with possibly unreachable source code. We present a system that improves the performance of optimized sequential binaries through dynamic recompilation. Leveraging observations made at runtime, a thin software layer recompiles executing code compiled for a uniprocessor and generates parallelized and/or vectorized code segments that exploit available parallel resources. Among the techniques employed are control speculation, loop distribution across several threads, and automatic parallelization of recursive routines. Our solution is entirely software-based and can be ported to existing hardware platforms that have parallel processing capabilities. Our performance results are obtained on real hardware without using simulation. In preliminary benchmarks on only modestly parallel (2-way) hardware, our system already provides speedups of up to 40% on SpecCPU benchmarks, and near-optimal speedups on more obviously parallelizable benchmarks.
doi:10.1145/1128022.1128040 dblp:conf/cf/YardimciF06 fatcat:l4p6ckzpefhjvae7bx6boeiwje