Mostly static program partitioning of binary executables
ACM Transactions on Programming Languages and Systems
We have built a run-time compilation system that takes unmodified sequential binaries and improves their performance on off-the-shelf multiprocessors using dynamic vectorization and loop-level parallelization. Our system, Azure, is purely software-based and requires no special hardware support for speculative thread execution. Even so, it breaks even in most cases: the time saved by the optimized code exceeds the cost of run-time monitoring and compilation, often by a significant margin.
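The break-even condition can be stated as an inequality. The symbols are introduced here only for illustration: T_seq is the run time of the original sequential binary, T_opt the run time of the dynamically optimized code, and T_mon and T_comp the costs of run-time monitoring and compilation.

```latex
\[
  T_{\text{opt}} + T_{\text{mon}} + T_{\text{comp}} \le T_{\text{seq}}
  \quad\Longleftrightarrow\quad
  T_{\text{seq}} - T_{\text{opt}} \ge T_{\text{mon}} + T_{\text{comp}}
\]
```

The right-hand form makes the trade-off explicit: the time saved by optimization must at least cover the overhead. Reducing T_mon and T_comp, for example by not monitoring code that can never be parallelized, lowers the bar the optimizer has to clear.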
The key to this performance is an off-line preprocessing step that extracts a mostly correct control flow graph (CFG) from the binary program ahead of time. This statically obtained CFG is incomplete: it may lack some edges corresponding to computed branches. We describe how such missing control flow edges are discovered and handled at run time, so that an incomplete static analysis never leads to an incorrect optimization result. A mostly correct CFG lets us statically partition a binary executable into single-entry multiple-exit regions and identify potential parallelization candidates before execution. Program regions that are not candidates for parallelization can thus be excluded entirely from run-time monitoring and dynamic recompilation, and Azure's extremely low overhead follows directly from this design.
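The following is a minimal sketch, not Azure's actual algorithm, of how a static CFG might be split into single-entry multiple-exit regions and how a control flow edge discovered at run time could invalidate only the affected regions. As a simple stand-in for the partitioning, it uses extended basic blocks: a block heads a region if it is the program entry or does not have exactly one predecessor, and every other block joins the region of its unique predecessor. The names `partition_ebbs` and `add_discovered_edge`, and the representation of the CFG as a dict mapping block names to sets of successors, are hypothetical.

```python
from collections import defaultdict


def predecessors(cfg):
    """Map each block to the set of blocks that branch to it."""
    preds = defaultdict(set)
    for src, succs in cfg.items():
        preds.setdefault(src, set())
        for dst in succs:
            preds[dst].add(src)
    return preds


def partition_ebbs(cfg, entry):
    """Partition a CFG into single-entry multiple-exit regions (extended basic blocks).

    Returns a dict mapping each region head to the set of blocks in its region.
    Only blocks reachable from some head are assigned to a region.
    """
    preds = predecessors(cfg)
    # A block starts a new region unless it has exactly one predecessor.
    heads = {b for b in preds if b == entry or len(preds[b]) != 1}
    region_of = {}
    for head in heads:
        stack = [head]
        while stack:
            b = stack.pop()
            if b in region_of:
                continue
            region_of[b] = head
            # Non-head successors have this block as their only predecessor,
            # so the region keeps a single entry point (its head).
            for s in cfg.get(b, ()):
                if s not in heads:
                    stack.append(s)
    regions = defaultdict(set)
    for b, head in region_of.items():
        regions[head].add(b)
    return dict(regions)


def add_discovered_edge(cfg, entry, src, dst):
    """Record a computed-branch target found at run time.

    Returns the heads of regions whose contents changed; any optimized code
    built for those regions would have to be discarded and regenerated.
    """
    old = partition_ebbs(cfg, entry)
    cfg.setdefault(src, set()).add(dst)
    new = partition_ebbs(cfg, entry)
    return {h for h in old if new.get(h) != old[h]}


if __name__ == "__main__":
    # Diamond-shaped CFG: A branches to B or C, both of which fall through to D.
    cfg = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
    print(partition_ebbs(cfg, "A"))
    # A computed branch from C to B is observed at run time.
    print(add_discovered_edge(cfg, "A", "C", "B"))
    print(partition_ebbs(cfg, "A"))
```

In the diamond example, the static partition is `{A, B, C}` headed by A plus `{D}`. When the computed branch from C to B is discovered, B acquires a second predecessor and becomes a region head, so only the region headed by A changes and needs reprocessing. The region `{D}` is unaffected. This illustrates, under the stated assumptions, why a mostly correct static CFG remains safe to use: a missing edge invalidates specific regions instead of forcing the whole analysis to be redone.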