Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Marcelo Cintra, José F. Martínez, Josep Torrellas
2000 SIGARCH Computer Architecture News  
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems. In this paper, we present and evaluate a new hardware scheme for scalable speculative parallelization. This design needs relatively simple hardware and is efficiently integrated into a cache-coherent NUMA system. We have designed the scheme in a hierarchical manner that largely abstracts away the internals of the node. We effectively utilize a speculative CMP as the building block for our scheme. Simulations show that the proposed architecture delivers good speedups at a modest hardware cost. For a set of important nonanalyzable scientific loops, we report average speedups of 4.2 for 16 processors. We show that support for per-word speculative state is required by our applications, or else the performance suffers greatly.

... cesses to a single word. In the table, PRE and SUC stand for predecessor and successor thread respectively, and © and © ¡ refer to two versions of the same word created by the predecessor and successor thread respectively.
doi:10.1145/342001.363382 fatcat:akzt3gzhhvabxmq6wqydav3v4i