Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Marcelo Cintra, José F. Martínez, Josep Torrellas
2000 Proceedings of the 27th annual international symposium on Computer architecture - ISCA '00  
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems. In this paper, we present and evaluate a new hardware scheme for scalable speculative parallelization. This design needs relatively simple hardware and is efficiently integrated into a cache-coherent NUMA system. We have designed the scheme in a hierarchical manner that largely abstracts away the internals of the node. We effectively utilize a speculative CMP as the building block for our scheme. Simulations show that the architecture proposed delivers good speedups at a modest hardware cost. For a set of important nonanalyzable scientific loops, we report average speedups of 4.2 for 16 processors. We show that support for per-word speculative state is required by our applications, or else the performance suffers greatly.
... accesses to a single word. In the table, PRE and SUC stand for predecessor and successor thread, respectively, and the two version symbols refer to the two versions of the same word created by the predecessor and successor thread, respectively.
doi:10.1145/339647.363382 fatcat:ymase3zpnfhjbclxdatm3voq7m