Post-pass binary adaptation for software-based speculative precomputation

Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen
2002 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02  
Recently, a number of thread-based prefetching techniques have been proposed. These techniques aim at improving the latency of single-threaded applications by leveraging multithreading resources to perform memory prefetching via speculative prefetch threads. Software-based speculative precomputation (SSP) is one such technique, proposed for multithreaded Itanium models. SSP does not require expensive hardware support; instead, it relies on the compiler to adapt binaries to perform prefetching on otherwise idle hardware thread contexts at run time. This paper presents a post-pass compilation tool for generating SSP-enhanced binaries. The tool is able to: (1) analyze a single-threaded application to generate prefetch threads; (2) identify and embed trigger points in the original binary; and (3) produce a new binary that has the prefetch threads attached. The execution of the new binary spawns the speculative prefetch threads, which are executed concurrently with the main thread. Our results indicate that for a set of pointer-intensive benchmarks, the prefetching performed by the speculative threads achieves an average of 87% speedup on an in-order processor and 5% speedup on an out-of-order processor.
Various forms of such thread-based prefetching have been proposed recently. Examples include Collins et al.'s speculative precomputation [7], Luk's software-controlled pre-execution [21], Roth and Sohi's data-driven multithreading [25], and Zilles and Sohi's speculative slices [34]. These studies demonstrated the performance potential of thread-based prefetching by assuming the availability of hardware and/or compiler support. In this paper, we introduce an automated tool for transforming application code in order to attach prefetch threads in the binary. The aim of this paper is to demonstrate the feasibility of automatically generating binaries for thread-based prefetching and the effectiveness of the resulting binaries. To our knowledge, this work is the first to automate the entire process of extracting dependent instructions leading to target operations, identifying proper spawning points, and managing inter-thread communication to ensure timely pre-execution. Our tool is post-pass because it does not modify the normal compilation steps, but rather is invoked after the compilation process. The tool is based on the speculative precomputation (SP) paradigm for future Itanium™ processors [16].
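To make "extracting dependent instructions leading to target operations" concrete, the following is a minimal sketch of backward slicing over a straight-line instruction trace. It is an illustration under stated assumptions, not the tool's actual algorithm: the `Instr` record, the register names, the mnemonics, and the `backward_slice` helper are all hypothetical, and the sketch tracks register dependences only, ignoring control flow and dependences through memory.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction (not a real Itanium encoding)."""
    dest: str               # register written, "" if none
    op: str                 # mnemonic, for readability only
    srcs: Tuple[str, ...]   # registers read


def backward_slice(trace, target_index):
    """Collect the instructions the target's operands depend on.

    Walks the trace backwards from the target, following register
    dependences only. Returns the slice as ascending trace indices and
    the set of live-in registers a prefetch thread would need as input.
    """
    needed = set(trace[target_index].srcs)
    slice_indices = [target_index]
    for i in range(target_index - 1, -1, -1):
        ins = trace[i]
        if ins.dest and ins.dest in needed:
            # This instruction produces a value the slice needs:
            # include it and start tracking its own inputs instead.
            needed.discard(ins.dest)
            needed.update(ins.srcs)
            slice_indices.append(i)
    slice_indices.reverse()
    return slice_indices, needed


# Illustrative pointer-chasing fragment; index 4 plays the target load.
TRACE = [
    Instr("r1", "load", ("r0",)),        # p = p->next
    Instr("r5", "add", ("r5", "r6")),    # unrelated computation
    Instr("r2", "addi", ("r1",)),        # address of a field of *p
    Instr("r6", "mul", ("r6", "r6")),    # unrelated computation
    Instr("r3", "load", ("r2",)),        # target load
]

slice_indices, live_ins = backward_slice(TRACE, 4)
print(slice_indices, live_ins)
```

In this toy trace the two unrelated arithmetic instructions fall outside the slice, which is why a prefetch thread built from the slice can run ahead of the main thread: it performs only the address computation.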
SP utilizes hardware thread contexts to execute precomputation slices (p-slices), which consist of instructions that compute the memory addresses for prefetching [7]. Speculative threads can be spawned by one of two events: a basic trigger, which occurs when a designated trigger instruction in the non-speculative thread is retired, or a chaining trigger, by which one speculative thread explicitly spawns another. Collins et al. demonstrated that long-range prefetching using chaining triggers is the key to high performance via speculative precomputation [7]. As a proof of concept, they identified the chaining triggers in the binary manually. Collins et al. later proposed dynamic speculative precomputation,
doi:10.1145/512541.512544 fatcat:nshqi5hd4rh2jbrrzpodutdw5y