Dynamic hot data stream prefetching for general-purpose programs

Trishul M. Chilimbi, Martin Hirzel
2002 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02  
Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose programs, especially pointer-chasing codes written in languages such as C and C++. We address this problem by describing,
more » ... mplementing and evaluating a dynamic prefetching scheme. Our technique runs on stock hardware, is completely automatic, and works for generalpurpose programs, including pointer-chasing codes written in weakly-typed languages, such as C and C++. It operates in three phases. First, the profiling phase gathers a temporal data reference profile from a running program with low-overhead. Next, the profiling is turned off and a fast analysis algorithm extracts hot data streams, which are data reference sequences that frequently repeat in the same order, from the temporal profile. Then, the system dynamically injects code at appropriate program points to detect and prefetch these hot data streams. Finally, the process enters the hibernation phase where no profiling or analysis is performed, and the program continues to execute with the added prefetch instructions. At the end of the hibernation phase, the program is deoptimized to remove the inserted checks and prefetch instructions, and control returns to the profiling phase. For long-running programs, this profile, analyze and optimize, hibernate, cycle will repeat multiple times. Our initial results from applying dynamic prefetching are promising, indicating overall execution time improvements of 5-19% for several memory-performance-limited SPECint2000 benchmarks running their largest (ref) inputs.
doi:10.1145/512529.512554 dblp:conf/pldi/ChilimbiH02 fatcat:l7ki2agdhnhhlpwl3uxpkzp2wi