Automatic vectorization of tree traversals

Youngjoon Jo, Michael Goldfarb, Milind Kulkarni
2013 Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques  
Repeated tree traversals are ubiquitous in domains such as scientific simulation, data mining, and graphics. Modern commodity processors support SIMD instructions, and using these instructions to process multiple traversals at once has the potential to provide substantial performance improvements. Unfortunately, these algorithms often feature highly divergent traversals that inhibit efficient SIMD utilization, to the point that other, less profitable sources of vectorization must be ... d instead. Previous work has proposed traversal splicing, a locality transformation for tree traversals that dynamically reorders traversals according to their previous behavior, following the insight that traversals which have behaved similarly so far are likely to behave similarly in the future. In this work, we cast this dynamic reordering as scheduling for efficient SIMD execution and show that it can dramatically improve the SIMD utilization of diverging traversals, bringing it close to ideal. For five irregular tree traversal algorithms, our techniques deliver speedups of 2.78x on average over baseline implementations. Furthermore, our techniques can effectively SIMDize algorithms that prior, manual vectorization attempts could not.
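To illustrate why grouping similarly behaving traversals helps SIMD utilization, the following is a minimal Python sketch, not the authors' implementation. It assumes each traversal is summarized by the set of tree node ids it visits, and that a SIMD group of `width` lanes must step through every node that any of its lanes visits, leaving the other lanes idle. The names `simd_utilization` and `reorder_by_behavior` are hypothetical, and the reordering here is a one-shot sort over complete visit sets, whereas the paper reorders dynamically during traversal.

```python
from typing import FrozenSet, List, Sequence


def simd_utilization(visits: Sequence[FrozenSet[int]], width: int) -> float:
    """Fraction of SIMD lane-steps that do useful work.

    Traversals are packed, in the given order, into groups of `width` lanes.
    A group steps through the union of the nodes its lanes visit, so a lane
    is idle at every node its own traversal skips. Empty lanes in a final,
    partially filled group also count as idle.
    """
    useful = 0
    issued = 0
    for start in range(0, len(visits), width):
        group = visits[start:start + width]
        nodes = frozenset().union(*group)  # nodes this group must execute
        issued += len(nodes) * width
        useful += sum(len(v) for v in group)
    return useful / issued if issued else 1.0


def reorder_by_behavior(visits: Sequence[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Place traversals with the same visit pattern next to each other.

    Assumption: a static sort stands in for the dynamic reordering
    described in the abstract.
    """
    return sorted(visits, key=lambda v: sorted(v))


# Two behaviors in a small binary tree whose nodes are numbered 0..6:
left = frozenset({0, 1, 3})    # root, then left subtree
right = frozenset({0, 2, 6})   # root, then right subtree

interleaved = [left, right, left, right]
grouped = reorder_by_behavior(interleaved)

before = simd_utilization(interleaved, width=2)
after = simd_utilization(grouped, width=2)
print(f"utilization before: {before:.2f}, after: {after:.2f}")
```

In this toy input, every interleaved group mixes a left-going and a right-going traversal, so each lane idles at the nodes only its partner visits; after reordering, both lanes of a group follow the same path.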
doi:10.1109/pact.2013.6618832 dblp:conf/IEEEpact/JoGK13 fatcat:7ek5mqgqk5dixlt4wuczu36ocq