A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2013.
The file type is application/pdf.
Compact multi-dimensional kernel extraction for register tiling
2009
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
To achieve high performance on multi-cores, modern loop optimizers apply long sequences of transformations that produce complex loop structures. Downstream optimizations such as register tiling (unroll-and-jam plus scalar promotion) typically provide a significant performance improvement. Typical register tilers provide this improvement only when applied to simple loop structures. They often fail to operate on complex loop structures, leaving a significant amount of performance on
doi:10.1145/1654059.1654105
dblp:conf/sc/RenganarayanaBDEO09
fatcat:wcvfqonpr5h6rcyvuixudh26uu
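
The abstract describes register tiling as unroll-and-jam plus scalar promotion. As a rough illustration of those two ingredients only, and not of the kernel-extraction technique the paper proposes, the sketch below applies them by hand to a simple matrix-vector loop nest. The function names, the row-major layout, and the unroll factor of 2 are assumptions chosen for the example.

```c
/* Baseline: y[i] += a[i][j] * x[j], with y[i] updated in memory
 * on every inner iteration. a is n-by-n, row-major (assumed layout). */
void matvec_plain(const double *a, const double *x, double *y, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            y[i] += a[i * n + j] * x[j];
}

/* Hand-written register-tiled variant (illustrative, unroll factor 2). */
void matvec_tiled(const double *a, const double *x, double *y, int n) {
    int i = 0;
    /* Unroll-and-jam: the i loop is unrolled by 2 and the two copies
     * of the j loop are fused ("jammed") into a single inner loop. */
    for (; i + 1 < n; i += 2) {
        /* Scalar promotion: y[i] and y[i+1] live in local scalars,
         * which the compiler can keep in registers across the j loop. */
        double y0 = y[i];
        double y1 = y[i + 1];
        for (int j = 0; j < n; j++) {
            double xj = x[j];            /* loaded once, used twice */
            y0 += a[i * n + j] * xj;
            y1 += a[(i + 1) * n + j] * xj;
        }
        y[i] = y0;
        y[i + 1] = y1;
    }
    /* Remainder row when n is odd. */
    for (; i < n; i++) {
        double yi = y[i];
        for (int j = 0; j < n; j++)
            yi += a[i * n + j] * x[j];
        y[i] = yi;
    }
}
```

Both functions accumulate each `y[i]` in the same order over `j`, so they should produce identical results. The tiled version reuses each loaded `x[j]` for two rows and avoids a store to `y` on every inner iteration. This hand transformation is straightforward only because the loop nest is simple, which is the limitation the abstract attributes to typical register tilers.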