A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2013; you can also visit the original URL.
The file type is
To achieve high performance on multi-cores, modern loop optimizers apply long sequences of transformations that produce complex loop structures. Downstream optimizations such as register tiling (unroll-and-jam plus scalar promotion) typically provide a significant performance improvement. Typical register tilers provide this performance improvement only when applied on simple loop structures. They often fail to operate on complex loop structures leaving a significant amount of performance ondoi:10.1145/1654059.1654105 dblp:conf/sc/RenganarayanaBDEO09 fatcat:wcvfqonpr5h6rcyvuixudh26uu