Automatic Vectorization of Interleaved Data Revisited
2015
ACM Transactions on Architecture and Code Optimization (TACO)
Automatically exploiting short vector instruction sets (SSE, AVX, NEON) is a critically important task for optimizing compilers. Vector instructions typically work best on data that is contiguous in memory, and operating on non-contiguous data requires additional work to gather and scatter the data. There are several varieties of non-contiguous access, including interleaved data access. An existing approach used by GCC generates extremely efficient code for loops with power-of-two interleaving
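To make the term "interleaved data access" concrete, the following is a minimal scalar sketch in C, not code from the paper. The function names `complex_norm_sq` and `rgb_to_gray` and the data layouts are hypothetical, chosen only for illustration. The first loop reads with an interleaving factor of 2 (a power of two), the second with a factor of 3 (not a power of two). In both cases a vectorizing compiler has to separate the interleaved fields into contiguous vectors before it can apply ordinary vector arithmetic.

```c
#include <stddef.h>

/* Hypothetical example, interleaving factor 2 (a power of two).
 * Input layout: re0, im0, re1, im1, ...
 * Each iteration reads two adjacent fields of one element. */
void complex_norm_sq(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float re = in[2 * i];      /* stride-2 access, offset 0 */
        float im = in[2 * i + 1];  /* stride-2 access, offset 1 */
        out[i] = re * re + im * im;
    }
}

/* Hypothetical example, interleaving factor 3 (not a power of two).
 * Input layout: r0, g0, b0, r1, g1, b1, ...
 * The three channels must be de-interleaved before they can be
 * processed as contiguous vectors. */
void rgb_to_gray(const unsigned char *rgb, unsigned char *gray, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned r = rgb[3 * i];      /* stride-3 access, offset 0 */
        unsigned g = rgb[3 * i + 1];  /* stride-3 access, offset 1 */
        unsigned b = rgb[3 * i + 2];  /* stride-3 access, offset 2 */
        gray[i] = (unsigned char)((r + g + b) / 3);  /* plain average */
    }
}
```

Whether a given compiler vectorizes either loop, and how efficiently, depends on the compiler version, the target instruction set, and the optimization flags; the sketch only shows the access patterns the abstract refers to.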
doi:10.1145/2838735
fatcat:vm3glcm6xvd2bin6gtymfpja2a