A super-efficient adaptable bit-reversal algorithm for multithreaded architectures

Anne C. Elster, Jan C. Meyer
2009 IEEE International Symposium on Parallel & Distributed Processing
Fast bit-reversal algorithms have been of strong interest for many decades, especially after Cooley and Tukey introduced their FFT implementation in 1965. Many recent algorithms, including FFTW, try to avoid the bit-reversal altogether by using in-place algorithms within their FFTs. We therefore motivate our work by showing that for FFTs of up to 65,536 points, a minimally tuned Cooley-Tukey FFT in C using our bit-reversal algorithm performs comparably to or better than the default FFTW. In this paper, we present an extremely fast linear bit-reversal adapted for modern multithreaded architectures. Our bit-reversal algorithm takes advantage of recursive calls combined with the fact that it only generates pairs of indices for which the corresponding elements need to be exchanged, thereby avoiding any explicit tests. In addition, we have implemented an adaptive approach which explores the trade-off between compile-time and run-time workload. By generating look-up tables at compile time, our algorithm becomes even faster at run time. Our results also show that by using more than one thread on tightly coupled architectures, further speed-up can be achieved.

Elster's Linear Bit Reversal Algorithm

Since Elster's algorithm is simple, yet appears to have been reinvented several times in less efficient forms since its original publication, even in recent years, we include it here.
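The listing itself is not reproduced in this excerpt. As a rough illustration only, a linear-time construction of a bit-reversal look-up table can be sketched in C as below. The function names `build_bitrev_table` and `bitrev_permute` are hypothetical and not taken from the paper, and the recurrence used (each new block of the table is the previous block plus a shrinking power of two) is an assumption about the general linear approach. Unlike the algorithm described in the abstract, this sketch uses an explicit `i < r` test when swapping rather than generating only the pairs that need exchanging, and it is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill rev[0..n-1], n = 2^t, so that rev[i] is the
 * t-bit reversal of i. Each entry is computed once, so the work is O(n).
 * Recurrence assumed here: rev[L + j] = rev[j] + n / (2L), L = 1, 2, 4, ... */
void build_bitrev_table(size_t *rev, unsigned t)
{
    assert(t < sizeof(size_t) * 8);   /* the shift below must not overflow */
    size_t n = (size_t)1 << t;
    size_t half = n;

    rev[0] = 0;
    for (size_t L = 1; L < n; L <<= 1) {
        half >>= 1;                   /* n/2, n/4, ..., 1 */
        for (size_t j = 0; j < L; ++j)
            rev[L + j] = rev[j] + half;
    }
}

/* Hypothetical helper: permute data[0..n-1] into bit-reversed order in place.
 * The i < r test makes sure each pair is swapped exactly once; the paper's
 * algorithm is described as avoiding such a test, which this sketch does not. */
void bitrev_permute(double *data, const size_t *rev, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        size_t r = rev[i];
        if (i < r) {
            double tmp = data[i];
            data[i] = data[r];
            data[r] = tmp;
        }
    }
}
```

In this sketch the table depends only on `t`, so it could be built once (or generated ahead of time, in the spirit of the compile-time look-up tables mentioned above) and reused for every transform of the same size.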
doi:10.1109/ipdps.2009.5161105 dblp:conf/ipps/ElsterM09 fatcat:c6i5yeoebfe2zp2shgcxbsqx6q