### Fast computation of local correlation coefficients

Xiaobai Sun, Nikos P. Pitsianis, Paolo Bientinesi, Franklin T. Luk
2008 Advanced Signal Processing Algorithms, Architectures, and Implementations XVIII
This paper presents an acceleration method, using both algorithmic and architectural means, for fast calculation of local correlation coefficients, a basic image-based information processing step for template or pattern matching, image registration, motion or change detection and estimation, compensation of changes, or compression of representations, among other information processing objectives. For real-time applications, the complexity of arithmetic operations, as well as of …ing and memory access latency, had been a divisive issue between the so-called correlation-based methods and the Fourier domain methods. In the presented method, the complexity of calculating local correlation coefficients is reduced via an equivalent reformulation that leads to efficient array operations or enables the use of multi-dimensional fast Fourier transforms, without sacrificing local and nonlinear changes or characteristics. The computation time is further reduced by utilizing modern multi-core architectures with high processing speed and low power consumption, such as the Sony-Toshiba-IBM Cell processor.

1. Data exchange, butterfly, and twiddle operations.
   - Do in parallel: $Y_i := Y_i + Y_{i+1}$ for $i \in \{0, 2, 4, 6\}$; $Y_i := (Y_{i-1} - Y_i)\,d_i$ for $i \in \{1, 3, 5, 7\}$.
   - Do in parallel: $Y_i := (Y_i + Y_{i+2})\,w_i$ for $i \in \{0, 1, 4, 5\}$; $Y_i := (Y_{i-2} - Y_i)\,w_i$ for $i \in \{2, 3, 6, 7\}$.
   - Do in parallel: $Y_i := (Y_i + Y_{i+4})\,W_i$ for $i \in \{0, 1, 2, 3\}$; $Y_i := (Y_{i-4} - Y_i)\,W_i$ for $i \in \{4, 5, 6, 7\}$.
2. Local FFTs: $Y_i := \mathrm{FFT}_3(Y_i)$ for $i = 0, \ldots, 7$.
3. Data permutation and write-out: with $P_{\text{out}} = P_{n_2, 8}$, set $\bigl(P_{\text{out}}^{-1} Y\bigr)\bigl(1\!:\!n_1,\ 1\!:\!n_2,\ i \cdot \tfrac{n_3}{8} + [1\!:\!\tfrac{n_3}{8}]\bigr) := Y_i$ for $i = 0, \ldots, 7$.
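A one-dimensional sketch can make the excerpt's schedule concrete. It runs three rounds of pairwise exchanges at slot distances 1, 2, and 4, each with a butterfly and twiddles. It then runs eight independent local FFTs and finishes with a write-out permutation. This is an illustration under stated assumptions, not the paper's code:

- It assumes a length-$N$ signal with $N = 8m$, where $m$ is a power of two.
- It assumes slot $i$ starts with input block `bitrev3(i)`. With that placement, the distance-1/2/4 pairing carries out a radix-2 decimation-in-frequency split over the block index.
- The twiddles are derived for that placement rather than taken from the paper. In this derivation the sum branches need no twiddle, so their $w_i$ and $W_i$ equal 1.
- `blocked_fft8`, `bitrev3`, `fft`, and `dft` are hypothetical helper names.

```python
import cmath
import random


def dft(x):
    """Reference O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*i*n*k/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 FFT (same sign convention); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bitrev3(i):
    """Reverse the three low bits of i (0..7)."""
    return ((i & 1) << 2) | (i & 2) | ((i >> 2) & 1)


def blocked_fft8(x):
    """Length-N DFT (N = 8m) as 8 blocks: 3 exchange stages,
    8 independent local FFTs of length m, then an interleaving write-out."""
    n = len(x)
    m = n // 8

    def w(e):
        # omega_N^e with omega_N = exp(-2*pi*i/N)
        return cmath.exp(-2j * cmath.pi * e / n)

    # Assumed (hypothetical) placement: slot i starts with input block bitrev3(i).
    y = [list(x[bitrev3(i) * m:(bitrev3(i) + 1) * m]) for i in range(8)]

    # Stage 1 (slot distance 1): only the difference is twiddled, by w(j + q*m).
    for i in (0, 2, 4, 6):
        a, b = y[i], y[i + 1]
        q = 2 * ((i >> 1) & 1) + ((i >> 2) & 1)
        y[i] = [a[j] + b[j] for j in range(m)]
        y[i + 1] = [(a[j] - b[j]) * w(j + q * m) for j in range(m)]

    # Stage 2 (slot distance 2): difference twiddle w(2*(j + s*m)), s = bit 2 of i.
    for i in (0, 1, 4, 5):
        a, b = y[i], y[i + 2]
        s = (i >> 2) & 1
        y[i] = [a[j] + b[j] for j in range(m)]
        y[i + 2] = [(a[j] - b[j]) * w(2 * (j + s * m)) for j in range(m)]

    # Stage 3 (slot distance 4): difference twiddle w(4*j).
    for i in (0, 1, 2, 3):
        a, b = y[i], y[i + 4]
        y[i] = [a[j] + b[j] for j in range(m)]
        y[i + 4] = [(a[j] - b[j]) * w(4 * j) for j in range(m)]

    # Local FFTs: the eight slots are now fully independent.
    y = [fft(block) for block in y]

    # Write-out permutation: slot i holds the outputs X[8k + i].
    out = [0j] * n
    for i in range(8):
        for k in range(m):
            out[8 * k + i] = y[i][k]
    return out


if __name__ == "__main__":
    random.seed(0)
    x = [complex(random.random(), random.random()) for _ in range(64)]
    err = max(abs(a - b) for a, b in zip(blocked_fft8(x), dft(x)))
    print(f"max abs error vs. direct DFT: {err:.2e}")
```

After the three stages, slot $i$ holds the data for outputs $X[8k + i]$, so the final loop is a stride (interleaving) permutation. The paper's $P_{\text{out}}$ acts on a three-dimensional array and may differ in detail.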
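The abstract does not state its reformulation explicitly. One well-known equivalent reformulation of the local correlation coefficient is used here only as an illustration, and it is not necessarily the paper's. It has two parts:

- Subtract the template mean once, so the window mean drops out of the numerator and the numerator becomes a plain cross-correlation.
- Read each window's sum and sum of squares from summed-area tables (integral images) in constant time.

`local_corrcoef`, `integral_image`, and `box_sum` are hypothetical names. The numerator is computed directly here; an FFT-based correlation could replace that loop.

```python
import math


def integral_image(a):
    """Summed-area table with a zero first row/column: s[i][j] = sum of a[:i][:j]."""
    rows, cols = len(a), len(a[0])
    s = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            s[i + 1][j + 1] = a[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    return s


def box_sum(s, i, j, h, w):
    """Sum of the h-by-w window with top-left corner (i, j), in O(1)."""
    return s[i + h][j + w] - s[i][j + w] - s[i + h][j] + s[i][j]


def local_corrcoef(f, t):
    """Correlation coefficient between template t and every full-overlap
    window of image f (valid positions only); 0.0 where a window is flat."""
    H, W = len(f), len(f[0])
    h, w = len(t), len(t[0])
    n = h * w
    t_mean = sum(map(sum, t)) / n
    tz = [[v - t_mean for v in row] for row in t]      # zero-mean template
    t_ss = sum(v * v for row in tz for v in row)       # sum of (t - t_mean)^2
    s1 = integral_image(f)                             # window sums of f
    s2 = integral_image([[v * v for v in row] for row in f])  # window sums of f^2
    out = []
    for u in range(H - h + 1):
        row = []
        for v in range(W - w + 1):
            # Because sum(tz) == 0, the window mean drops out of the numerator.
            num = sum(f[u + a][v + b] * tz[a][b] for a in range(h) for b in range(w))
            s = box_sum(s1, u, v, h, w)
            f_ss = box_sum(s2, u, v, h, w) - s * s / n  # sum of (f - window mean)^2
            den = math.sqrt(max(f_ss, 0.0) * t_ss)
            row.append(num / den if den > 1e-12 else 0.0)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[1, 2, 3, 4, 5],
             [2, 9, 1, 7, 3],
             [4, 0, 6, 2, 8],
             [3, 5, 2, 9, 1]]
    template = [r[1:3] for r in image[1:3]]  # cut from the image at offset (1, 1)
    for r in local_corrcoef(image, template):
        print(" ".join(f"{c:+.3f}" for c in r))
```

In the usage example, the coefficient at offset (1, 1), where the template was cut from, is exactly 1. Each coefficient costs $O(hw)$ for the numerator and $O(1)$ for the window statistics. The numerator is therefore the part that benefits from FFT-based or array-based evaluation.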