A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels
2020
IEEE Transactions on Computers
Modern Convolutional Neural Networks (CNNs) require a massive number of convolution operations. To address this overwhelming computational load, the Winograd and FFT fast algorithms have been used as effective approaches to reduce the number of multiplications. Inputs and filters are transformed into special domains, where element-wise multiplication is performed; this multiplication can in turn be expressed as a batched GEMM operation. Different stages of computation contain multiple tasks with different computation and
doi:10.1109/tc.2020.2973144
fatcat:quv5yqwzxrcnjorf6alqxx73eu
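
As an illustration of the transform, multiply, and inverse-transform pipeline the abstract describes, the following is a minimal sketch of the standard one-dimensional Winograd F(2,3) algorithm in plain Python. It is not taken from the paper: the function names are hypothetical, and the code only shows how two outputs of a 3-tap filter can be computed with 4 multiplications instead of the 6 needed by the direct method.

```python
# Minimal sketch of 1D Winograd F(2,3): 2 outputs of a 3-tap filter.
# All function names here are illustrative, not from the paper.

def transform_filter(g):
    """Filter transform: map 3 filter taps to 4 values in the Winograd domain."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]

def transform_input(d):
    """Input transform: map a 4-element input tile to the Winograd domain."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]

def winograd_f23(d, g):
    """Compute 2 outputs using 4 element-wise multiplications."""
    u = transform_filter(g)
    v = transform_input(d)
    # Element-wise product in the transformed domain (the only multiplications
    # between input and filter values).
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output (inverse) transform back to the spatial domain.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

def direct_conv(d, g):
    """Reference: direct sliding-window correlation, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

if __name__ == "__main__":
    tile = [1, 2, 3, 4]
    taps = [1, 2, 3]
    print(winograd_f23(tile, taps))
    print(direct_conv(tile, taps))
```

In a CNN the filter transform can be computed once and reused across tiles, and the element-wise products across many tiles and channels are what can be grouped into batched matrix multiplications.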