A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is
Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to exploit all forms of reuse available to minimize off-chip memory access while increasing utilization of available resources. The proposed design is composed of cores, each of which contains a one-dimensional array of processing elements. These cores canarXiv:1912.07284v2 fatcat:7yvo7y2bjvaj3akevfmie4yqva