A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit <a rel="external noopener" href="https://mdpi-res.com/d_attachment/electronics/electronics-10-02859/article_deploy/electronics-10-02859-v2.pdf?version=1637569291">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="MDPI AG">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ikdpfme5h5egvnwtvvtjrnntyy" style="color: black;">Electronics</a>
Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered to be suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3390/electronics10222859">doi:10.3390/electronics10222859</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ie3kqrb5tjhg3cnu7o6bpmlsfm">fatcat:ie3kqrb5tjhg3cnu7o6bpmlsfm</a> </span>
more »... vestigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which performs classification of handwritten digits using the MNIST handwritten digit dataset. The proposed accelerator was implemented, using a high-level synthesis tool on a Xilinx FPGA. The proposed accelerator applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% reduced network latency, compared to a floating point design and its resource utilization is optimized to use 78% fewer DSP blocks, compared to general fixed-point designs.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220503173203/https://mdpi-res.com/d_attachment/electronics/electronics-10-02859/article_deploy/electronics-10-02859-v2.pdf?version=1637569291" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/fe/75/fe756c8ba806115dbd6f4ba2704a1d740cc166d6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3390/electronics10222859"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> mdpi.com </button> </a>