Power Efficient Tiny Yolo CNN using Reduced Hardware Resources based on Booth Multiplier and WALLACE Tree Adders

Fasih Ud Din Farrukh, Chun Zhang, Yancao Jiang, Zhonghan Zhang, Ziqiang Wang, Zhihua Wang, Hanjun Jiang
2020 IEEE Open Journal of Circuits and Systems  
Convolutional neural networks (CNNs) have attained high accuracy and are widely employed in image recognition tasks. Modern deep learning-based applications continue to evolve, which poses a challenge for the research and development of hardware implementations. Hardware optimization for efficient CNN accelerator design therefore remains a challenging task. A key component of the accelerator design is the processing element (PE) that implements the convolution operation. To reduce the amount of hardware resources and power consumption, this article provides a new processing element design as an alternative solution for hardware implementation. A modified BOOTH encoding (MBE) multiplier and WALLACE tree-based adders are proposed to replace bulky MAC units and the typical adder tree, respectively. The proposed CNN accelerator design is tested on a Zynq-706 FPGA board, where it achieves a throughput of 87.03 GOP/s for the Tiny-YOLO-v2 architecture. The proposed design reduces hardware costs by 24.5% and achieves a power efficiency of 61.64 GOP/s/W, outperforming previous designs.

INDEX TERMS Convolutional neural network, booth encoding multiplier, WALLACE tree adders, FPGA, adder tree, object detection.

I. INTRODUCTION

Deep learning evolved from machine learning and is quickly becoming an essential part of daily life. Deep convolutional neural networks are a part of deep learning and help to solve many complex image-related tasks [1]-[3]. They have been successfully applied in a wide range of applications, including classification, speech processing and recognition, and object detection [4]-[6]. Moreover, deep learning is becoming a potential solution for many industrial applications, such as autonomous vehicles, smart robots and camera technologies, and surveillance [7]-[10].

GPUs, FPGAs, and ASICs are used to implement CNN accelerator designs. GPUs have the advantage of design flexibility, but they are energy inefficient and usually require a long execution time. ASICs consume less power than GPUs, but flexibility is sacrificed, and the implementation cycle is quite long because of chip fabrication. Compared with GPUs and ASICs, FPGAs offer a good trade-off among design flexibility, implementation cycle, and power consumption. FPGAs can be reconfigured depending on the application requirements, and FPGA designs can also be easily converted to ASIC designs.
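As an illustration of the two arithmetic building blocks named in the abstract, the following is a minimal software sketch, not the authors' hardware design. It assumes a radix-4 form of modified Booth encoding, 8-bit signed operands, and a reduction built from 3:2 compressors (carry-save adders) in the style of a Wallace tree. The function names `booth_radix4_digits`, `wallace_reduce`, and `booth_wallace_multiply` are hypothetical and chosen only for this example.

```python
def booth_radix4_digits(y: int, bits: int = 8) -> list[int]:
    """Recode a signed multiplier into radix-4 Booth digits in {-2, -1, 0, 1, 2}.

    Assumption: `bits` is even and y fits in `bits`-bit two's complement.
    The digits satisfy y == sum(d * 4**i for i, d in enumerate(digits)).
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    u = y & ((1 << bits) - 1)  # two's complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # overlapping 3-bit window
        prev = b1
    return digits


def wallace_reduce(terms: list[int]) -> tuple[int, int]:
    """Reduce a list of addends to two values using layers of 3:2 compressors.

    Each compressor turns three addends into a sum word and a carry word
    without propagating carries; leftovers pass through to the next layer.
    Python integers behave as unbounded two's complement, so negative
    partial products need no explicit sign extension in this sketch.
    """
    terms = list(terms)
    while len(terms) > 2:
        next_layer = []
        for k in range(0, len(terms) - 2, 3):
            a, b, c = terms[k:k + 3]
            next_layer.append(a ^ b ^ c)                         # sum bits
            next_layer.append(((a & b) | (a & c) | (b & c)) << 1)  # carry bits
        next_layer.extend(terms[len(terms) - len(terms) % 3:])
        terms = next_layer
    while len(terms) < 2:
        terms.append(0)
    return terms[0], terms[1]


def booth_wallace_multiply(x: int, y: int, bits: int = 8) -> int:
    """Multiply x by y using Booth partial products and carry-save reduction."""
    # In hardware each digit would select 0, +/-x, or +/-2x by shift and negate.
    partials = [(d * x) << (2 * i)
                for i, d in enumerate(booth_radix4_digits(y, bits))]
    s, c = wallace_reduce(partials)
    return s + c  # single carry-propagate addition at the end


if __name__ == "__main__":
    print(booth_radix4_digits(127))
    print(booth_wallace_multiply(-37, 115))
```

With radix-4 recoding, an 8-bit multiplier yields four partial products instead of eight, and the compressor layers defer carry propagation to one final addition. Both effects are the usual motivation for pairing these techniques; the resource and power figures quoted in the abstract come from the authors' FPGA implementation, not from this sketch.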
In recent times, the benefits of FPGAs in energy efficiency, reconfigurable architecture, and customizable features have drawn the attention of many researchers to FPGA-based accelerator

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Volume 1, 2020.
doi:10.1109/ojcas.2020.3007334 fatcat:6jvkfomk3zfnnlfenfrdxklchy