CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency [article]

Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini
2021 arXiv   pre-print
We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks.  ...  networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting  ...  ACKNOWLEDGEMENT The authors would like to thank armasuisse Science & Technology for funding this research.  ... 
arXiv:2011.01713v2 fatcat:wcxxzgr5cfgflm5ix2thcviqlu

Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors [article]

Lukas Baischer, Matthias Wess, Nima TaheriNejad
2021 arXiv   pre-print
In particular, we focus on acceleration of the inference of convolutional neural networks (CNNs) used for image recognition tasks. Given that there exist many different hardware architectures.  ...  In this article an overview of existing neural network hardware accelerators and acceleration methods is given.  ...  However, creating a fully custom neural network hardware accelerator requires a long design time and an in-depth knowledge of chip design and neural networks.  ... 
arXiv:2104.09252v1 fatcat:625wtuskhff3lbswhwmj7decni

Binary Neural Networks as a general-propose compute paradigm for on-device computer vision [article]

Guhong Nie
2022 arXiv   pre-print
Similar conclusions can be drawn for prototypical systolic-array-based AI accelerators, where our BNNs promise 2.8-7× fewer execution cycles than 8-bit and 2.1-2.7× fewer cycles than alternative BNN designs  ...  For binary neural networks (BNNs) to become the mainstream on-device computer vision algorithm, they must achieve a superior speed-vs-accuracy tradeoff than 8-bit quantization and establish a similar degree  ...  Array Design We inherit a systolic array design in Fig. 3 (a) for accelerating 8-bit convolution.  ... 
arXiv:2202.03716v1 fatcat:3567crqy5vhnzf3suxf4ikpcxq

High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands [article]

Dibakar Gope, Jesse Beu, Matthew Mattina
2020 arXiv   pre-print
We demonstrate how a systolic array architecture designed for symmetric-operand-size instructions could be modified to support an asymmetric-operand-sized instruction.  ...  DNN hardware accelerators (e.g., systolic array microarchitecture in Google TPU, etc.) and offer similar improvement in matrix multiply performance seamlessly without violating the various implementation  ...  SUITABILITY TO HARDWARE ACCELERATORS FOR DEEP NEURAL NETWORKS This section shows an example of how processing circuitry designed for performing the MAC operations in state-of-the-art DNN hardware accelerators  ... 
arXiv:2008.00638v1 fatcat:yhs3m5ih4zdsfa7745weieccsa

SWIS – Shared Weight bIt Sparsity for Efficient Neural Network Acceleration [article]

Shurui Li, Wojciech Romaszkan, Alexander Graening, Puneet Gupta
2021 arXiv   pre-print
We present SWIS - Shared Weight bIt Sparsity, a quantization framework for efficient neural network inference acceleration delivering improved performance and storage compression through an offline weight  ...  Quantization is spearheading the increase in performance and efficiency of neural network computing systems making headway into commodity hardware.  ...  For some convolutional layers, Ratio of DRAM weight to activation accesses (RD+WR) in different convolutional layers of ResNet-18 in a systolic array accelerator.  ... 
arXiv:2103.01308v2 fatcat:7m5pbhhkdzbq7pz7bugva6ijxy

A Survey on System-Level Design of Neural Network Accelerators

Kenshu Seto
2021 Journal of Integrated Circuits and Systems  
In this paper, we present a brief survey on the system-level optimizations used for convolutional neural network (CNN) inference accelerators.  ...  Optimizations for CNN models are briefly explained, followed by the recent trends and future directions of the CNN accelerator design.  ...  CONVOLUTIONAL NEURAL NETWORKS (CNNS) In this section, we briefly overview convolutional neural networks (CNNs) used in image recognition tasks. Fig. 1 [2].  ... 
doi:10.29292/jics.v16i2.505 fatcat:ibbkeob42jepbguezlptws2qha

Paper by Titles

2021 2021 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia)  
CMOS Process, Operating in the 15 GHz band frequency divider for DSRC Application; Auto-Calibration of Multi-view RGB-D Cameras for Virtual Mirrors; Neural Network Accelerator based on Systolic Array with  ...  of Object Detection CNN With Weight Quantization and Scale Factor Consolidation; Performance Evaluation of Systolic DCNN Accelerators; Processing-in-memory logic  ... 
doi:10.1109/icce-asia53811.2021.9641936 fatcat:esfw7mjqnnavno6je6buyjknxy

An FPGA-Based Convolutional Neural Network Coprocessor

Changpei Qiu, Xin'an Wang, Tianxia Zhao, Qiuping Li, Bo Wang, Hu Wang, Wenqing Wu
2021 Wireless Communications and Mobile Computing  
In this paper, an FPGA-based convolutional neural network coprocessor is proposed.  ...  The proposed coprocessor implements the convolutional and pooling layers of the VGG16 neural network model, in which the activation value, weight value, and bias value are quantized using 16-bit fixed-point  ...  Coprocessor Architecture In this paper, we provide a coprocessor implementation for convolutional neural networks, which is aimed at accelerating the convolutional and pooling layers of convolutional neural  ... 
doi:10.1155/2021/3768724 fatcat:wk2eb2mroffkhg2gw75hh2a7vq

A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

Arnab Neelim Mazumder, Jian Meng, Hasib-Al Rashid, Utteja Kallakuri, Xin Zhang, Jae-sun Seo, Tinoosh Mohsenin
2021 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc.  ...  To this extent, we look at different neural architecture search strategies as part of micro-AI model design, provide extensive details about model compression and quantization strategies in practice, and  ...  In this section, we look at accelerator designs for different neural networks on both MCU (Micro-controller Unit) and FPGA, review ways to improve their latency, and analyze different quantization approaches  ... 
doi:10.1109/jetcas.2021.3129415 fatcat:nknpy4eernaeljz2hpqafe7sja

Hyperdrive: A Systolically Scalable Binary-Weight CNN Inference Engine for mW IoT End-Nodes

Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini
2018 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)  
Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit.  ...  Index Terms: Hardware Accelerator, Binary Weights Neural Networks, IoT  ...  output FM-level: The TPUs calculate a block of output FMs (i.e. K) (line 6).  ...  Recently, several methods to train neural networks to withstand extreme quantization have been proposed, yielding the notions of binary and ternary weight networks (BWNs, TWNs) and binarized neural networks  ... 
doi:10.1109/isvlsi.2018.00099 dblp:conf/isvlsi/AndriCRB18 fatcat:uwcfawdeynes7agc3c2f4vxg2m

DeepDive: An Integrative Algorithm/Architecture Co-Design for Deep Separable Convolutional Neural Networks [article]

Mohammadreza Baharani, Ushma Sunil, Kaustubh Manohar, Steven Furgurson, Hamed Tabkhi
2020 arXiv   pre-print
Deep Separable Convolutional Neural Networks (DSCNNs) have become the emerging paradigm by offering modular networks with structural sparsity in order to achieve higher accuracy with relatively lower operations  ...  This paper introduces DeepDive, which is a fully-functional, vertical co-design framework, for power-efficient implementation of DSCNNs on edge FPGAs.  ...  [23] designed a novel 2D systolic array that localizes data shifting to between neighboring PEs.  ... 
arXiv:2007.09490v1 fatcat:n4o23g7rcjf33njjohv2vc26hm

NeuroMAX: A High Throughput, Multi-Threaded, Log-Based Accelerator for Convolutional Neural Networks [article]

Mahmood Azhar Qureshi, Arslan Munir
2020 arXiv   pre-print
Convolutional neural networks (CNNs) require high throughput hardware accelerators for real time applications owing to their huge computational cost.  ...  Most traditional CNN accelerators rely on single core, linear processing elements (PEs) in conjunction with 1D dataflows for accelerating convolution operations.  ...  [10] is the improved version of [7] with higher hardware utilization and throughput. [11] introduced the concept of logarithmic data representation for neural network accelerator designs.  ... 
arXiv:2007.09578v1 fatcat:kg456srsnfeglpn2qz4dknweeq

OverQ: Opportunistic Outlier Quantization for Neural Network Accelerators [article]

Ritchie Zhao, Jordan Dotzel, Zhanqiu Hu, Preslav Ivanov, Christopher De Sa, Zhiru Zhang
2021 arXiv   pre-print
Outliers in weights and activations pose a key challenge for fixed-point quantization of neural networks.  ...  Specialized hardware for handling activation outliers can enable low-precision neural networks, but at the cost of nontrivial area overhead.  ...  One of the Titan Xp GPUs used for this research was donated by the NVIDIA Corporation. We would also like to acknowledge Shiqi Wang for her contributions and discussions of the project.  ... 
arXiv:1910.06909v2 fatcat:exhbjpbnnjcbdewctupejylrve

Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators [article]

Jan Moritz Joseph, Ananda Samajdar, Lingjun Zhu, Rainer Leupers, Sung-Kyu Lim, Thilo Pionteck, Tushar Krishna
2021 arXiv   pre-print
The everlasting demand for higher computing power for deep neural networks (DNNs) drives the development of parallel computing architectures. 3D integration, in which chips are integrated and connected  ...  Therefore, we analyze dataflows, performance, area, power and temperature of such 3D-DNN-accelerators. Monolithic and TSV-based stacked 3D-ICs are compared against 2D-ICs.  ...  Almost all DNN accelerators are matrix multiplication machines, since computation of DNNs follows this linear algebra motif, e.g., convolutions in CNNs (Convolution Neural Networks) or LSTM/GRU layers  ... 
arXiv:2012.12563v3 fatcat:476m2ur245bffjze5pohtjki5y

A Survey of FPGA-Based Neural Network Accelerator [article]

Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, Huazhong Yang
2018 arXiv   pre-print
GPU platforms are the first choice for neural network process because of its high computation capacity and easy to use development frameworks.  ...  An investigation from software to hardware, from circuit level to system level is carried out to complete analysis of FPGA-based neural network inference accelerator design and serves as a guide to future  ...  PRELIMINARY Before discussing the system design for neural network acceleration, we first introduce the basic concepts of neural networks and the typical structure of FPGA-based NN accelerator design.  ... 
arXiv:1712.08934v3 fatcat:vbrf3s27e5gdtcr7uzg3smbpli