An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators

Seyed Morteza Nabavinejad, Mohammad Baharloo, Kun-Chih Chen, Maurizio Palesi, Tim Kogel, Masoumeh Ebrahimi
2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
First, we provide an overview of the different interconnection methods on the DNN accelerator. Then, the interconnection methods on the non-ASIC DNN accelerator will be discussed.  ...  As a result, efficient interconnection and data movement mechanisms for future on-chip artificial intelligence (AI) accelerators are worthy of study.  ...  Her research interests include interconnection networks and neural network accelerators.  ... 
doi:10.1109/jetcas.2020.3022920 fatcat:idqitgwnrnegbd4dhrly3xsxbi

Session 15 Overview: Compute-in-Memory Processors for Deep Neural Networks

Jun Deguchi, Yongpan Liu, Yan Li
2021 2021 IEEE International Solid-State Circuits Conference (ISSCC)  
Compute-in-memory (CIM) processors for deep neural networks continue to expand their capabilities, and to scale to larger datasets and more complicated models.  ...  The final paper in the session applies the tensor-train method to decompose and compress neural networks so that they fit within on-chip memory.  ...  neural-network (NN) inference processor based on a 4×4 array of programmable cores combining precise mixed-signal capacitor-based in-memory-computing (IMC) with digital SIMD near-memory computing, interconnected  ... 
doi:10.1109/isscc42613.2021.9365855 fatcat:ngyloi6o7fba3j4rig5m6iy37m

2020 Index IEEE Journal on Emerging and Selected Topics in Circuits and Systems Vol. 10

2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
Han, J., +, JETCAS March 2020 52-61 Parallel processing An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators.  ...  ., +, JETCAS March 2020 100-113 Multiprocessor interconnection networks An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators.  ... 
doi:10.1109/jetcas.2020.3043859 fatcat:xuvsy4hh6rdq5hj65t7mw7pefa

HOTI 2020 Commentary

2020 2020 IEEE Symposium on High-Performance Interconnects (HOTI)  
Modern DL frameworks like TensorFlow, PyTorch, and several others have emerged that offer ease of use and flexibility to train, and deploy various types of Deep Neural Networks (DNNs).  ...  In this tutorial, we will provide an overview of interesting trends in DNN design and how cutting-edge hardware architectures and high-performance interconnects are playing a key role in moving the field  ...  His research interests include parallel computer architecture, high performance networking, InfiniBand, network-based computing, exascale computing, programming models, GPUs and accelerators, high performance  ... 
doi:10.1109/hoti51249.2020.00012 fatcat:ptxu3fuk5vghflln7ezaitoeyq

Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems [article]

Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, Krishnakumar Nair, Isabel Gao (+3 others)
2020 arXiv   pre-print
However, in this paper we focus on the deep learning recommendation models (DLRMs), which are responsible for more than 50% of the training demand in our data centers.  ...  To address it we design Zion - Facebook's next-generation large-memory training platform that consists of both CPUs and accelerators.  ...  Recommendation Model Neural network-based recommendation models, which are used to address personalization for different services, have become an important class of DL algorithms within Facebook.  ... 
arXiv:2003.09518v3 fatcat:bq5xo7jrovccrh5tkqr77otzqm

2021 Index IEEE Open Journal of Circuits and Systems Vol. 2

2021 IEEE Open Journal of Circuits and Systems  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  The primary entry includes the coauthors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination.  ...  Zan, Z., +, OJCAS 2021 627-632 Deep learning A Power Efficiency Enhancements of a Multi-Bit Accelerator for Memory Prohibitive Deep Neural Networks.  ... 
doi:10.1109/ojcas.2022.3142950 fatcat:fhkrikmks5cqrctslmgvoyisoq

Hardware Accelerator Design for Machine Learning [chapter]

Li Du, Yuan Du
2018 Machine Learning - Advanced Techniques and Emerging Applications  
kinds of machine learning algorithms such as a deep convolutional neural network.  ...  Finally, various application specific integrated circuit (ASIC) architecture is proposed to achieve the best energy efficiency at the cost of less reconfigurability which makes it suitable for special  ...  This chapter gives an overview of the hardware accelerator design, the various types of the ML acceleration, and the technique used in improving the hardware computation efficiency of ML computation. 2  ... 
doi:10.5772/intechopen.72845 fatcat:z6ias3vzibbtdpn2tbx5sli7ie

CNNLab: a Novel Parallel Framework for Neural Networks using GPU and FPGA-a Practical Study with Trade-off Analysis [article]

Maohua Zhu, Liu Liu, Chao Wang, Yuan Xie
2016 arXiv   pre-print
Designing and implementing efficient, provably correct parallel neural network processing is challenging.  ...  However, the diversity and large-scale data size have posed a significant challenge to construct a flexible and high-performance implementation of deep learning neural networks.  ...  [12] propose an efficient GPU implementation of the large-scale recurrent neural network and demonstrate the power of scaling up the recurrent neural network with GPUs.  ... 
arXiv:1606.06234v1 fatcat:en7acoahonb7beqrnxv553g46e

NTX: An Energy-efficient Streaming Accelerator for Floating-point Generalized Reduction Workloads in 22nm FD-SOI [article]

Fabian Schuiki, Michael Schaffner, Luca Benini
2018 arXiv   pre-print
In this paper we revisit NTX (an efficient accelerator developed for training Deep Neural Networks at scale) as a generalized MAC and reduction streaming engine.  ...  1.4 Tflop/s for training large state-of-the-art networks with full floating-point precision.  ...  INTRODUCTION Specialized accelerators for parallel MAC intensive workloads are becoming essential platforms ranging from mobile SoCs to high-performance GPUs, due to the widespread diffusion of Deep Neural  ... 
arXiv:1812.00182v1 fatcat:eoh5l6bj7rhfpgq74jmq27j6cu

From DNNs to GANs: Review of efficient hardware architectures for deep learning [article]

Gaurab Bhattacharya
2021 arXiv   pre-print
In this review, we illustrate the recent developments in hardware for accelerating the efficient implementation of deep learning networks with enhanced performance.  ...  Recently, neural network and deep learning has been started to impact the present research paradigm significantly which consists of parameters in the order of millions, nonlinear function for activation  ...  STATE-OF-THE-ART HARDWARE ARCHITECTURES FOR FEED FORWARD DEEP NEURAL NETWORK Artificial neural network (ANN), feed forward neural network (FFNN), or deep neural network (DNN) is one of the most promising  ... 
arXiv:2107.00092v1 fatcat:i6kijx7pavdajeskn4lip7gnhe

A Survey on Silicon Photonics for Deep Learning [article]

Febin P Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha
2021 arXiv   pre-print
execute the deep neural network models.  ...  Many application-specific integrated circuit (ASIC) hardware accelerators for deep learning have garnered interest in recent years due to their improved performance and energy-efficiency over conventional  ...  Section 3 presents an overview of fundamental silicon photonic devices that are widely used in photonic neural networks and relevant for accelerating deep learning models.  ... 
arXiv:2101.01751v2 fatcat:jorj4q6tjjewxfkbbovnxvpdyi

XNORBIN: A 95 TOp/s/W hardware accelerator for binary convolutional neural networks

Andrawes Al Bahou, Geethan Karunaratne, Renzo Andri, Lukas Cavigelli, Luca Benini
2018 2018 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)  
Implemented in UMC 65nm technology XNORBIN achieves an energy efficiency of 95 TOp/s/W and an area efficiency of 2.0 TOp/s/MGE at 0.8 V.  ...  We present XNORBIN, a flexible accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse supporting even non-trivial network topologies with large feature map volumes  ...  We compare energy efficiency of XNORBIN to state-of-the-art CNN accelerators in Tbl. 3. To the best of our knowledge, this is the first hardware accelerator for binary neural networks.  ... 
doi:10.1109/coolchips.2018.8373076 dblp:conf/coolchips/BahouKACB18 fatcat:cnddwsys7bg45owi5wfdlw2u5y

Final Program

2021 2021 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)  
Quantum computing promises exponential speedups for an important class of problems.  ...  Abstract: The extraordinary market demand for large-scale machine learning  ...  In this talk, the pragmatic practice of our co-design effort for "Fugaku" and its performance will be presented as well as an overview of system software.  ... 
doi:10.1109/coolchips52128.2021.9410348 fatcat:rtgywenc4bd2lbixraustje6xa

Introduction to JSTQE Issue on Photonics for Deep Learning and Neural Computing

Paul R. Prucnal, Bhavin J. Shastri, Ingo Fischer, Daniel Brunner
2020 IEEE Journal of Selected Topics in Quantum Electronics  
neural networks, for deep learning acceleration using tunable microring resonators or semiconductor optical amplifiers; a Winograd-based convolutional neural networks; and noise and scalability analysis  ...  Digital Object Identifier 10.1109/JSTQE.2020.2965384 The purpose of this JSTQE Special Issue on Photonics for Deep Learning and Neural Computing is to serve as a comprehensive overview of the current status  ...  His current research includes neural networks in photonic systems, novel photonic integration techniques, and fundamental aspects of hardware-implemented neural networks.  ... 
doi:10.1109/jstqe.2020.2965384 fatcat:f6kubfhkube7hgxzygqlrizxa4

Embedded Intelligence on FPGA: Survey, Applications and Challenges

Kah Phooi Seng, Paik Jen Lee, Li Minn Ang
2021 Electronics  
There are four main classification and thematic descriptors which are reviewed and discussed in this paper for EI: (1) EI techniques including machine learning and neural networks, deep learning, expert  ...  This paper presents an overview and review of embedded intelligence on FPGA with a focus on applications, platforms and challenges.  ...  FPGA-based EI solutions are developing to become an ideal candidate in hardware accelerators for energy efficient neural network applications instead of solely relying on software solutions.  ... 
doi:10.3390/electronics10080895 fatcat:igqk3n2kp5f4bmt6ho2qa3baau