
Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

Eric Chung, Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Adrian Caulfield, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Maleen Abeydeera, Logan Adams (+31 others)
2018 IEEE Micro  
Exploiting distributed model parallelism and pinning over low-latency hardware microservices, Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiencies at low batch sizes  ...  To meet the computational demands required of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance.  ...  CONCLUSION Hastened by the escalating demand for deep learning, the march toward ubiquitous specialized hardware for AI is well underway.  ... 
doi:10.1109/mm.2018.022071131 fatcat:6cdzotc6bnb5pa2yw74n2o6xq4

FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism [article]

Je Yang, Seongmin Hong, Joo-Young Kim
2021 arXiv   pre-print
FIXAR proposes the adaptive array processing core composed of configurable processing elements to support both intra-layer parallelism and intra-batch parallelism for high-throughput inference and training  ...  Starting from 32-bit fixed-point data, Quantization-Aware Training (QAT) reduces its data precision based on the range of activations and performs retraining to minimize the reward degradation.  ...  Recently, deep reinforcement learning (DRL) that utilizes a deep neural network (DNN) for the action policy to train has been proposed [1] - [4] .  ... 
arXiv:2102.12103v1 fatcat:c3xastkq2vfz7ott24hrq5wbnq

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework [article]

Yanzhi Wang, Caiwen Ding, Zhe Li, Geng Yuan, Siyu Liao, Xiaolong Ma, Bo Yuan, Xuehai Qian, Jian Tang, Qinru Qiu, Xue Lin
2018 arXiv   pre-print
The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs).  ...  The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-using, and hierarchical  ...  Acknowledgement This work is funded by the National Science Foundation Awards CNS-1650469, CCF-1733701, CNS-1704662, CCF-1657333, CNS-1739748, and CCF-1733834.  ... 
arXiv:1802.06402v1 fatcat:cxbnxjl5mne3pkrj65ln2ph6nm

Embedded Intelligence on FPGA: Survey, Applications and Challenges

Kah Phooi Seng, Paik Jen Lee, Li Minn Ang
2021 Electronics  
There are four main classification and thematic descriptors which are reviewed and discussed in this paper for EI: (1) EI techniques including machine learning and neural networks, deep learning, expert  ...  ); and (3) scalability to accommodate different network sizes and topologies.  ...  The batch level parallelism is handled by PE which, in parallel, is using GEMM kernel [33] .  ... 
doi:10.3390/electronics10080895 fatcat:igqk3n2kp5f4bmt6ho2qa3baau

2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32

2022 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS May 2021 1030-1043 Privacy-Preserving Computation Offloading for Parallel Deep Neural Networks Training.  ...  Gupta, N., +, TPDS March 2021 575-586 Privacy-Preserving Computation Offloading for Parallel Deep Neural Networks Training.  ... 
doi:10.1109/tpds.2021.3107121 fatcat:e7bh2xssazdrjcpgn64mqh4hb4

GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers

Hiroki NAKAHARA, Haruyoshi YONEKAWA, Tomoya FUJII, Masayuki SHIMODA, Shimpei SATO
2019 IEICE transactions on information and systems  
tool flow for a binarized deep neural network toward FPGA implementation based on the GUI including both the training on the GPU and inference on the FPGA.  ...  Since the training of the CNN is dominant, it is considerable. key words: machine learning, deep learning, pruning, FPGA  ...  University Program (XUP), and the support by the NVIDIA Corporation. Reviewer's comments are improved the paper.  ... 
doi:10.1587/transinf.2018rcp0002 fatcat:55dvdmcw4zf2zeqrmg2tzm6j4e

Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors [article]

Lukas Baischer, Matthias Wess, Nima TaheriNejad
2021 arXiv   pre-print
Deep neural networks (DNNs) have the advantage that they can take into account a large number of parameters, which enables them to solve complex tasks.  ...  FPGA-based implementations are well-suited to show the effect of DNN optimization methods on accuracy and throughput. For this reason, the focus of this work is more on FPGA-based implementations.  ...  Section 6 explains the parallelization strategies used by neural network hardware accelerators.  ... 
arXiv:2104.09252v1 fatcat:625wtuskhff3lbswhwmj7decni


Adam Page, Ali Jafari, Colin Shea, Tinoosh Mohsenin
2017 ACM Journal on Emerging Technologies in Computing Systems  
The accelerator looks to enable deploying networks in such resource-bound settings by both exploiting efficient forms of parallelism inherent in convolutional layers and by exploiting the sparsification  ...  In particular, deep convolutional neural networks have been shown to dominate on several popular public benchmarks such as the ImageNet database.  ...  Finally, in , we demonstrated the ability to reduce the complexity of neural networks and further proposed an FPGA-based framework that enabled efficiently translating a pre-defined network topology onto  ... 
doi:10.1145/3005448 fatcat:quxiy72jtrfipdpeup75mhiizm

NN2CAM: Automated Neural Network Mapping for Multi-Precision Edge Processing on FPGA-Based Cameras [article]

Petar Jokic, Stephane Emery, Luca Benini
2021 arXiv   pre-print
The framework automatically converts an arbitrary-sized and quantized trained network into an efficient streaming-processing IP block that is instantiated within a generic adapter block in the FPGA.  ...  The record-breaking achievements of deep neural networks (DNNs) in image classification and detection tasks resulted in a surge of new computer vision applications during the past years.  ...  CONCLUSION We presented NN2CAM, an end-to-end framework for automatically mapping trained quantized neural networks onto FPGA-based edge processing devices and show experimental results on a FPGA-based  ... 
arXiv:2106.12840v1 fatcat:mgbrgszixfdpvajtbkmmga6f2m

A Memristor based Unsupervised Neuromorphic System Towards Fast and Energy-Efficient GAN [article]

F. Liu, C. Liu, F. Bi
2019 arXiv   pre-print
We also proposed an efficient data flow for optimal parallelism training and testing, depending on the computation correlations between different computing blocks.  ...  In this work, we proposed a holistic solution for fast and energy-efficient GAN computation through a memristor-based neuromorphic system.  ...  Normally, supervised learning is employed in the state-of-the-art applications, where a deep neural network is trained from labeled training data and desired outputs are obtained after going through an  ... 
arXiv:1806.01775v4 fatcat:pckbn7vgvbadbfcb2fbqju3sui

FPGA-accelerated machine learning inference as a service for particle physics computing

Nhan Viet Tran, Kevin Pedro
2019 Zenodo  
A single FPGA service accessed by many CPUs achieves a throughput of 600-700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than  ...  As examples, we retrain the ResNet50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet50 model with transfer learning for  ...  variables (right) • Now advanced to particle-level deep neural networks (next slide) • (Can also do Higgs tagging, W/Z tagging, etc.)  ... 
doi:10.5281/zenodo.3598991 fatcat:biu2auouejfbjdjyueat6tnygq

Table of contents

2019 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)  
Leong (The University of Sydney) Towards Efficient Deep Neural Network Training by FPGA-Based Batch-Level Parallelism RapidRoute: Fast Assembly of Communication Structures for FPGA Overlays 61 Leo Liu  ...  PIR-DSP: An FPGA DSP Block Architecture for Multi-precision Deep Neural Networks 35 SeyedRamin Rasoulinezhad (The University of Sydney), Hao Zhou (Fudan University), Lingli Wang (Fudan University), and  ...  Poster Session 2: Neural Networks and Vision  ... 
doi:10.1109/fccm.2019.00004 fatcat:qku57w2j2vfs3kluykjmqfbzya

Evolutionary Cell Aided Design for Neural Network Architectures [article]

Philip Colangelo, Oren Segal, Alexander Speicher, Martin Margala
2019 arXiv   pre-print
In practice designing practical and efficient neural network architectures require significant effort and expertise.  ...  By running various experiments of the fittest solutions for neural network and hardware searches, we demonstrate the full end-to-end capabilities of the ECAD framework.  ...  To expedite the training process, these formatted samples are passed in batches to the neural network.  ... 
arXiv:1903.02130v3 fatcat:6qsrbufyvvejhlr4b4eyp3r4t4

How to Train Your Neural Network: A Comparative Evaluation [article]

Shu-Huai Lin, Daniel Nichols, Siddharth Singh, Abhinav Bhatele
2021 arXiv   pre-print
The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks.  ...  This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators.  ...  It is this hierarchical organization of layers that lends deep neural networks the power to generate useful high level representations in the latter layers.  ... 
arXiv:2111.04949v1 fatcat:gfjiefx24jh3bhizu4j4t5slwa

Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

Georgios Flamis, Stavros Kalapothas, Paris Kitsos
2021 Electronics  
The complete development flow undergoes two distinct phases; training and inference. During training, all the weights are calculated through optimization and back propagation of the network.  ...  The inference phase on the other hand, uses a trained network with new data. The sensitive optimization and back propagation phases are removed and forward propagation is only used.  ...  , with Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN).  ... 
doi:10.3390/electronics10161912 fatcat:3ywb6inqzvbfxb2vjve6ffvmiq