Serving DNNs in Real Time at Datacenter Scale with Project Brainwave
2018
IEEE Micro
Exploiting distributed model parallelism and pinning over low-latency hardware microservices, Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiencies at low batch sizes ...
To meet the computational demands required of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance. ...
CONCLUSION Hastened by the escalating demand for deep learning, the march toward ubiquitous specialized hardware for AI is well underway. ...
doi:10.1109/mm.2018.022071131
fatcat:6cdzotc6bnb5pa2yw74n2o6xq4
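The Brainwave abstract above leans on two ideas worth unpacking: "pinning" a model's weights so each partition stays resident in fast on-chip memory, and distributing the partitions across accelerators (model parallelism) so a single request streams through a pipeline. A minimal NumPy sketch of that partitioning, assuming a hypothetical Device class and toy layer sizes; none of this is Brainwave's actual API:

    import numpy as np

    class Device:
        """Hypothetical accelerator: weights are 'pinned' (loaded once, kept resident)."""
        def __init__(self, name, weights):
            self.name = name
            self.weights = weights  # pinned at startup, never re-fetched per request

        def forward(self, x):
            # One partition of the model: a stack of dense layers with ReLU.
            for w in self.weights:
                x = np.maximum(x @ w, 0.0)
            return x

    # A toy 6-layer model split into three 2-layer partitions (model parallelism).
    rng = np.random.default_rng(0)
    layers = [rng.standard_normal((64, 64)) * 0.1 for _ in range(6)]
    devices = [Device(f"fpga{i}", layers[2 * i:2 * i + 2]) for i in range(3)]

    # Batch size 1: each request streams through the pipeline of pinned partitions.
    x = rng.standard_normal((1, 64))
    for d in devices:
        x = d.forward(x)
    print(x.shape)  # (1, 64)

Since every partition's weights stay resident, a batch-of-one request never stalls on weight fetches, which is, roughly, the property the paper exploits for low-latency serving.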
FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism
[article]
2021
arXiv
pre-print
FIXAR proposes an adaptive array processing core composed of configurable processing elements to support both intra-layer parallelism and intra-batch parallelism for high-throughput inference and training ...
Starting from 32-bit fixed-point data, Quantization-Aware Training (QAT) reduces its data precision based on the range of activations and performs retraining to minimize the reward degradation. ...
Recently, deep reinforcement learning (DRL), which trains a deep neural network (DNN) as the action policy, has been proposed [1]-[4]. ...
arXiv:2102.12103v1
fatcat:c3xastkq2vfz7ott24hrq5wbnq
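The QAT sentence above compresses a whole procedure: choose a fixed-point step size from the observed activation range, quantize, and retrain so the policy's reward recovers. Below is a small sketch of the range-driven "fake quantization" at the core of such schemes; the function and bit widths are illustrative, not FIXAR's implementation:

    import numpy as np

    def fake_quantize(x, bits):
        """Quantize x to a fixed-point grid derived from its observed range,
        then return the dequantized values (standard QAT 'fake quantization')."""
        max_abs = np.max(np.abs(x)) + 1e-12
        scale = max_abs / (2 ** (bits - 1) - 1)  # range-driven step size
        q = np.clip(np.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
        return q * scale

    # Activations with a small dynamic range keep useful resolution at low bit
    # widths, which is why the precision is chosen from the activation range.
    acts = np.random.default_rng(1).standard_normal(1000) * 0.5
    for bits in (32, 8, 4):
        err = np.mean((acts - fake_quantize(acts, bits)) ** 2)
        print(f"{bits:>2}-bit  MSE {err:.2e}")

In full QAT, fake_quantize would run in the forward pass while gradients flow through unchanged (a straight-through estimator), which is what lets retraining compensate for the lost precision.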
Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework
[article]
2018
arXiv
pre-print
The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). ...
The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-using, and hierarchical ...
Acknowledgement This work is funded by the National Science Foundation Awards CNS-1650469, CCF-1733701, CNS-1704662, CCF-1657333, CNS-1739748, and CCF-1733834. ...
arXiv:1802.06402v1
fatcat:cxbnxjl5mne3pkrj65ln2ph6nm
Embedded Intelligence on FPGA: Survey, Applications and Challenges
2021
Electronics
There are four main classification and thematic descriptors which are reviewed and discussed in this paper for EI: (1) EI techniques including machine learning and neural networks, deep learning, expert ...
... and (3) scalability to accommodate different network sizes and topologies. ...
Batch-level parallelism is handled by the PEs, which use a GEMM kernel in parallel [33]. ...
doi:10.3390/electronics10080895
fatcat:igqk3n2kp5f4bmt6ho2qa3baau
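The last snippet's point about batch-level parallelism is easiest to see numerically: a fully connected layer applied to a whole batch is one GEMM whose rows are independent, so an array of processing elements can each take a slice of the batch dimension. A short NumPy sketch with toy sizes (not the surveyed hardware):

    import numpy as np

    rng = np.random.default_rng(2)
    batch, n_in, n_out = 8, 128, 64
    X = rng.standard_normal((batch, n_in))   # one row per sample in the batch
    W = rng.standard_normal((n_in, n_out))

    # Per-sample view: one matrix-vector product per input...
    y_loop = np.stack([x @ W for x in X])

    # ...versus a single GEMM; its rows are independent, so PEs can
    # process disjoint slices of the batch dimension in parallel.
    y_gemm = X @ W

    assert np.allclose(y_loop, y_gemm)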
2021 Index IEEE Transactions on Parallel and Distributed Systems Vol. 32
2022
IEEE Transactions on Parallel and Distributed Systems
., +, TPDS May 2021 1030-1043
Privacy-Preserving Computation Offloading for Parallel Deep Neural Networks Training. Gupta, N., +, TPDS March 2021 575-586 ...
doi:10.1109/tpds.2021.3107121
fatcat:e7bh2xssazdrjcpgn64mqh4hb4
GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers
2019
IEICE transactions on information and systems
... tool flow for a binarized deep neural network toward FPGA implementation, based on the GUI, including both training on the GPU and inference on the FPGA. ...
Since the training of the CNN is the dominant cost, it is a key consideration. Key words: machine learning, deep learning, pruning, FPGA ...
... University Program (XUP), and the support of the NVIDIA Corporation. The reviewers' comments improved the paper. ...
doi:10.1587/transinf.2018rcp0002
fatcat:55dvdmcw4zf2zeqrmg2tzm6j4e
Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors
[article]
2021
arXiv
pre-print
Deep neural networks (DNNs) have the advantage that they can take into account a large number of parameters, which enables them to solve complex tasks. ...
FPGA-based implementations are well-suited to show the effect of DNN optimization methods on accuracy and throughput. For this reason, the focus of this work is more on FPGA-based implementations. ...
Section 6 explains the parallelization strategies used by neural network hardware accelerators. ...
arXiv:2104.09252v1
fatcat:625wtuskhff3lbswhwmj7decni
The accelerator aims to enable deploying networks in such resource-bound settings by exploiting both the efficient forms of parallelism inherent in convolutional layers and the sparsification ...
In particular, deep convolutional neural networks have been shown to dominate on several popular public benchmarks such as the ImageNet database. ...
Finally, in , we demonstrated the ability to reduce the complexity of neural networks and further proposed an FPGA-based framework that enabled efficiently translating a pre-defined network topology onto ...
doi:10.1145/3005448
fatcat:quxiy72jtrfipdpeup75mhiizm
NN2CAM: Automated Neural Network Mapping for Multi-Precision Edge Processing on FPGA-Based Cameras
[article]
2021
arXiv
pre-print
The framework automatically converts an arbitrary-sized and quantized trained network into an efficient streaming-processing IP block that is instantiated within a generic adapter block in the FPGA. ...
The record-breaking achievements of deep neural networks (DNNs) in image classification and detection tasks resulted in a surge of new computer vision applications during the past years. ...
CONCLUSION We presented NN2CAM, an end-to-end framework for automatically mapping trained quantized neural networks onto FPGA-based edge processing devices, and showed experimental results on an FPGA-based ...
arXiv:2106.12840v1
fatcat:mgbrgszixfdpvajtbkmmga6f2m
A Memristor based Unsupervised Neuromorphic System Towards Fast and Energy-Efficient GAN
[article]
2019
arXiv
pre-print
We also proposed an efficient data flow for optimally parallel training and testing, based on the computation correlations between different computing blocks. ...
In this work, we proposed a holistic solution for fast and energy-efficient GAN computation through a memristor-based neuromorphic system. ...
Normally, supervised learning is employed in the state-of-the-art applications, where a deep neural network is trained from labeled training data and desired outputs are obtained after going through an ...
arXiv:1806.01775v4
fatcat:pckbn7vgvbadbfcb2fbqju3sui
FPGA-accelerated machine learning inference as a service for particle physics computing
2019
Zenodo
A single FPGA service accessed by many CPUs achieves a throughput of 600-700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than ...
As examples, we retrain the ResNet50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet50 model with transfer learning for ...
• Now advanced to particle-level deep neural networks • Can also do Higgs tagging, W/Z tagging, etc. ...
doi:10.5281/zenodo.3598991
fatcat:biu2auouejfbjdjyueat6tnygq
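The service model in this entry, one FPGA shared by many CPU clients each sending a batch of one, can be sketched with a request queue standing in for the network endpoint. Everything below (the queue, the worker thread, the placeholder labels) is a hypothetical stand-in, not the paper's actual service stack:

    import queue
    import threading

    requests = queue.Queue()

    def accelerator_service():
        """Stand-in for the shared FPGA: handles batch-of-one requests in order."""
        while True:
            image, reply = requests.get()
            if image is None:  # shutdown sentinel
                break
            reply.put(f"label-for-{image}")  # placeholder for real inference

    server = threading.Thread(target=accelerator_service)
    server.start()

    def cpu_client(i):
        """One of many CPU processes sharing the single accelerator endpoint."""
        reply = queue.Queue()
        requests.put((f"event-{i}", reply))
        return reply.get()

    results = [cpu_client(i) for i in range(4)]
    requests.put((None, None))
    server.join()
    print(results)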
Table of contents
2019
2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
... Leong (The University of Sydney)
Towards Efficient Deep Neural Network Training by FPGA-Based Batch-Level Parallelism
RapidRoute: Fast Assembly of Communication Structures for FPGA Overlays 61 Leo Liu ...
PIR-DSP: An FPGA DSP Block Architecture for Multi-precision Deep Neural Networks 35 SeyedRamin Rasoulinezhad (The University of Sydney), Hao Zhou (Fudan University), Lingli Wang (Fudan University), and ...
Poster Session 2: Neural Networks and Vision ...
doi:10.1109/fccm.2019.00004
fatcat:qku57w2j2vfs3kluykjmqfbzya
Evolutionary Cell Aided Design for Neural Network Architectures
[article]
2019
arXiv
pre-print
In practice, designing efficient neural network architectures requires significant effort and expertise. ...
By running various experiments on the fittest solutions from the neural network and hardware searches, we demonstrate the full end-to-end capabilities of the ECAD framework. ...
To expedite the training process, these formatted samples are passed in batches to the neural network. ...
arXiv:1903.02130v3
fatcat:6qsrbufyvvejhlr4b4eyp3r4t4
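"Passed in batches" in the snippet above is the standard mini-batch loop: shuffle the formatted samples once per epoch, then slice off fixed-size chunks. A minimal sketch with made-up array shapes:

    import numpy as np

    def batches(samples, labels, batch_size, rng):
        """Shuffle once per epoch, then yield fixed-size mini-batches."""
        order = rng.permutation(len(samples))
        for start in range(0, len(samples), batch_size):
            idx = order[start:start + batch_size]
            yield samples[idx], labels[idx]

    rng = np.random.default_rng(3)
    X = rng.standard_normal((1000, 20))
    y = rng.integers(0, 2, size=1000)

    for xb, yb in batches(X, y, batch_size=64, rng=rng):
        pass  # one forward/backward pass per mini-batch would go here
    print(xb.shape)  # the final (possibly smaller) batch: (40, 20)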
How to Train Your Neural Network: A Comparative Evaluation
[article]
2021
arXiv
pre-print
The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. ...
This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators. ...
It is this hierarchical organization of layers that lends deep neural networks the power to generate useful high level representations in the latter layers. ...
arXiv:2111.04949v1
fatcat:gfjiefx24jh3bhizu4j4t5slwa
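Distributed training, as surveyed in the entry above, most often means data parallelism: every accelerator holds a model replica, computes gradients on its shard of the global batch, and the shards' gradients are averaged before a shared update. A toy least-squares sketch of that all-reduce pattern; the model, learning rate, and worker count are invented for illustration:

    import numpy as np

    rng = np.random.default_rng(4)
    w = rng.standard_normal(10)               # the replicated model parameters
    X = rng.standard_normal((256, 10))
    y = X @ np.arange(10.0) + 0.1 * rng.standard_normal(256)

    def local_gradient(w, xs, ys):
        """Least-squares gradient on one worker's shard of the global batch."""
        return 2.0 * xs.T @ (xs @ w - ys) / len(xs)

    workers = 4
    for step in range(100):
        shards = zip(np.array_split(X, workers), np.array_split(y, workers))
        grads = [local_gradient(w, xs, ys) for xs, ys in shards]
        w -= 0.1 * np.mean(grads, axis=0)     # all-reduce: average, then update
    print(np.round(w, 1))                     # recovers roughly [0, 1, ..., 9]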
Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing
2021
Electronics
The complete development flow undergoes two distinct phases: training and inference. During training, all the weights are calculated through optimization and back-propagation of the network. ...
The inference phase, on the other hand, uses a trained network with new data. The sensitive optimization and back-propagation phases are removed, and only forward propagation is used. ...
... with Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN). ...
doi:10.3390/electronics10161912
fatcat:3ywb6inqzvbfxb2vjve6ffvmiq
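The two-phase flow this entry describes, training with back-propagation versus forward-only inference, fits in a few lines. A deliberately tiny single-layer sketch; the data and learning rate are invented for illustration:

    import numpy as np

    rng = np.random.default_rng(5)
    W = rng.standard_normal((4, 1)) * 0.1

    def forward(x, W):
        return x @ W  # the only computation the inference phase needs

    # --- Training phase: forward pass, loss gradient, weight update ---
    X = rng.standard_normal((128, 4))
    y = X @ np.array([[1.0], [-2.0], [0.5], [3.0]])
    for _ in range(200):
        pred = forward(X, W)
        grad = 2.0 * X.T @ (pred - y) / len(X)  # back-propagation through the layer
        W -= 0.05 * grad

    # --- Inference phase: optimization removed, forward propagation only ---
    x_new = rng.standard_normal((1, 4))
    print(forward(x_new, W))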
Showing results 1 — 15 out of 972 results