Benchmarking Quantized Neural Networks on FPGAs with FINN
[article]
2021
arXiv
pre-print
While training neural networks may require a powerful setup, deploying a network must be possible on low-power and low-resource hardware architectures. ...
The ever-growing cost of both training and inference for state-of-the-art neural networks has led the literature to investigate ways to reduce resource usage with minimal impact on accuracy. ...
This tool is used to assess the impact of quantization on 49 implementations of an MLP network trained on both MNIST and Fashion-MNIST. ...
arXiv:2102.01341v1
fatcat:da5kdburnnfxvbfvda4mtkqcii
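The bit-width/accuracy trade-off this entry benchmarks can be sketched in a few lines. Below is a minimal symmetric uniform quantizer, assuming a single per-tensor scale; it is a generic illustration of fake quantization, not FINN's actual compilation flow, and the function name `quantize_uniform` is invented here.

```python
# Minimal sketch of symmetric uniform weight quantization: trades
# bit-width for weight fidelity, the effect the entry above benchmarks.
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int) -> np.ndarray:
    """Quantize weights to a symmetric signed grid with 2**bits levels."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax        # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # dequantized ("fake-quantized") weights

rng = np.random.default_rng(0)
w = rng.normal(size=(784, 64)).astype(np.float32)  # one MLP layer, MNIST-sized input
for bits in (8, 4, 2):
    err = np.mean((w - quantize_uniform(w, bits)) ** 2)
    print(f"{bits}-bit quantization, mean squared weight error: {err:.6f}")
```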
ICARUS: A Lightweight Neural Plenoptic Rendering Architecture
[article]
2022
arXiv
pre-print
The practical deployment of Neural Radiance Field (NeRF) in rendering applications faces several challenges, with the most critical one being low rendering speed on even high-end graphics processing units ...
(PEU), a multi-layer perceptron (MLP) engine, and a volume rendering unit (VRU). ...
An NPU is a dedicated hardware architecture designed for neural networks, where each specific design has its own optimization strategy, especially for data sharing [Chen et al. 2016]. ...
arXiv:2203.01414v1
fatcat:5wehzotun5bobikgxao2diryge
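The snippet above is truncated before the expansion of "PEU". As a rough illustration of the kind of fixed-function encoding stage a NeRF accelerator front-end performs, here is the standard positional encoding from the original NeRF formulation; this is an assumption about the pipeline shape, not ICARUS's documented datapath.

```python
# Sketch of the standard NeRF positional encoding
# gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^{L-1} pi p), cos(2^{L-1} pi p)),
# the sort of fixed-function stage a hardware encoding unit can implement.
import numpy as np

def positional_encoding(p: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """Map each coordinate to sin/cos features at octave-spaced frequencies."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi    # 2^k * pi
    angles = p[..., None] * freqs                    # broadcast over coordinates
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)            # flatten per point

xyz = np.array([[0.1, -0.4, 0.7]])                   # one 3-D sample point
print(positional_encoding(xyz).shape)                # (1, 60): 3 coords * 2 * 10
```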
An electronic system for simulation of neural networks with a micro-second real time constraint
2001
AIP Conference Proceedings
The electronic implementation uses FPGAs, which can be optimized for a specific neural network because the number of processing elements can be modified. ...
Typically, time constants of the order of a few microseconds are required. In this paper, we present a new system, MAHARADJA, for evaluating MLP and RBF neural network paradigms in real time. ...
in a SIMD fashion. ...
doi:10.1063/1.1405266
fatcat:uw5pccmzcnhvboyfjrzflgl7eu
A Task-level Pipelined Many-SIMD Augmented Reality Processor with Congestion-aware Network-on-Chip Scheduler
2015
IEEE Micro
To enable real-time operation of the proposed augmented reality system, a task-level pipelined multicore architecture with DLP/TLP-optimized SIMD processing elements is implemented. ...
In addition, the multicore employs a congestion-aware scheduler for the 2D-mesh network-on-chip to support the massive internal data transactions caused by the task-level pipeline. ...
Also, an online congestion-aware scheduler (CAS) for the 2D-mesh network-on-chip (NoC) architecture is implemented for low power consumption. ...
doi:10.1109/mm.2015.2
fatcat:kd5kg5ndxne3hksvlxzkrq2nne
A task-level pipelined many-SIMD augmented reality processor with congestion-aware network-on-chip scheduler
2014
2014 IEEE COOL Chips XVII
To enable real-time operation of the proposed augmented reality system, a task-level pipelined multicore architecture with DLP/TLP-optimized SIMD processing elements is implemented. ...
In addition, the multicore employs a congestion-aware scheduler for the 2D-mesh network-on-chip to support the massive internal data transactions caused by the task-level pipeline. ...
Also, an online congestion-aware scheduler (CAS) for the 2D-mesh network-on-chip (NoC) architecture is implemented for low power consumption. ...
doi:10.1109/coolchips.2014.6842959
dblp:conf/coolchips/KimPLKHBSCPY14
fatcat:e3ms3usrrfabdjo77lssrttaxy
Scaling Binarized Neural Networks on Reconfigurable Logic
2017
Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms - PARMA-DITAM '17
Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost. ...
Our implementation of this network achieves 14.8 teraoperations per second. We believe this is the fastest classification rate reported to date on this benchmark at this level of accuracy. ...
Scaling to Larger Networks: A results summary is shown in Table 3, which also shows the accuracy achieved by the implemented networks on a number of benchmark datasets. ...
doi:10.1145/3029580.3029586
dblp:conf/hipeac/FraserUGBLJV17
fatcat:jliterfdmbbp3ao5yly4masuce
Scaling Binarized Neural Networks on Reconfigurable Logic
[article]
2017
arXiv
pre-print
Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost. ...
Our implementation of this network achieves 14.8 trillion operations per second. We believe this is the fastest classification rate reported to date on this benchmark at this level of accuracy. ...
neural network training on compute clusters. ...
arXiv:1701.03400v2
fatcat:lf52l3zre5dxndh6wd2xy3v4h4
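Throughput numbers like the 14.8 TOPS above rest on the fact that a binarized dot product reduces to XNOR plus popcount. A minimal sketch, assuming weights and activations in {-1, +1} packed as bits (1 for +1, 0 for -1); FINN-style designs map the same trick to FPGA LUTs rather than Python integers.

```python
# XNOR-popcount dot product: the core primitive behind BNN throughput.
import numpy as np

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two n-element {-1,+1} vectors packed into ints."""
    matches = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # XNOR, masked to n bits
    pop = bin(matches).count("1")                  # population count
    return 2 * pop - n                             # matches minus mismatches

n = 8
a = np.array([1, -1, 1, 1, -1, -1, 1, -1])
w = np.array([1, 1, -1, 1, -1, 1, 1, -1])
pack = lambda v: int("".join("1" if x > 0 else "0" for x in v), 2)
print(binary_dot(pack(a), pack(w), n), int(a @ w))  # both print 2
```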
Neural acceleration for GPU throughput processors
2015
Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48
We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators for a large number of GPU cores. ...
Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4× average speedup and a 2.8× average energy reduction within a 10% quality-loss margin across a diverse set ...
Acknowledgments: This work was supported by a Qualcomm Innovation Fellowship, NSF award CCF#1553192, Semiconductor Research Corporation contract #2014-EP-2577, and a gift from Google. ...
doi:10.1145/2830772.2830810
dblp:conf/micro/YazdanbakhshPSL15
fatcat:nlmzt5nn65b5zkgglu2veliqpu
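The neural-acceleration idea in this entry, approximating an error-tolerant code region with a small MLP, can be mimicked in software. The sketch below uses scikit-learn's MLPRegressor as a stand-in for the hardware accelerator; `hot_region` is an invented placeholder for an approximable hot loop, not a function from the paper.

```python
# Sketch of neural acceleration: replace an approximable code region
# with a small trained MLP and compare outputs.
import numpy as np
from sklearn.neural_network import MLPRegressor

def hot_region(x):
    """The exact code region we want to approximate (placeholder)."""
    return np.sin(x) * np.exp(-0.1 * np.abs(x))

x_train = np.linspace(-5, 5, 2000).reshape(-1, 1)
y_train = hot_region(x_train).ravel()

mlp = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=3000, random_state=0)
mlp.fit(x_train, y_train)

x_test = np.array([[0.5], [2.0], [-3.3]])
print(np.c_[hot_region(x_test), mlp.predict(x_test).reshape(-1, 1)])  # exact vs. approximate
```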
Implementing Neural Networks Efficiently
[chapter]
2012
Lecture Notes in Computer Science
One should, however, not underestimate the time spent designing the right neural network for a given task, or the amount of work put into feeding data to the neural network properly. ...
Designing the right network for a given task in a short amount of time requires a flexible development environment and a properly designed neural network toolbox. ...
For other neural network specific cases, such as convolutions, one must implement specialized routines for each architecture of choice. ...
doi:10.1007/978-3-642-35289-8_28
fatcat:pvgtmdm6mbgjpfpavqbujy47tu
NnSP: embedded neural networks stream processor
2005
48th Midwest Symposium on Circuits and Systems, 2005.
A neural network employed for mobile robot navigation control is implemented on the realized SoPC hardware. The realization and speedup achievements are presented here. ...
The architecture proposed in this paper is a parallel stream processor called the Neural Networks Stream Processor, or NnSP, which can be programmed to realize different neural-network topologies and architectures ...
From the architecture design view, digital implementations of neural networks can be classified into three general categories: custom implementations [1] [2], systolic-based implementations, and SIMD/ ...
doi:10.1109/mwscas.2005.1594079
fatcat:cn676ptbl5ftnpzrxvfdls65dy
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
[article]
2018
arXiv
pre-print
Given a neural network description, the tool optimizes for given platforms, design targets and a specific precision. ...
Finally, we evaluate a selection of reduced precision neural networks ranging from CIFAR-10 classifiers to YOLO-based object detection on a range of platforms including PYNQ and AWS F1, demonstrating ...
Accelerators & Architectures: A great deal of prior work on mapping neural networks to hardware exists for both FPGAs and ASICs. ...
arXiv:1809.04570v1
fatcat:4ua6ntawtnax3bxgaoradxz66e
The GRD chip: genetic reconfiguration of DSPs for neural network processing
1999
IEEE transactions on computers
Both the topology and the hidden layer node functions of a neural network mapped on the GRD chips are dynamically reconfigured using a genetic algorithm (GA). ...
The GRD chip is a building block for the configuration of a scalable neural network hardware system. ...
For example, a 19-inch rack implementation of 16 VME triple-height boards (nine GRD chips per board) can achieve a performance of 46 GCPS (Giga Connections Per Second) for MLPs. ...
doi:10.1109/12.773799
fatcat:43tz4h2bpzhefcsxrhjrvpxsdu
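The genetic reconfiguration described above can be illustrated with a toy mutation-only GA that evolves hidden-layer widths. The fitness function below is a hypothetical stand-in (capacity reward minus a resource-cost penalty); on the GRD system, fitness would come from the network actually mapped onto the chips.

```python
# Toy mutation-only GA evolving MLP hidden-layer widths.
import random

random.seed(0)

def fitness(widths):
    """Hypothetical score: reward capacity, penalize resource cost."""
    capacity = sum(min(w, 64) for w in widths)   # diminishing returns past 64
    cost = 0.3 * sum(widths)                     # hardware resource proxy
    return capacity - cost

def mutate(widths):
    """Nudge one randomly chosen layer width up or down."""
    i = random.randrange(len(widths))
    child = list(widths)
    child[i] = max(1, child[i] + random.choice([-8, 8]))
    return child

pop = [[random.randint(8, 128) for _ in range(2)] for _ in range(8)]
for gen in range(20):
    pop.sort(key=fitness, reverse=True)          # keep the fittest half
    pop = pop[:4] + [mutate(random.choice(pop[:4])) for _ in range(4)]
print("best topology:", max(pop, key=fitness))
```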
Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression
[article]
2018
arXiv
pre-print
Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement them on power/area-constrained embedded platforms. ...
A Multi-Layer Perceptron (MLP) architecture with two hidden layers was used for training on MNIST. Model selection for the best architecture was performed based on an accuracy study across different architectures. ...
arXiv:1804.07370v1
fatcat:hirsopx7czbexffugk6a3iixmm
Classification of ECG Arrhythmias Using Discrete Wavelet Transform and Neural Networks
2012
International Journal of Computer Science Engineering and Applications
Discrete wavelet transform is used for processing ECG recordings, and extracting some features, and the Multi-Layer Perceptron (MLP) neural network performs the classification task. ...
Some recordings of the MIT-BIH arrhythmias database have been used for training and testing our neural network based classifier. ...
Classification Phase: In the classification phase, we have used an MLP neural network. The best architecture of the MLP NN is usually obtained using a trial-and-error process [27], [28]. ...
doi:10.5121/ijcsea.2012.2101
fatcat:qojtjuqdy5ed7kglfnj5yjwuhq
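The pipeline shape described in this entry, DWT coefficients feeding an MLP classifier, can be sketched with PyWavelets and scikit-learn. The synthetic "beats" below are placeholders for MIT-BIH recordings, and the db4 wavelet and decomposition level are illustrative choices, not necessarily the authors'.

```python
# Sketch of a DWT-feature + MLP-classifier pipeline on placeholder data.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def dwt_features(beat: np.ndarray) -> np.ndarray:
    """Concatenate multilevel Daubechies-4 DWT coefficients into one vector."""
    coeffs = pywt.wavedec(beat, "db4", level=4)
    return np.concatenate(coeffs)

# Placeholder data: 200 synthetic "beats" of 256 samples in 2 classes.
beats = rng.normal(size=(200, 256))
labels = rng.integers(0, 2, size=200)
X = np.array([dwt_features(b) for b in beats])

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```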
A Configurable Cloud-Scale DNN Processor for Real-Time AI
2018
2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)
Index Terms: neural network hardware; accelerator architectures; field-programmable gate arrays ...
Interactive AI-powered services require low-latency evaluation of deep neural network (DNN) models, aka "real-time AI". ...
The key aspects of the BW system and NPU are: • Architecture: The BW NPU implements a single-threaded SIMD ISA comprised of matrix-vector and vector-vector operations, contrasting with most current accelerators ...
doi:10.1109/isca.2018.00012
dblp:conf/isca/FowersOPMLLAHAG18
fatcat:qalwazqx7jcctkndjrqmhszccq
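The "single-threaded SIMD ISA comprised of matrix-vector and vector-vector operations" can be made concrete with a toy interpreter: one MLP layer becomes three coarse-grained instructions. The opcode names below are invented for illustration and are not Brainwave's actual ISA.

```python
# Toy interpreter for a matrix-vector / vector-vector instruction stream.
import numpy as np

def run(program, regs):
    """Execute a list of (op, dst, *srcs) instructions over vector registers."""
    for op, dst, *srcs in program:
        if op == "mv_mul":                        # matrix-vector multiply
            regs[dst] = regs[srcs[0]] @ regs[srcs[1]]
        elif op == "vv_add":                      # vector-vector add
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
        elif op == "v_relu":                      # elementwise activation
            regs[dst] = np.maximum(regs[srcs[0]], 0)
    return regs

rng = np.random.default_rng(0)
regs = {"W": rng.normal(size=(4, 8)), "x": rng.normal(size=8),
        "b": rng.normal(size=4)}
program = [("mv_mul", "t", "W", "x"),             # t = W @ x
           ("vv_add", "t", "t", "b"),             # t = t + b
           ("v_relu", "y", "t")]                  # y = relu(t)
print(run(program, regs)["y"])
```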
Showing results 1–15 out of 230.