230 Hits in 2.4 sec

Benchmarking Quantized Neural Networks on FPGAs with FINN [article]

Quentin Ducasse, Pascal Cotret, Loïc Lagadec, Robert Stewart
2021 arXiv   pre-print
While training neural networks may require a powerful setup, deploying a network must be possible on low-power and low-resource hardware architectures.  ...  The ever-growing cost of both training and inference for state-of-the-art neural networks has led the literature to seek ways to reduce resource usage with minimal impact on accuracy.  ...  This tool is used to assess the impact of quantization on 49 implementations of an MLP network trained on both MNIST and Fashion-MNIST.  ... 
arXiv:2102.01341v1 fatcat:da5kdburnnfxvbfvda4mtkqcii
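As background to the quantization explored in this entry, the core reduced-precision idea can be sketched as uniform fixed-point quantization of weights. This is an illustrative sketch only; FINN's actual quantization flow and rounding scheme may differ.

```python
def quantize(weights, num_bits):
    """Uniformly quantize a list of float weights to signed integer levels.

    Illustrative sketch of the round-to-nearest-level idea behind
    reduced-precision networks; not FINN's actual implementation.
    """
    levels = 2 ** (num_bits - 1) - 1              # e.g. 4 bits -> levels in [-7, 7]
    scale = max(abs(w) for w in weights) / levels or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Map integer levels back to approximate float weights."""
    return [v * scale for v in q]

# Hypothetical usage: 4-bit quantization of three example weights.
ints, s = quantize([1.0, -0.3, 0.05], 4)
```

The accuracy/resource trade-off studied across the 49 implementations comes from varying `num_bits`: fewer bits mean smaller multipliers and memories on the FPGA, at the cost of larger round-off error.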

ICARUS: A Lightweight Neural Plenoptic Rendering Architecture [article]

Chaolin Rao, Huangjie Yu, Haochuan Wan, Jindong Zhou, Yueyang Zheng, Yu Ma, Anpei Chen, Minye Wu, Binzhe Yuan, Pingqiang Zhou, Xin Lou, Jingyi Yu
2022 arXiv   pre-print
The practical deployment of Neural Radiance Field (NeRF) in rendering applications faces several challenges, the most critical being low rendering speed even on high-end graphics processing units  ...  (PEU), a multi-layer perceptron (MLP) engine, and a volume rendering unit (VRU).  ...  An NPU is a dedicated hardware architecture designed for neural networks, where each specific design has its own optimization strategy, especially for data sharing [Chen et al. 2016].  ... 
arXiv:2203.01414v1 fatcat:5wehzotun5bobikgxao2diryge

An electronic system for simulation of neural networks with a micro-second real time constraint

Arsenia Chorti
2001 AIP Conference Proceedings  
The electronic implementation uses FPGAs, which can be optimized for a specific neural network because the number of processing elements can be modified.  ...  Typically, time constants of the order of a few microseconds are required. In this paper, we present a new system, MAHARADJA, for evaluating MLP and RBF neural network paradigms in real time.  ...  in a SIMD fashion.  ... 
doi:10.1063/1.1405266 fatcat:uw5pccmzcnhvboyfjrzflgl7eu

A Task-level Pipelined Many-SIMD Augmented Reality Processor with Congestion-aware Network-on-Chip Scheduler

Gyeonghoon Kim, Donghyun Kim, Seongwook Park, Youchang Kim, Kyuho Lee, Injoon Hong, Kyeongryeol Bong, Hoi-Jun Yoo
2015 IEEE Micro  
To enable real-time operation of the proposed augmented reality, a task-level pipelined multicore architecture with DLP/TLP-optimized SIMD processing elements is implemented.  ...  In addition, the multicore employs a congestion-aware network-on-chip scheduler for the 2D-mesh network-on-chip to support the massive internal data transactions caused by the task-level pipeline.  ...  Also, an on-line congestion-aware scheduler (CAS) for the 2D-mesh network-on-chip (NoC) architecture is implemented for low power consumption.  ... 
doi:10.1109/mm.2015.2 fatcat:kd5kg5ndxne3hksvlxzkrq2nne

A task-level pipelined many-SIMD augmented reality processor with congestion-aware network-on-chip scheduler

Gyeonghoon Kim, Seongwook Park, Kyuho Lee, Youchang Kim, Injoon Hong, Kyeongryeol Bong, Dongjoo Shin, Sungpill Choi, Junyoung Park, Hoi-Jun Yoo
2014 IEEE COOL Chips XVII  
To enable real-time operation of the proposed augmented reality, a task-level pipelined multicore architecture with DLP/TLP-optimized SIMD processing elements is implemented.  ...  In addition, the multicore employs a congestion-aware network-on-chip scheduler for the 2D-mesh network-on-chip to support the massive internal data transactions caused by the task-level pipeline.  ...  Also, an on-line congestion-aware scheduler (CAS) for the 2D-mesh network-on-chip (NoC) architecture is implemented for low power consumption.  ... 
doi:10.1109/coolchips.2014.6842959 dblp:conf/coolchips/KimPLKHBSCPY14 fatcat:e3ms3usrrfabdjo77lssrttaxy

Scaling Binarized Neural Networks on Reconfigurable Logic

Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
2017 Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms - PARMA-DITAM '17  
Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost.  ...  Our implementation of this network achieves 14.8 teraoperations per second. We believe this is the fastest classification rate reported to date on this benchmark at this level of accuracy.  ...  Scaling to Larger Networks: A results summary is shown in Table 3, which also shows the accuracy achieved by the implemented networks on a number of benchmark datasets.  ... 
doi:10.1145/3029580.3029586 dblp:conf/hipeac/FraserUGBLJV17 fatcat:jliterfdmbbp3ao5yly4masuce

Scaling Binarized Neural Networks on Reconfigurable Logic [article]

Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
2017 arXiv   pre-print
Binarized neural networks (BNNs) are gaining interest in the deep learning community due to their significantly lower computational and memory cost.  ...  Our implementation of this network achieves 14.8 trillion operations per second. We believe this is the fastest classification rate reported to date on this benchmark at this level of accuracy.  ...  neural network training on compute clusters.  ... 
arXiv:1701.03400v2 fatcat:lf52l3zre5dxndh6wd2xy3v4h4
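The cost advantage claimed in these BNN entries comes from replacing multiply-accumulate with bitwise operations. A minimal sketch of the standard XNOR/popcount dot product (generic BNN arithmetic, not this paper's exact FPGA datapath):

```python
def bin_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors packed as n-bit integers.

    Bit value 1 encodes +1 and bit value 0 encodes -1, so XOR marks the
    mismatched positions and the dot product is
    matches - mismatches = n - 2 * popcount(a ^ b).
    Generic BNN trick; the paper's hardware pipeline is more elaborate.
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches

# Hypothetical usage: two 4-element binary vectors packed into ints.
result = bin_dot(0b1011, 0b1101, 4)
```

On an FPGA this maps to a wide XOR followed by a popcount tree, which is why binarized layers are so much cheaper than fixed-point multiply-accumulate arrays.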

Neural acceleration for GPU throughput processors

Amir Yazdanbakhsh, Jongse Park, Hardik Sharma, Pejman Lotfi-Kamran, Hadi Esmaeilzadeh
2015 Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48  
We introduce a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators for a large number of GPU cores.  ...  Compared to the baseline GPU architecture, cycle-accurate simulation results for NGPU show a 2.4× average speedup and a 2.8× average energy reduction within a 10% quality-loss margin across a diverse set  ...  Acknowledgments: This work was supported by a Qualcomm Innovation Fellowship, NSF award CCF#1553192, Semiconductor Research Corporation contract #2014-EP-2577, and a gift from Google.  ... 
doi:10.1145/2830772.2830810 dblp:conf/micro/YazdanbakhshPSL15 fatcat:nlmzt5nn65b5zkgglu2veliqpu

Implementing Neural Networks Efficiently [chapter]

Ronan Collobert, Koray Kavukcuoglu, Clément Farabet
2012 Lecture Notes in Computer Science  
One should, however, not underestimate the time spent designing the right neural network for a given task, or the amount of work put into feeding data to the neural network properly.  ...  Designing the right network for a given task in a short amount of time requires a flexible development environment and a properly designed neural network toolbox.  ...  For other neural-network-specific cases, such as convolutions, one must implement specialized routines for each architecture of choice.  ... 
doi:10.1007/978-3-642-35289-8_28 fatcat:pvgtmdm6mbgjpfpavqbujy47tu

NnSP: embedded neural networks stream processor

H. Esmaeilzadeh, F. Farzan, N. Shahidi, S.M. Fakhraie, C. Lucas, M. Tehranipoor
2005 48th Midwest Symposium on Circuits and Systems, 2005.  
A neural network employed for mobile robot navigation control is implemented on the realized SoPC hardware. The realization speedup achievements are presented here.  ...  The architecture proposed in this paper is a parallel stream processor, called Neural Networks Stream Processor or NnSP, which can be programmed to realize different neural-network topologies and architectures  ...  From the architecture design view, digital implementations of neural networks can be classified into three general categories: custom implementations [1] [2], systolic-based implementations, and SIMD/  ... 
doi:10.1109/mwscas.2005.1594079 fatcat:cn676ptbl5ftnpzrxvfdls65dy

FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks [article]

Michaela Blott, Thomas Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth O'Brien, Yaman Umuroglu
2018 arXiv   pre-print
Given a neural network description, the tool optimizes for given platforms, design targets and a specific precision.  ...  Finally, we evaluate a selection of reduced-precision neural networks ranging from CIFAR-10 classifiers to YOLO-based object detection on a range of platforms including PYNQ and AWS F1, demonstrating  ...  Accelerators & Architectures: A great deal of prior work on mapping neural networks to hardware exists for both FPGAs and ASICs.  ... 
arXiv:1809.04570v1 fatcat:4ua6ntawtnax3bxgaoradxz66e

The GRD chip: genetic reconfiguration of DSPs for neural network processing

M. Murakawa, S. Yoshizawa, I. Kajitani, X. Yao, N. Kajihara, M. Iwata, T. Higuchi
1999 IEEE transactions on computers  
Both the topology and the hidden-layer node functions of a neural network mapped onto the GRD chips are dynamically reconfigured using a genetic algorithm (GA).  ...  The GRD chip is a building block for the configuration of a scalable neural network hardware system.  ...  For example, a 19-inch rack implementation of 16 VME triple-height boards (nine GRD chips per board) can achieve a performance of 46 GCPS (Giga Connections Per Second) in MLP.  ... 
doi:10.1109/12.773799 fatcat:43tz4h2bpzhefcsxrhjrvpxsdu

Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression [article]

Shihui Yin, Gaurav Srivastava, Shreyas K. Venkataramanaiah, Chaitali Chakrabarti, Visar Berisha, Jae-sun Seo
2018 arXiv   pre-print
makes it challenging to implement them on power/area-constrained embedded platforms.  ...  Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which  ...  A Multi-Layer Perceptron (MLP) architecture with two hidden layers was used for training MNIST. Model selection for best architecture was performed based on accuracy study on different architectures.  ... 
arXiv:1804.07370v1 fatcat:hirsopx7czbexffugk6a3iixmm

Classification of ECG Arrhythmias Using Discrete Wavelet Transform and Neural Networks

Maedeh Kiani Sarkaleh
2012 International Journal of Computer Science Engineering and Applications  
The discrete wavelet transform is used for processing ECG recordings and extracting features, and a Multi-Layer Perceptron (MLP) neural network performs the classification task.  ...  Recordings from the MIT-BIH arrhythmia database have been used for training and testing our neural-network-based classifier.  ...  Classification Phase: In the classification phase, we have used an MLP neural network. The best architecture of the MLP NN is usually obtained using a trial-and-error process [27], [28].  ... 
doi:10.5121/ijcsea.2012.2101 fatcat:qojtjuqdy5ed7kglfnj5yjwuhq
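For readers unfamiliar with the MLP classifiers several of these entries rely on, a forward pass over fully connected sigmoid layers can be sketched in a few lines. The layer sizes and activation here are illustrative assumptions, not this paper's exact configuration.

```python
import math

def mlp_forward(x, layers):
    """Forward pass of a small MLP.

    `layers` is a list of (weights, biases) pairs, with weights given as
    a list of rows (one row per output neuron). Sigmoid units throughout,
    as is common for trial-and-error MLP classifiers; purely illustrative.
    """
    for W, b in layers:
        x = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + bi)))
             for row, bi in zip(W, b)]
    return x

# Hypothetical usage: a 2-input, 2-hidden, 1-output network with made-up weights.
out = mlp_forward([0.5, -0.2],
                  [([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),
                   ([[1.0, 1.0]], [0.0])])
```

The "trial-and-error" architecture search the snippet mentions amounts to varying the shapes in `layers` and keeping whichever network classifies a held-out set best.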

A Configurable Cloud-Scale DNN Processor for Real-Time AI

Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, Stephen Heil, Prerak Patel (+8 others)
2018 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)  
Index Terms: neural network hardware; accelerator architectures; field-programmable gate arrays  ...  Interactive AI-powered services require low-latency evaluation of deep neural network (DNN) models, aka "real-time AI".  ...  The key aspects of the BW system and NPU are: • Architecture: The BW NPU implements a single-threaded SIMD ISA comprised of matrix-vector and vector-vector operations, contrasting with most current accelerators  ... 
doi:10.1109/isca.2018.00012 dblp:conf/isca/FowersOPMLLAHAG18 fatcat:qalwazqx7jcctkndjrqmhszccq
Showing results 1 — 15 out of 230 results