DLAU: A Scalable Deep Learning Accelerator Unit on FPGA [article]

Chao Wang, Qi Yu, Lei Gong, Xi Li, Yuan Xie, Xuehai Zhou
2016 arXiv   pre-print
Experimental results on the state-of-the-art Xilinx FPGA board demonstrate that the DLAU accelerator is able to achieve up to 36.1x speedup comparing to the Intel Core2 processors, with the power consumption  ...  In order to improve the performance as well as to maintain the low power cost, in this paper we design DLAU, which is a scalable accelerator architecture for large-scale deep learning networks using FPGA  ...  To sum up, these studies focus on implementing a particular deep learning algorithm efficiently, but how to increase the size of the neural networks with scalable and flexible hardware architecture  ...
arXiv:1605.06894v1 fatcat:p4t42aossbdi3dmkq37lnfpdke

A preliminary investigation of a neocortex model implementation on the Cray XD1

Kenneth L. Rice, Christopher N. Vutsinas, Tarek M. Taha
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
We propose techniques to accelerate the application on general purpose processors and on reconfigurable logic.  ...  We present implementations of our approach on a Cray XD1 and compare the performance potential of scaling the design utilizing reconfigurable logic based acceleration to a software only design.  ...  We would also like to thank the staff at Center for Computational Science at the Naval Research Laboratory for their help.  ... 
doi:10.1145/1362622.1362626 dblp:conf/sc/RiceVT07 fatcat:2pzhc7fwpbdqhoudu5mrpfxo5u

DLAU: A Scalable Deep Learning Accelerator Unit on FPGA

Chao Wang, Lei Gong, Qi Yu, Xi Li, Yuan Xie, Xuehai Zhou
2016 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
Experimental results on the state-of-the-art Xilinx FPGA board demonstrate that the DLAU accelerator is able to achieve up to 36.1x speedup comparing to the Intel Core2 processors, with the power consumption  ...  In order to improve the performance as well as to maintain the low power cost, in this paper we design DLAU, which is a scalable accelerator architecture for large-scale deep learning networks using FPGA  ...  To sum up, these studies focus on implementing a particular deep learning algorithm efficiently, but how to increase the size of the neural networks with scalable and flexible hardware architecture has  ...
doi:10.1109/tcad.2016.2587683 fatcat:phsfgkclyrcwjeg4ff67yxvrfa

Distributed Architectures [chapter]

2003 The ABCs of LDAP  
doi:10.1201/9780203492673.ch5 fatcat:sjgbyuzvazddvkbfv6m2pp56ea

Distributed architectures

1987 Microprocessing and Microprogramming  
doi:10.1016/0165-6074(87)90095-0 fatcat:4z3eyjgqwzfuxnecw56tqjwkkq

Emerging Hardware Techniques and EDA Methodologies for Neuromorphic Computing (Dagstuhl Seminar 19152)

Krishnendu Chakrabarty, Tsung-Yi Ho, Hai Li, Ulf Schlichtmann, Michael Wagner
2019 Dagstuhl Reports  
Neuromorphic computing systems, that refer to the computing architecture inspired by the working mechanism of human brains, have gained considerable attention.  ...  By imitating this structure, neuromorphic computing systems are anticipated to be superior to conventional  ...  We propose a complete design flow to achieve both fast deployment and high energy efficiency for accelerating neural networks on FPGA [FPGA 16/17].  ... 
doi:10.4230/dagrep.9.4.43 dblp:journals/dagstuhl-reports/ChakrabartyH0S19 fatcat:7fpavhm4gzgxnj2o23jm66sjiy

Accelerating neuromorphic vision algorithms for recognition

Ahmed Al Maashri, Michael Debole, Matthew Cotter, Nandhini Chandramoorthy, Yang Xiao, Vijaykrishnan Narayanan, Chaitali Chakrabarti
2012 Proceedings of the 49th Annual Design Automation Conference on - DAC '12  
These accelerators were validated on a multi-FPGA platform and significant performance enhancement and power efficiencies were demonstrated when compared to CMP and GPU platforms.  ...  Results demonstrate as much as 7.6X speedup and 12.8X more power-efficient performance when compared to those platforms.  ...  As a step towards exploring how the brain efficiently processes visual information, a brain-inspired feed-forward hierarchical model (HMAX) [2] has become a widely accepted abstract representation of  ... 
doi:10.1145/2228360.2228465 dblp:conf/dac/Al-MaashriDCCXNC12 fatcat:m7qdz2k225arnmkplthvoedcwu

Digital Implementation of a Spiking Convolutional Neural Network for Tumor Detection

2020 Informacije midem  
The suggested neural network is explored for digital implementation possibility and costs. Results of the hardware synthesis and digital implementation are presented on an FPGA.  ...  Accordingly, the structure of the proposed SCNN is implemented on a field-programmable gate array (FPGA) using fixed point arithmetic.  ...  Designing and efficient implementation of these structures in hardware provide us with the benefit of presenting a processing system based on the structure of brains.  ... 
doi:10.33180/infmidem2019.401 fatcat:4dvl2mctgna4nf2mn3ufsdlnee

The Human Brain Project and neuromorphic computing

Andrea Calimera, Enrico Macii, Massimo Poncino
2013 Functional Neurology  
new category of hardware (neuromorphic computing systems).  ...  Understanding how the brain manages billions of processing units connected via kilometers of fibers and trillions of synapses, while consuming a few tens of Watts could provide the key to a completely  ...  These accelerators are typically based on artificial neural networks, but some prototypes that use digital signal processors (DSPs) for fast signal processing have been proposed.  ... 
doi:10.11138/fneur/2013.28.3.191 pmid:24139655 pmcid:PMC3812737 fatcat:lf5taz3vs5bcxhy3jw35elgzae

Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications

Mostafa Rahimiazghadi, Corey Lammie, Jason Kamran Eshraghian, Melika Payvand, Elisa Donati, Bernabe Linares-Barranco, Giacomo Indiveri
2020 IEEE Transactions on Biomedical Circuits and Systems  
adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems.  ...  Finally, we provide our analysis of the field and share a perspective on the advantages, disadvantages, challenges, and opportunities that different accelerators and neuromorphic processors introduce to  ...  For a comprehensive review of previous FPGA-based DNN accelerators, we refer the reader to [89].  ...
doi:10.1109/tbcas.2020.3036081 pmid:33156792 fatcat:rjwfjd7vmvglpk762mqeyiteqq

StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs [article]

Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis
2021 arXiv   pre-print
In this paper, we introduce StreamBrain -- a framework that allows neural networks based on BCPNN to be practically deployed in High-Performance Computing systems.  ...  One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN).  ...  The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at PDC and HPC2N and partially funded by the Swedish Research Council through grant agreement  ... 
arXiv:2106.05373v1 fatcat:zjy73e22wzborpptp3um6ynmem

2018 Index IEEE Transactions on Computers Vol. 67

2019 IEEE transactions on computers  
., TC Sept. 2018 1259-1272 Analysis  ...  Howe, J., +, TC March 2018 322-334 STABLE: Stress-Aware Boolean Matching to Mitigate BTI-Induced SNM Reduction in SRAM-Based FPGAs.  ...  ., +, TC Aug. 2018 1184-1192 Brain modeling Genetic Programming for Energy-Efficient and Energy-Scalable Approximate Feature Computation in Embedded Inference Systems.  ...
doi:10.1109/tc.2018.2882120 fatcat:j2j7yw42hnghjoik2ghvqab6ti

Zedwulf: Power-Performance Tradeoffs of a 32-Node Zynq SoC Cluster

Pradeep Moorthy, Nachiket Kapre
2015 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines  
We use the ARM processor for handling the MPI stack while offloading compute-intensive calculations to the FPGA.  ...  In this paper, we prototype a 32-node cluster composed from these Zynq SoC chips to accelerate communication-bound sparse graph-oriented applications such as neural network simulations.  ...  Other neural simulation accelerators based on VLSI systems [9], GPUs [10], and FPGAs [11] have also explored parallelism using alternative high-performance computing fabrics that are either harder  ...
doi:10.1109/fccm.2015.37 dblp:conf/fccm/MoorthyK15 fatcat:7iapt5jzp5fmddhu3fn7oxs5ci

InSight: An FPGA-Based Neuromorphic Computing System for Deep Neural Networks

Taeyang Hong, Yongshin Kang, Jaeyong Chung
2020 Journal of Low Power Electronics and Applications  
)-based accelerator.  ...  We demonstrate an implementation of the neuromorphic computing system based on a field-programmable gate array that performs image classification on the handwritten 0 to 9 digits MNIST dataset with 99.37%  ...  Our system based on the off-the-shelf chip is even comparable to TrueNorth that is based on a custom chip.  ...
doi:10.3390/jlpea10040036 fatcat:qgl2htptxrfdpoyuufsnqm6obq

A Reconfigurable Computing Approach for Efficient and Scalable Parallel Graph Exploration

Brahim Betkaoui, Yu Wang, David B. Thomas, Wayne Luk
2012 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors  
To validate our methodology, we provide a detailed design description of the Breadth-First Search algorithm on an FPGA-based high performance computing system.  ...  Using graph data based on the power-law graphs found in real-world problems, we are able to achieve performance results that are superior to those of high performance multi-core systems in the recent literature  ...  In [15], BFS has been employed in brain network analysis of very sparse brain network data.  ...
doi:10.1109/asap.2012.30 dblp:conf/asap/BetkaouiWTL12 fatcat:masycdtex5g6jgo3drv74aswta