Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges [article]

Chao Wang, Wenqi Lou, Lei Gong, Lihui Jin, Luchao Tan, Yahui Hu, Xi Li, Xuehai Zhou
2017 arXiv   pre-print
In the end, we prospect the development tendency of accelerator architectures in the future, hoping to provide a reference for computer architecture researchers.  ...  Nowadays, in top-tier conferences of computer architecture, emerging a batch of accelerating works based on FPGA or other reconfigurable architectures.  ...  It exploits the potential parallelism of recurrent neural networks and proposes a fine-grained two-stage pipeline implementation.  ... 
arXiv:1712.04771v1 fatcat:3lxv45qb4zaqpagtn3eghrmroe

Stream-Dataflow Acceleration

Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, Karthikeyan Sankaralingam
2017 SIGARCH Computer Architecture News  
Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2× power overhead on average.  ...  Demand for low-power data processing hardware continues to rise inexorably. Existing programmable and "general purpose" solutions (eg.  ...  We want to thank Preyas Shah for setting up the automated Synopsys toolchain for area-power analysis.  ... 
doi:10.1145/3140659.3080255 fatcat:g5spj35pyvh7jlr6i3qr5ertlq

Stream-Dataflow Acceleration

Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, Karthikeyan Sankaralingam
2017 Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17  
Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2× power overhead on average.  ...  Demand for low-power data processing hardware continues to rise inexorably. Existing programmable and "general purpose" solutions (eg.  ...  We want to thank Preyas Shah for setting up the automated Synopsys toolchain for area-power analysis.  ... 
doi:10.1145/3079856.3080255 dblp:conf/isca/NowatzkiGAS17 fatcat:xm36xv6cbfevveabvmpafgjtli

Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems

Xiaofan Zhang, Yuan Ma, Jinjun Xiong, Wen-mei Hwu, Volodymyr Kindratenko, Deming Chen
2021 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
In this paper, we demonstrate our design framework to accelerate the long-term recurrent convolution network (LRCN), which analyzes the input video and output one semantic caption for each frame.  ...  solutions, including a layer-based pipeline, a feature map partition scheme, and an efficient memory hierarchical design for the accelerator and multi-threading programming for the CPU.  ...  By adopting DNNs, such as convolutional neural networks (CNN) and recurrent neural networks (RNNs), the quality of video content analysis has been greatly improved.  ... 
doi:10.1109/tcad.2021.3093398 fatcat:ijqdietur5dd3ddps5hae5crmu

FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo3 Framework

Alfonso Rodríguez, Juan Valverde, Jorge Portilla, Andrés Otero, Teresa Riesgo, Eduardo de la Torre
2018 Sensors  
This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without  ...  In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution.  ...  matrix multiplication is a recurrent operation that appears in a wide variety of algorithms used in that context (e.g., neural inference for embedded machine learning in smart sensors).  ... 
doi:10.3390/s18061877 pmid:29890644 pmcid:PMC6022175 fatcat:p5tsuzns3nea5mgws4ul5gf5ha

Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator [article]

Tian Zhao, Yaqi Zhang, Kunle Olukotun
2019 arXiv   pre-print
Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads.  ...  Most execution models for RNN acceleration break computation graphs into BLAS kernels, which lead to significant inter-kernel data movement and resource underutilization.  ...  We also thank Google for the cloud credits.  ... 
arXiv:1909.13654v1 fatcat:6w2ccglyanfmrohqler55k2pzu

2018 Index IEEE Transactions on Very Large Scale Integration (VLSI) Systems Vol. 26

2018 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
., see 2723-2736 , VLSI Design of an ML-Based Power-Efficient Motion Estimation Controller for Intelligent Mobile Systems; TVLSI Feb. 2018 262-271 Hsieh, Y., see Tsai, Y., TVLSI May 2018 945-957  ...  Hsu, K., Chen, Y., Lee, Y., and Chang, S., Contactless Testing for Prebond Interposers; TVLSI June 2018 1005-1014 Hsu, Y., see Liu, Z., 1565-1574 Hu, J., see Wang, Y., TVLSI May 2018 805-817 Hu, J  ...  ., +, TVLSI April 2018 663-670 ReRAM-Based Processing-in-Memory Architecture for Recurrent Neural Network Acceleration.  ... 
doi:10.1109/tvlsi.2019.2892312 fatcat:rxiz5duc6jhdzjo4ybcxdajtbq

An FPGA-Based LSTM Acceleration Engine for Deep Learning Frameworks

Dazhong He, Junhua He, Jun Liu, Jie Yang, Qing Yan, Yang Yang
2021 Electronics  
period, thus outperforming traditional feed-forward neural networks and Recurrent Neural Network (RNN) on learning long-term dependencies.  ...  Over the past two decades, Long Short-Term Memory (LSTM) networks have been used to solve problems that require modeling of long sequence because they can selectively remember certain patterns over a long  ...  Acknowledgments: This work was conducted on the platform of Center for Data Science of BUPT. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/electronics10060681 fatcat:ctgai3la6nbirp5tyzotqtzjlq

2020 Index IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Vol. 39

2020 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
Ahmad, H., +, TCAD Nov. 2020 4191- 4204 Swallow: A Versatile Accelerator for Sparse Neural Networks.  ...  ., +, TCAD June 2020 1205-1216 Reconfigurable and Low-Complexity Accelerator for Convolutional and Generative Networks Over Finite Fields.  ...  Entropy-Directed Scheduling for FPGA High-Level Synthesis. Shen, M., +, TCAD Oct. 2020 2588 -2601 FLASH: Fast, Parallel, and Accurate Simulator for HLS.  ... 
doi:10.1109/tcad.2021.3054536 fatcat:wsw3olpxzbeclenhex3f73qlw4

FPGA Acceleration of Recurrent Neural Network Based Language Model

Sicheng Li, Chunpeng Wu, Hai Li, Boxun Li, Yu Wang, Qinru Qiu
2015 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines  
Recurrent neural network (RNN) based language model (RNNLM) is a biologically inspired model for natural language processing.  ...  However, the use of RNNLM has been greatly hindered for the high computation cost in training. This work presents an FPGA implementation framework for RNNLM training acceleration.  ...  Recurrent neural network (RNN) is a special type of neural network that operates in time domain.  ... 
doi:10.1109/fccm.2015.50 dblp:conf/fccm/LiWLLWQ15 fatcat:dk66yqbdfvc2niu2acs3rwfn3q

A Survey of Neural Network Hardware Accelerators in Machine Learning

Fatimah Jasem, Manar AlSaraf
2021 Machine Learning and Applications: An International Journal  
This literature summarizes (in terms of a survey) recent work of accelerators including their advantages and disadvantages to make it easier for developers with neural network interests to further improve  ...  Dedicated acceleration of Convolutional Neural Networks can achieve these targets with high flexibility to perform multiple vision tasks.  ...  Although the above mentioned accelerator can execute neural networks at more than one scale, it still needs a storage for the neuron values in main memory when dealing with larger neural networks.  ... 
doi:10.5121/mlaij.2021.8402 fatcat:vaya6cwywjaq3jefppxt6w2nuu

Emerging Trends in Design and Applications of Memory-Based Computing and Content-Addressable Memories

Robert Karam, Ruchir Puri, Swaroop Ghosh, Swarup Bhunia
2015 Proceedings of the IEEE  
Paul (Intel Corporation) and as well as researchers in the Nanoscape Research Lab in the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA, for  ...  The use of PCM-based synapses for high-density spike neural network architectures has also been explored [73] .  ...  CAMs have also been used to accelerate network security applications, specifically network intrusion detection systems (NIDSs) [62] , [63] .  ... 
doi:10.1109/jproc.2015.2434888 fatcat:fpiqitohnvbqvjpzs5xl5w3pgi

The UCSC Kestrel parallel processor

A. Di Blas, D.M. Dahle, M. Diekhans, L. Grate, J. Hirschberg, K. Karplus, H. Keller, M. Kendrick, F.J. Mesa-Martinez, D. Pease, E. Rice, A. Schultz (+2 others)
2005 IEEE Transactions on Parallel and Distributed Systems  
Between these extremes, programmable and reconfigurable architectures provide a wide range of choice in flexibility, programmability, computational density, and performance.  ...  Kestrel is a single-instruction stream, multiple-data stream (SIMD) parallel processor with a 512-element linear array of 8-bit processing elements.  ...  ACKNOWLEDGMENTS The authors thank Ken Kennedy for valuable discussions and encouraging comments on several versions of this paper. This work was supported in part by US National Science  ... 
doi:10.1109/tpds.2005.12 fatcat:fkjyztff5falpp5owlrc2wviai

Parallel Computing for Brain Simulation

L. A. Pastur-Romay, A. B. Porto-Pazos, F. Cedron, A. Pazos
2017 Current Topics in Medicinal Chemistry  
Aims: For decades, researchers have been trying to make computers reproduce these abilities, focusing on both understanding the nervous system and, on processing data in a more efficient way than before  ...  Important technological developments and vast multidisciplinary projects have allowed creating the first simulation with a number of neurons similar to that of a human brain.  ...  219] and a reconfigurable chip for online learning [220] .  ... 
doi:10.2174/1568026617666161104105725 pmid:27823566 fatcat:wlcngyt5ubcrxpyhzyepjlsqyu

iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing

Menbere Kina Tekleyohannes, Vladimir Rybalkin, Muhammad Mohsin Ghaffar, Javier Alejandro Varela, Norbert Wehn, Andreas Dengel
2021 Journal of Imaging  
Many libraries offer special stationary equipment for scanning historical documents.  ...  An existing end-to-end OCR software called anyOCR achieves high recognition accuracy for historical documents.  ...  The last pipeline step, Text Line Recognition, is a character recognition step based on a Bidirectional LSTM (Bi-LSTM) recurrent neural network.  ... 
doi:10.3390/jimaging7090175 pmid:34564101 pmcid:PMC8467298 fatcat:jsgsgmvfmrg3flzpkpeuoyvq2q