2,448 Hits in 4.1 sec

High-throughput FPGA-based Hardware Accelerators for Deflate Compression and Decompression Using High-Level Synthesis

Morgan Ledwon, Bruce F. Cockburn, Jie Han
2020 IEEE Access  
In this article, FPGA-based accelerators for Deflate compression and decompression are described.  ...  Field-programmable gate arrays (FPGAs) are commonly used to implement hardware accelerators that speed up computation-intensive applications.  ...  Both accelerator designs were created using high-level synthesis, from C++ source code, and target Xilinx FPGAs using a 250-MHz system clock.  ... 
doi:10.1109/access.2020.2984191 fatcat:b6qwxqh4erfw5b5rqcbjucgqui
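As a software reference point for what such an accelerator computes, Deflate couples LZ77 back-references with Huffman coding; Python's zlib (a generic illustration, not the paper's code) can produce and consume a raw Deflate stream:

```python
import zlib

data = b"FPGA accelerators for Deflate compression " * 8  # repetitive input

# wbits=-15 selects a raw Deflate stream (no zlib/gzip header or checksum)
comp = zlib.compressobj(level=9, wbits=-15)
stream = comp.compress(data) + comp.flush()

decomp = zlib.decompressobj(wbits=-15)
restored = decomp.decompress(stream)

assert restored == data
assert len(stream) < len(data)  # LZ77 back-references exploit the repetition
```

A hardware implementation pipelines exactly these two stages — match finding and entropy coding — which is why the paper reports separate compressor and decompressor designs.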

An FPGA Accelerator for Molecular Dynamics Simulation Using OpenCL

Hasitha Muthumala Waidyasooriya, Masanori Hariyama, Kota Kasahara
2017 International Journal of Networked and Distributed Computing (IJNDC)  
We propose an FPGA accelerator designed using C-based OpenCL for the heterogeneous environment.  ...  Although hardware acceleration using FPGAs provides promising results, substantial design time and hardware design skills are required to implement an accelerator successfully.  ...  Therefore, we accelerate this computation using FPGAs. The software programs [22] use a cell-pair list to reduce the amount of computation.  ... 
doi:10.2991/ijndc.2017.5.1.6 fatcat:qjjyz2xlhncofhr2yqyyhuiri4
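The cell-pair list mentioned in the snippet is a standard molecular-dynamics technique: space is binned into cells at least as wide as the interaction cutoff, so each particle only compares against particles in its own and neighboring cells. A minimal 1-D sketch (illustrative only; `neighbor_pairs` is a hypothetical helper, not the authors' OpenCL code):

```python
from collections import defaultdict

def neighbor_pairs(positions, cutoff):
    """All index pairs within `cutoff`, checking only adjacent cells (1-D)."""
    cells = defaultdict(list)
    for i, x in enumerate(positions):
        cells[int(x // cutoff)].append(i)
    pairs = []
    for c, members in cells.items():
        for i in members:
            for dc in (0, 1):            # own cell and the next cell only
                for j in cells.get(c + dc, []):
                    if dc == 0 and j <= i:
                        continue          # avoid self-pairs and duplicates
                    if abs(positions[j] - positions[i]) <= cutoff:
                        pairs.append(tuple(sorted((i, j))))
    return pairs

print(neighbor_pairs([0.1, 0.2, 5.0, 5.05], 0.5))  # two close pairs, no cross-cell pairs
```

Compared with the naive all-pairs scan, this drops the candidate count from O(n²) to roughly O(n) for uniform densities — the reduction the snippet alludes to.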

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework [article]

Yanzhi Wang, Caiwen Ding, Zhe Li, Geng Yuan, Siyu Liao, Xiaolong Ma, Bo Yuan, Xuehai Qian, Jian Tang, Qinru Qiu, Xue Lin
2018 arXiv   pre-print
The hardware part consists of highly efficient Field-Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-use, and hierarchical  ...  Hardware acceleration of deep learning systems has been extensively investigated in industry and academia.  ...  Finally, we develop a decoupling technique for FFT/IFFT pairs to further accelerate and facilitate hardware (FPGA) implementations.  ... 
arXiv:1802.06402v1 fatcat:cxbnxjl5mne3pkrj65ln2ph6nm
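The FFT/IFFT pairs refer to block-circulant weight matrices: multiplying by a circulant matrix is a circular convolution, which the DFT diagonalizes, cutting O(n²) work to O(n log n). A NumPy sketch of that identity (a generic illustration, not the paper's implementation):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
c = rng.standard_normal(n)   # first column defines the circulant matrix
x = rng.standard_normal(n)

# Direct construction: C[i, j] = c[(i - j) mod n]
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
direct = C @ x

# FFT route: multiply spectra elementwise, then transform back
via_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

assert np.allclose(direct, via_fft)
```

Storing only the first column of each block is also where the compression comes from: an n-by-n block needs n parameters instead of n².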

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Alessandro Aimar, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, Federico Corradi, Alejandro Linares-Barranco, Shih-Chii Liu, Tobi Delbruck
2018 IEEE Transactions on Neural Networks and Learning Systems  
Even though Graphics Processing Units (GPUs) are most often used to train and deploy CNNs, their power consumption becomes a problem for real-time mobile applications.  ...  This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements.  ...  The compressed output feature maps are then sent off-chip. The NullHop accelerator uses a novel sparse matrix compression algorithm.  ... 
doi:10.1109/tnnls.2018.2852335 pmid:30047912 fatcat:pmkgmoeowrhdphfsy667a75nm4
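The sparse-representation idea — keeping a per-element zero/non-zero mask plus only the non-zero values — can be sketched as follows (an illustrative reconstruction; the paper's exact encoding differs in its details):

```python
def compress(fmap):
    """Encode a flat feature map as (sparsity mask, non-zero value list)."""
    mask = [1 if v != 0 else 0 for v in fmap]
    values = [v for v in fmap if v != 0]
    return mask, values

def decompress(mask, values):
    """Invert compress(): re-expand the values into the zero pattern."""
    it = iter(values)
    return [next(it) if m else 0 for m in mask]

fmap = [0, 3, 0, 0, 7, 0, 1, 0]          # ReLU outputs are often this sparse
mask, values = compress(fmap)
assert decompress(mask, values) == fmap
# cost: one mask bit per element plus 3 stored values, instead of 8 full words
```

Because ReLU zeroes a large fraction of activations, the mask-plus-values form shrinks both off-chip traffic and the multiplications the datapath must issue.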

Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing [article]

James Garland, David Gregg
2018 arXiv   pre-print
"Weight sharing" accelerators have been proposed in which the full range of weight values in a trained CNN is compressed into bins, with the bin index used to access the weight-shared value.  ...  We also show that the same weight-shared-with-PASM CNN accelerator can be implemented in resource-constrained FPGAs, where the FPGA has a limited number of digital signal processor (DSP) units to accelerate  ...  They encode the compressed weights with an index that specifies which of the shared weights should be used.  ... 
arXiv:1801.10219v3 fatcat:fqscynhdj5dhligk6i5z6chwhy
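The bin-index scheme in the snippet, combined with PASM-style accumulate-before-multiply, can be sketched as below (hypothetical helper names; the bins would normally come from clustering the trained weights):

```python
def share_weights(weights, bins):
    """Replace each weight by the index of its nearest shared bin value."""
    return [min(range(len(bins)), key=lambda b: abs(w - bins[b]))
            for w in weights]

def shared_dot(inputs, indices, bins):
    """Accumulate inputs per bin first, then one multiply per bin."""
    sums = [0.0] * len(bins)
    for x, idx in zip(inputs, indices):
        sums[idx] += x                    # additions only in the inner loop
    return sum(s * b for s, b in zip(sums, bins))

bins = [-0.5, 0.0, 0.5]                   # shared weight values
weights = [0.45, -0.52, 0.02, 0.49]
idx = share_weights(weights, bins)        # [2, 0, 1, 2]
out = shared_dot([1.0, 2.0, 3.0, 4.0], idx, bins)
assert abs(out - 1.5) < 1e-9              # (1+4)*0.5 + 2*(-0.5) + 3*0.0
```

The payoff in hardware is that the inner loop needs only adders and small index lookups; the scarce DSP multipliers are used once per bin rather than once per weight.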

ZIP-IO: Architecture for application-specific compression of Big Data

Sang Woo Jun, Kermin E. Fleming, Michael Adler, Joel Emer
2012 2012 International Conference on Field-Programmable Technology  
To address this issue, we investigate ZIP-IO, a framework for FPGA-accelerated compression.  ...  Using this system we demonstrate that an unmodified industrial software workload can be accelerated 3x while simultaneously achieving more than 1000x compression of its data set.  ...  Although we are currently using a V5LX330T for prototyping, this large FPGA is not necessary to achieve compression acceleration.  ... 
doi:10.1109/fpt.2012.6412159 dblp:conf/fpt/JunFAE12 fatcat:wi4mvrk4yravrftd6fobkwlgme

OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs

Yun Liang, Liqiang Lu, Jiaming Xie
2020 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
The innovation of OMNI stems from its use of hardware-amenable on-chip memory partition patterns to seamlessly couple software CNN model compression with hardware CNN acceleration.  ...  However, prior techniques disconnect software neural network compression from hardware acceleration, failing to balance multiple design parameters including sparsity, performance, and hardware area  ...  In this case, the weights are sorted by the pair (Re, Qu) or (Qu, Re).  ... 
doi:10.1109/tcad.2020.3023903 fatcat:msbzea46tjdshlyryjkmwqwrxe

Memory optimisation for hardware induction of axis-parallel decision tree

Chuan Cheng, Christos-Savvas Bouganis
2014 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)  
The induction of Decision Trees (i.e., the training stage) involves intense memory communication and inherent parallel processing, making an FPGA device a promising platform for accelerating the training process  ...  Therefore, efficient use of the embedded memory is critical not only for allowing larger training datasets to be processed on an FPGA but also for making available as many memory channels as possible to  ...  Impact on a Random Forest Training System: the hardware implementation of the long method is incorporated into an FPGA architecture that is used to accelerate the training process of Random Forest (RF)  ... 
doi:10.1109/reconfig.2014.7032538 dblp:conf/reconfig/ChengB14 fatcat:4a2bpmehvnhonguoaolldotvme

FPGA Accelerated Low-Latency Market Data Feed Processing

Gareth W. Morris, David B. Thomas, Wayne Luk
2009 2009 17th IEEE Symposium on High Performance Interconnects  
This paper presents an FPGA-accelerated approach to market data feed processing, using an FPGA connected directly to the network to parse, optionally decompress, and filter the feed, and then to push the  ...  This approach is demonstrated using the Celoxica AMDC board, which accepts a pair of redundant data feeds over two gigabit Ethernet ports, parses and filters the data, then pushes relevant messages directly  ...  and processing ability appropriate for trading algorithms; an implementation of this model using the Celoxica AMDC accelerator card, which is able to filter and process a redundant pair of market data  ... 
doi:10.1109/hoti.2009.17 dblp:conf/hoti/MorrisTL09 fatcat:4by4mjel5jejxg7axpw63uda2m

Heterogeneous architecture for reversible watermarking system for medical images using Integer transform based Reverse Contrast Mapping

Subodh S. Ingaleshwara et al.
2021 Turkish Journal of Computer and Mathematics Education  
In this paper, in order to achieve higher performance, low latency, unlimited re-configurability and, most importantly, very high energy efficiency, we propose a hybrid architecture using a CPU and an FPGA  ...  One way to attain a balance of high data throughput and flexibility is to combine soft-core FPGA accelerators with CPUs as hosts.  ...  Segmentation: the image is partitioned into an ROI (Region of Interest) and a RONI (Region of Non-Interest) using a binary masking technique (roipoly). FPGA accelerator implementations  ... 
doi:10.17762/turcomat.v12i6.1377 fatcat:x32q34yxsfehvd47ec2zuvldpu

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck (+11 others)
2015 IEEE Micro  
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables.  ...  In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine.  ...  FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6] compression/decompression [14, 19] , K-means clustering [11, 13] , and web search.  ... 
doi:10.1109/mm.2015.42 fatcat:5ywjh7qpljgm7cvkqwudzqhjky

A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Adrian M. Caulfield (+11 others)
2014 SIGARCH Computer Architecture News  
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables.  ...  In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine.  ...  FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6] compression/decompression [14, 19] , K-means clustering [11, 13] , and web search.  ... 
doi:10.1145/2678373.2665678 fatcat:zc3hc7ijbjdmxi4qfegrzpyfy4

A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck (+11 others)
2014 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)  
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables.  ...  In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine.  ...  FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6] compression/decompression [14, 19] , K-means clustering [11, 13] , and web search.  ... 
doi:10.1109/isca.2014.6853195 dblp:conf/isca/PutnamCCCCDEFGGHHHHKLLPPSTXB14 fatcat:mq75web33vbzjp3qyovjq32hju

A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Adrian M. Caulfield (+11 others)
2016 Communications of the ACM  
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables.  ...  In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine.  ...  FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6] compression/decompression [14, 19] , K-means clustering [11, 13] , and web search.  ... 
doi:10.1145/2996868 fatcat:5hqxeegad5gibgct66bb7ebgw4

A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities [article]

Teng Wang, Chao Wang, Xuehai Zhou, Huaping Chen
2019 arXiv   pre-print
In this paper, we systematically investigate neural network accelerators based on FPGAs.  ...  Finally, we discuss the advantages and disadvantages of accelerators on FPGA platforms and further explore opportunities for future research.  ...  [Table fragment comparing a GTX TITAN X GPU (1002 MHz, 12G GDDR5, float32, 1661, 6.60, 250 W; Res-152) with an Arria 10 GX 1150 FPGA (150 MHz, float16, 315.5 and 285.07; Res-50) [24]]  ... 
arXiv:1901.04988v2 fatcat:k662lznh5nho7i2jiwxtspmhdu
Showing results 1 — 15 of 2,448 results