High-throughput FPGA-based Hardware Accelerators for Deflate Compression and Decompression Using High-Level Synthesis
2020
IEEE Access
In this article, FPGA-based accelerators for Deflate compression and decompression are described. ...
Field-programmable gate arrays (FPGAs) are commonly used to implement hardware accelerators that speed up computation-intensive applications. ...
Both accelerator designs were created using high-level synthesis, from C++ source code, and target Xilinx FPGAs using a 250-MHz system clock. ...
doi:10.1109/access.2020.2984191
fatcat:b6qwxqh4erfw5b5rqcbjucgqui
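Deflate is the same format implemented by the standard zlib library; a minimal software sketch of the compress/decompress round trip (illustrative only, unrelated to the paper's HLS accelerator design):

```python
import zlib

def deflate(data: bytes, level: int = 6) -> bytes:
    # Emit a raw Deflate stream (wbits = -15 drops the zlib header/trailer).
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

def inflate(blob: bytes) -> bytes:
    return zlib.decompress(blob, -15)

payload = b"FPGA accelerators for Deflate " * 200
blob = deflate(payload)
assert inflate(blob) == payload        # lossless round trip
assert len(blob) < len(payload) // 10  # repetitive input compresses well
```

The hardware versions described in the paper implement this same bitstream format, trading the software's sequential match search for deeply pipelined parallel logic.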
An FPGA Accelerator for Molecular Dynamics Simulation Using OpenCL
2017
International Journal of Networked and Distributed Computing (IJNDC)
We propose an FPGA accelerator designed using C-based OpenCL for the heterogeneous environment. ...
Although hardware acceleration using FPGAs provides promising results, substantial design time and hardware design skill are required to implement an accelerator successfully. ...
Therefore, we accelerate this computation using FPGAs. The software programs use a cell-pair list to reduce the amount of computation. ...
doi:10.2991/ijndc.2017.5.1.6
fatcat:qjjyz2xlhncofhr2yqyyhuiri4
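The cell-pair list mentioned in the abstract is a standard molecular-dynamics neighbor-search structure; a minimal sketch of the idea (a simplified, non-periodic illustration, not the paper's OpenCL kernel):

```python
import math
from collections import defaultdict

def neighbor_pairs(positions, box, cutoff):
    # Bin particles into cells with edge >= cutoff: interacting pairs can
    # only lie in the same or an adjacent cell, so most of the O(N^2)
    # distance checks are skipped.  (Non-periodic box, for simplicity.)
    n = max(1, int(box / cutoff))
    size = box / n
    cells = defaultdict(list)
    for i, p in enumerate(positions):
        cells[tuple(min(n - 1, int(c / size)) for c in p)].append(i)

    pairs = []
    for (cx, cy, cz), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                        for i in members:
                            if i < j and math.dist(positions[i], positions[j]) <= cutoff:
                                pairs.append((i, j))
    return sorted(pairs)
```

Precomputing the candidate cell pairs this way is what gives the accelerator a bounded, streamable amount of work per particle.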
Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework
[article]
2018
arXiv
pre-print
The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource re-using, and hierarchical ...
Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. ...
Finally, we develop a decoupling technique for FFT/IFFT pairs for further acceleration and to facilitate hardware (FPGA) implementations. ...
arXiv:1802.06402v1
fatcat:cxbnxjl5mne3pkrj65ln2ph6nm
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
2018
IEEE Transactions on Neural Networks and Learning Systems
Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real time mobile applications. ...
This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. ...
The compressed output feature maps are then sent off-chip.
A. Sparse Matrix Compression Scheme
The NullHop accelerator uses a novel sparse matrix compression algorithm. ...
doi:10.1109/tnnls.2018.2852335
pmid:30047912
fatcat:pmkgmoeowrhdphfsy667a75nm4
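NullHop's compression stores a 1-bit sparsity map alongside the list of non-zero activations, so zero entries cost one bit each. A simplified 1-D sketch of that idea (the real hardware works on 3-D feature maps and packs the mask into words next to the value stream):

```python
def compress_feature_map(fmap):
    # 1-bit sparsity map + list of non-zero values: a map with
    # zero-fraction s shrinks from N words to N bits + (1-s)*N words.
    mask = [1 if v != 0 else 0 for v in fmap]
    values = [v for v in fmap if v != 0]
    return mask, values

def decompress_feature_map(mask, values):
    # Re-expand by walking the mask and consuming values in order.
    it = iter(values)
    return [next(it) if bit else 0 for bit in mask]
```

The same mask also lets the compute units skip multiplications by zero entirely, which is where the claimed speedup comes from.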
Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing
[article]
2018
arXiv
pre-print
"Weight sharing" accelerators have been proposed where the full range of weight values in a trained CNN are compressed and put into bins and the bin index used to access the weight-shared value. ...
We also show that the same weight-shared-with-PASM CNN accelerator can be implemented in resource-constrained FPGAs, where the FPGA has limited numbers of digital signal processor (DSP) units to accelerate ...
They encode the compressed weights with an index that specifies which of the shared weights should be used. ...
arXiv:1801.10219v3
fatcat:fqscynhdj5dhligk6i5z6chwhy
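A toy sketch of the weight-sharing scheme the abstract describes: cluster the trained weights into a small codebook of bins, store only per-weight bin indices, and let the multiply-accumulate unit look the shared value up by index. (Uniform binning here is a simplifying assumption; k-means clustering is also common.)

```python
def quantize_to_bins(weights, n_bins):
    # Uniformly bin the weight range; each bin's representative value
    # (its midpoint) becomes a codebook entry shared by many weights.
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / n_bins or 1.0
    codebook = [lo + (i + 0.5) * step for i in range(n_bins)]
    indices = [min(n_bins - 1, int((w - lo) / step)) for w in weights]
    return codebook, indices

def mac(codebook, indices, activations):
    # Multiply-accumulate using shared weights addressed by bin index.
    return sum(codebook[i] * a for i, a in zip(indices, activations))
```

Because the index stream is much narrower than full-precision weights, this is what lets such accelerators fit in FPGAs with few DSP units.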
ZIP-IO: Architecture for application-specific compression of Big Data
2012
2012 International Conference on Field-Programmable Technology
To address this issue, we investigate ZIP-IO, a framework for FPGA-accelerated compression. ...
Using this system we demonstrate that an unmodified industrial software workload can be accelerated 3x while simultaneously achieving more than 1000x compression in its data set. ...
Although we are currently using a V5LX330T for prototyping, this large FPGA is not necessary to achieve compression acceleration. ...
doi:10.1109/fpt.2012.6412159
dblp:conf/fpt/JunFAE12
fatcat:wi4mvrk4yravrftd6fobkwlgme
OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs
2020
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
The innovation of OMNI stems from its use of hardware-amenable on-chip memory partition patterns to seamlessly couple software CNN model compression with hardware CNN acceleration. ...
However, prior techniques disconnect software neural-network compression from hardware acceleration and so fail to balance multiple design parameters, including sparsity, performance, hardware area ...
In this case, the weights are sorted by the pair (Re;Qu) or (Qu;Re). ...
doi:10.1109/tcad.2020.3023903
fatcat:msbzea46tjdshlyryjkmwqwrxe
Memory optimisation for hardware induction of axis-parallel decision tree
2014
2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)
The induction of Decision Trees (i.e. training stage) involves intense memory communication and inherent parallel processing, making an FPGA device a promising platform for accelerating the training process ...
Therefore, efficient use of the embedded memory is critical not only for allowing larger training dataset to be processed on an FPGA but also for making available as many memory channels as possible to ...
Impact on a Random Forest Training System
The hardware implementation of the long method is incorporated into an FPGA architecture that is used to accelerate the training process of Random Forest (RF) ...
doi:10.1109/reconfig.2014.7032538
dblp:conf/reconfig/ChengB14
fatcat:4a2bpmehvnhonguoaolldotvme
FPGA Accelerated Low-Latency Market Data Feed Processing
2009
2009 17th IEEE Symposium on High Performance Interconnects
This paper presents an FPGA accelerated approach to market data feed processing, using an FPGA connected directly to the network to parse, optionally decompress, and filter the feed, and then to push the ...
This approach is demonstrated using the Celoxica AMDC board, which accepts a pair of redundant data feeds over two gigabit Ethernet ports, parses and filters the data, then pushes relevant messages directly ...
... and processing ability appropriate for trading algorithms. • An implementation of this model using the Celoxica AMDC accelerator card, which is able to filter and process a redundant pair of market data ...
doi:10.1109/hoti.2009.17
dblp:conf/hoti/MorrisTL09
fatcat:4by4mjel5jejxg7axpw63uda2m
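A minimal software sketch of the parse-and-filter stage described above, using a hypothetical "SYMBOL,PRICE,SIZE" line format (the paper's hardware parses real exchange feed protocols directly off the wire, in the FPGA, before anything reaches the host):

```python
def filter_feed(messages, watched_symbols):
    # Parse each message and pass through only the symbols a trading
    # algorithm has subscribed to; everything else is dropped early.
    out = []
    for raw in messages:
        symbol, price, size = raw.split(",")
        if symbol in watched_symbols:
            out.append((symbol, float(price), int(size)))
    return out
```

Doing this filtering in the network-attached FPGA rather than in host software is what removes the NIC-to-CPU round trip from the latency path.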
Heterogeneous architecture for reversible watermarking system for medical images using Integer transform based Reverse Contrast Mapping
2021
Turkish Journal of Computer and Mathematics Education
In this paper, in order to achieve higher performance, low latency, unlimited re-configurability and, most importantly, very high energy efficiency, we have proposed a hybrid architecture using CPU and FPGA ...
One way to attain a balance of high data throughput and flexibility is to combine soft-core FPGA accelerators with CPUs as hosts. ...
Segmentation: the image will be partitioned into ROI (Region of Interest) and RONI (Region of Non-Interest) using a binary masking technique (roipoly). 2) FPGA accelerator implementations ...
doi:10.17762/turcomat.v12i6.1377
fatcat:x32q34yxsfehvd47ec2zuvldpu
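The ROI/RONI split above can be sketched with a binary mask (a plain-Python stand-in for the polygon mask MATLAB's roipoly produces; the function name here is illustrative):

```python
def split_roi_roni(image, roi_mask):
    # Partition an image into ROI and RONI using a binary mask:
    # ROI keeps pixels where the mask is set, RONI keeps the rest.
    roi  = [[px if m else 0 for px, m in zip(row, mrow)]
            for row, mrow in zip(image, roi_mask)]
    roni = [[0 if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(image, roi_mask)]
    return roi, roni
```

In reversible watermarking, only the RONI pixels are modified, so the diagnostically relevant ROI can be restored bit-exactly.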
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
2015
IEEE Micro
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. ...
In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. ...
FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6], compression/decompression [14, 19], K-means clustering [11, 13], and web search. ...
doi:10.1109/mm.2015.42
fatcat:5ywjh7qpljgm7cvkqwudzqhjky
A reconfigurable fabric for accelerating large-scale datacenter services
2014
SIGARCH Computer Architecture News
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. ...
In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. ...
FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6], compression/decompression [14, 19], K-means clustering [11, 13], and web search. ...
doi:10.1145/2678373.2665678
fatcat:zc3hc7ijbjdmxi4qfegrzpyfy4
A reconfigurable fabric for accelerating large-scale datacenter services
2014
2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. ...
In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. ...
FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6], compression/decompression [14, 19], K-means clustering [11, 13], and web search. ...
doi:10.1109/isca.2014.6853195
dblp:conf/isca/PutnamCCCCDEFGGHHHHKLLPPSTXB14
fatcat:mq75web33vbzjp3qyovjq32hju
A reconfigurable fabric for accelerating large-scale datacenter services
2016
Communications of the ACM
One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. ...
In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. ...
FPGAs have been used to implement and accelerate important datacenter applications such as Memcached [17, 6], compression/decompression [14, 19], K-means clustering [11, 13], and web search. ...
doi:10.1145/2996868
fatcat:5hqxeegad5gibgct66bb7ebgw4
A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities
[article]
2019
arXiv
pre-print
In this paper, we systematically investigate the neural network accelerator based on FPGA. ...
Finally, we discuss the advantages and disadvantages of accelerators on FPGA platforms and explore opportunities for future research. ...
Model         Platform  Device            Clock     Memory     Precision  Throughput  Efficiency  Power
…             GPU       GTX TITAN X       1002 MHz  12G GDDR5  float32    1661        6.60        250 W
Res-152 [24]  FPGA      Arria 10 GX 1150  150 MHz   -          float16    315.5       -           -
Res-50 [24]   FPGA      Arria 10 GX 1150  150 MHz   -          float16    285.07      -           -
Res-50 ...
arXiv:1901.04988v2
fatcat:k662lznh5nho7i2jiwxtspmhdu