High performance sparse matrix-vector multiplication on FPGA

Dan Zou, Yong Dou, Song Guo, Shice Ni
2013 IEICE Electronics Express  
This paper presents the design and implementation of a high performance sparse matrix-vector multiplication (SpMV) on field-programmable gate array (FPGA).  ...  By proposing a new storage format to compress the indexes of non-zero elements by exploiting the substructure of the sparse matrix, our SpMV implementation on a reconfigurable computing platform with a  ...  The accelerator receives the sparse matrix in BVCSR format and vector x, executes the SpMV algorithm and sends the result vector y back to the host.  ... 
doi:10.1587/elex.10.20130529 fatcat:rjhtbcwllbe3jbigtfxrfo7aby
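
For orientation, the SpMV operation named in this snippet (y = A·x with only non-zero entries stored) can be sketched as follows. The snippet does not describe the BVCSR layout, so this minimal sketch assumes the standard CSR format instead; the function name `spmv_csr` and the example matrix are illustrative and not taken from the paper.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in plain CSR form.

    Assumption: standard CSR is used here, not the paper's BVCSR format.
    values  -- non-zero entries, row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row_ptr[r]..row_ptr[r+1] delimits row r inside `values`
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Walk only the stored non-zeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix:
# A = [[4, 0, 1],
#      [0, 0, 2],
#      [3, 5, 0]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = spmv_csr(values, col_idx, row_ptr, x)
```

The index arrays `col_idx` and `row_ptr` are the part of the storage that an index-compressing format would aim to shrink.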

FPGAN: An FPGA Accelerator for Graph Attention Networks with Software and Hardware Co-Optimization

Weian Yan, Weiqin Tong, Xiaoli Zhi
2020 IEEE Access  
The number of DSPs in the FPGA affects the performance on acceleration and energy efficiency of the accelerator.  ...  DEEP LEARNING INFERENCE ACCELERATORS Various FPGA-based inference accelerators of CNNs have been proposed.  ... 
doi:10.1109/access.2020.3023946 fatcat:asr2ipn7snelzkpr5zxvfwpgba

The Promise of Reconfigurable Computing for Hyperspectral Imaging Onboard Systems: A Review and Trends

Sebastian Lopez, Tanya Vladimirova, Carlos Gonzalez, Javier Resano, Daniel Mozos, Antonio Plaza
2013 Proceedings of the IEEE  
Fast processing solutions for compression and/or interpretation of hyperspectral data onboard spacecraft imaging platforms are discussed in this paper with the purpose of giving a more efficient exploitation  ...  of hyperspectral data sets in various applications.  ...  Acknowledgment The authors would like to thank the Guest Editors of this special issue for their very kind invitation to provide a contribution, as well as the three anonymous reviewers for their outstanding  ... 
doi:10.1109/jproc.2012.2231391 fatcat:aepzokz6wne2dbxtlx3ij5sqau

A Systematic Review of Hardware-Accelerated Compression of Remotely Sensed Hyperspectral Images

Amal Altamimi, Belgacem Ben Youssef
2021 Sensors  
We present herein a systematic review of hardware-accelerated compression of hyperspectral images targeting remote sensing applications. We reviewed a total of 101 papers published from 2000 to 2021.  ...  Furthermore, we rank the best algorithms based on efficiency and elaborate on the major factors impacting the performance of hardware-accelerated compression.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/s22010263 pmid:35009804 pmcid:PMC8749878 fatcat:uvhmdpurevajvoxkf23mv5ie7a

Author index

2007 2007 IEEE International Conf. on Application-specific Systems, Architectures and Processors (ASAP)  
of a Binary Integer Decimal-based IEEE P754 Rounding Unit Reconfigurable Motion Estimation Architecture for Multi-standard Video Compression SIMD Vectorization of Histogram Functions A Triplet Based  ...  Specific Memory Characterization Technique for Co-processor Accelerators Estimating Area Costs of Custom Instructions for Design Exploration of FPGA-based Reconfigurable Processors Temperature-Aware  ... 
doi:10.1109/asap.2007.4459300 fatcat:lbxlom2lkrf2jf3q5c56uwiuea

FPGA-Based On-Board Hyperspectral Imaging Compression: Benchmarking Performance and Energy Efficiency against GPU Implementations

Julián Caba, María Díaz, Jesús Barba, Raúl Guerra, Jose A. de la Torre and Sebastián López
2020 Remote Sensing  
In this regard, a modification of the aforementioned lossy compression solution has also been proposed to be efficiently executed into FPGA devices using fixed-point arithmetic.  ...  In this work, a highly optimized implementation of an FPGA accelerator of the novel HyperLCA algorithm has been developed and thoughtfully analyzed in terms of performance and power efficiency.  ...  accelerators increases in a larger FPGA.  ... 
doi:10.3390/rs12223741 fatcat:ed6hmonq3fevpdhcag2eflt5nu

A Line Rate Outlier Filtering FPGA NIC using 10GbE Interface

Ami Hayashi, Yuta Tokusashi, Hiroki Matsutani
2016 SIGARCH Computer Architecture News  
We select an outlier detection based on the Mahalanobis distance as one of the simplest algorithms. Our approach is implemented on an FPGA-based NIC that has 10GbE interfaces.  ...  The sampling frequency of the NIC buffer vs. outlier detection precision is analyzed.  ...  Acknowledgements A part of this work was supported by JST PRESTO.  ... 
doi:10.1145/2927964.2927969 fatcat:ul2xo45rnjhtvlhtatxfo7uriq
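
As a reminder of what a Mahalanobis-distance outlier test computes, here is a minimal two-dimensional sketch. The function names, the 2-D restriction, and the threshold of 3.0 are assumptions made for illustration; the snippet does not state the feature dimension or threshold used on the NIC.

```python
def mahalanobis_2d(p, mean, cov):
    """Mahalanobis distance of a 2-D point p from a distribution (mean, cov).

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Quadratic form (p - mean)^T * inv * (p - mean).
    sq = dx * (inv[0][0] * dx + inv[0][1] * dy) \
        + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return sq ** 0.5


def is_outlier(p, mean, cov, threshold=3.0):
    """Flag p as an outlier when its distance exceeds the (assumed) threshold."""
    return mahalanobis_2d(p, mean, cov) > threshold


mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))
near = mahalanobis_2d((2.0, 0.0), mean, cov)
far = is_outlier((0.0, 4.0), mean, cov)
```

Because the distance scales each direction by the data's own variance, a point 2 units away along the high-variance axis is closer, in this metric, than a point 4 units away along the low-variance axis.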

Efficient smart-camera accelerator: A configurable motion estimator dedicated to video codec

Wajdi Elhamzi, Julien Dubois, Johel Miteran, Mohamed Atri, Barthelemy Heyrman, Dominique Ginhac
2013 Journal of Systems Architecture  
We have developed a flexible hardware implementation of the motion estimator based on FPGA component, fully compatible with H.264, which enables the integer motion search, the fractional search and variable  ...  We propose in this paper to focus on a key part of the compression system: motion estimation.  ...  The compression is therefore embedded in the smart camera using a dedicated accelerator. The adjustment of coding performance that we propose in this paper is different than state-of-art approaches.  ... 
doi:10.1016/j.sysarc.2013.05.005 fatcat:e34efbggkbbyxdedo2ix7pfrwu

Column Scan Acceleration in Hybrid CPU-FPGA Systems

Nusrat Jahan Lisa, Annett Ungethüm, Dirk Habich, Wolfgang Lehner, Tuan D. A. Nguyen, Akash Kumar
2018 Very Large Data Bases Conference  
In detail, we present our basic FPGA design and different optimization techniques. Then, we present selective results of our exhaustive evaluation showing the benefit of our FPGA acceleration.  ...  Based on that, we focus on column scan acceleration for hybrid hardware systems incorporating a Field Programmable Gate Array (FPGA) and a CPU into a single system in this paper.  ...  and (ii) apply lightweight lossless data compression to each sequence of integers resulting in a sequence of compressed column codes [1, 2, 21].  ... 
dblp:conf/vldb/LisaUHLN018 fatcat:mf2a3ca2effh5mz4abppxiyaau

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA [article]

Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally
2017 arXiv   pre-print
Implemented on Xilinx XCKU060 FPGA running at 200MHz, ESE has a performance of 282 GOPS working directly on the compressed LSTM network, corresponding to 2.52 TOPS on the uncompressed one, and processes  ...  Deploying such bulky model results in high power consumption and leads to high total cost of ownership (TCO) of a data center.  ...  This work was supported by National Natural Science Foundation of China (No.61373026, 61622403, 61261160501).  ... 
arXiv:1612.00694v2 fatcat:a65wy2piqnezjjjmauub7rdlai

Hyperspectral Compressive Sensing with a System-On-Chip FPGA

Jose M. P. Nascimento, Mario P. Vestias, Gabriel Martin
2020 IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing  
The proposed system runs 49× and 216× faster than an embedded 256-core GPU of a Jetson TX2 board and the ARM of the SoC FPGA, respectively.  ...  Index Terms: Compressive sensing, field-programmable gate arrays (FPGA), hyperspectral imagery, on-board processing, real time.  ...  Each inner product with a vector of H produces one element of the vector associated with the compressed pixel.  ... 
doi:10.1109/jstars.2020.2996679 fatcat:lpxw24lfkrds7l2nxad334nzdi

FPGA Implementation of Real-Time Compressive Sensing with Partial Fourier Dictionary

Yinghui Quan, Yachao Li, Xiaoxiao Gao, Mengdao Xing
2016 International Journal of Antennas and Propagation  
This paper presents a novel real-time compressive sensing (CS) reconstruction which employs high density field-programmable gate array (FPGA) for hardware acceleration.  ...  For large scale dictionary, the implementation of correlation is time consuming since it often requires a large number of matrix multiplications.  ...  Conflict of Interests The authors declare that there is no conflict of interests regarding the publication of this paper.  ... 
doi:10.1155/2016/1671687 fatcat:26y7hlsvtjaepcwlsgzrumajfi

C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs [article]

Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Yanzhi Wang, Qinru Qiu, Yun Liang
2018 arXiv   pre-print
The previous work proposes to use a pruning based compression technique to reduce the model size and thus speed up the inference on FPGAs.  ...  Unfortunately, the ever-increasing size of LSTM model leads to inefficient designs on FPGAs due to the limited on-chip resources.  ...  Since the compressed weight matrices are still dense, the block-circulant matrix based compression is amenable to hardware acceleration on FPGAs.  ... 
arXiv:1803.06305v1 fatcat:st6r57l6grbu3jprxtqo7bo2q4
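
To make the block-circulant idea in this snippet concrete: each k×k block is fully determined by a single length-k vector, so only k values per block need to be stored instead of k². The sketch below multiplies such a matrix by a vector using the direct O(k²) form for readability; it is an illustration under that assumption, not necessarily how the paper computes the product, and all names are hypothetical.

```python
def circulant_matvec(first_col, x):
    """y = C @ x, where C is circulant with C[i][j] = first_col[(i - j) % k]."""
    k = len(first_col)
    if len(x) != k:
        raise ValueError("vector length must match block size")
    return [sum(first_col[(i - j) % k] * x[j] for j in range(k))
            for i in range(k)]


def block_circulant_matvec(blocks, x, k):
    """Multiply a block-circulant matrix by x.

    blocks[r][c] is the defining first column (length k) of block (r, c);
    x has length k * (number of block columns).
    """
    y = []
    for block_row in blocks:
        acc = [0.0] * k
        for c, first_col in enumerate(block_row):
            part = circulant_matvec(first_col, x[c * k:(c + 1) * k])
            acc = [a + p for a, p in zip(acc, part)]
        y.extend(acc)
    return y


# One block row with two 3x3 circulant blocks; the second block is the identity.
blocks = [[[1, 2, 3], [1, 0, 0]]]
result = block_circulant_matvec(blocks, [1, 0, 0, 4, 5, 6], k=3)
```

Here a 3×6 matrix is represented by 6 stored values rather than 18, and the blocks remain dense, which matches the snippet's point that no sparse indexing is needed.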

ASC: a stream compiler for computing with FPGAs

O. Mencer
2006 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
The increased productivity of ASC is applied to the hardware acceleration of a wide range of applications. Traditionally, hardware accelerators are tediously handcrafted to achieve top performance.  ...  more rapidly than the productivity of very large scale integration (VLSI) and FPGA computer-aided design (CAD) tools.  ...  An STL class such as a vector can be instantiated as a vector of integers (vector <int>), a vector of floats (vector <float>), or a vector of any other user-defined class such as vector <Net>.  ... 
doi:10.1109/tcad.2005.857377 fatcat:4ccpxhmrijgcfg4cwqo6kiry6a

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs [article]

Caiwen Ding, Ao Ren, Geng Yuan, Xiaolong Ma, Jiayu Li, Ning Liu, Bo Yuan, Yanzhi Wang
2018 arXiv   pre-print
accelerator.  ...  Both industry and academia have extensively investigated hardware accelerations.  ...  The first trend is hardware acceleration. FPGA-based accelerators have the advantage of friendly programmability and high-degree parallelism.  ... 
arXiv:1804.11239v1 fatcat:xzrhegowvvem3ausfk3bj6r52i