Efficient Table-based Function Approximation on FPGAs using Interval Splitting and BRAM Instantiation [article]

Chetana Pradhan, Martin Letras, Jürgen Teich
2022 arXiv   pre-print
This paper proposes a novel approach for the generation of memory-efficient table-based function approximation circuits for FPGAs.  ...  Given a function f(x) to be approximated in a given interval [x0,x0+a] and a maximum approximation error Ea, the goal is to determine a function table implementation with a minimized memory footprint,  ...  Proposed generic hardware implementation for table-based function approximation using interval splitting and BRAM instantiation.  ... 
arXiv:2204.02443v3 fatcat:kmoznsat2ra2bmrksqj4zy34r4

High-level synthesizable dataflow MapReduce accelerator for FPGA-coupled data centers

Dionysios Diamantopoulos, Christoforos Kachris
2015 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)  
minimum energy footprint and high programming efficiency due to the use of HLS.  ...  FPGAs can be used to accelerate the processing of data and reduce significantly the power consumption.  ...  function boundaries and improved latency/interval by the reduction of function call overhead.  ... 
doi:10.1109/samos.2015.7363656 dblp:conf/samos/DiamantopoulosK15 fatcat:tg2h7ptl55cmfckegx2pqjxawi

Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics [article]

Stephanie Soldavini, Karl F. A. Friebel, Mattia Tibaldi, Gerald Hempel, Jeronimo Castrillon, Christian Pilato
2022 arXiv   pre-print
and on-chip storage.  ...  Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example.  ...  To further reduce the error, SEM uses an approximation based on polynomials of a higher degree (𝑝 > 1).  ... 
arXiv:2203.10850v4 fatcat:2mlvwtabhjadxf7xjm5wacform

OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures

Konstantinos Krommydas, Wu-chun Feng, Christos D. Antonopoulos, Nikolaos Bellas
2015 Journal of Signal Processing Systems  
Using OpenDwarfs, we characterize a diverse set of modern fixed and reconfigurable parallel platforms: multicore CPUs, discrete and integrated GPUs, Intel Xeon Phi co-processor, as well as a FPGA.  ...  ., CPU, APU, GPU, FPGA, DSP) and computing environments (e.g., embedded, mobile, desktop, server).  ...  To evaluate OpenDwarfs on FPGAs, we use the Xilinx Virtex-6 LX760 FPGA on a PCIe v2.1 board, which consumes approximately 50 W and contains 118560 logic slices.  ... 
doi:10.1007/s11265-015-1051-z fatcat:ifnbayv26zdttgeovidgjqtoue

Scotch: Generating FPGA-Accelerators for Sketching at Line Rate

Martin Kiefer, Ilias Poulakis, Sebastian Breß, Volker Markl
2020 Proceedings of the VLDB Endowment  
While FPGAs have shown admirable throughput and energy-efficiency for data processing tasks, developing FPGA accelerators requires a sophisticated hardware design and expensive manual tuning by an expert  ...  We propose Scotch, a novel system for accelerating sketch maintenance using FPGAs.  ...  ACKNOWLEDGMENTS This work has received funding by the German Ministry for Education and Research as BIFOLD -Berlin Institute for the Foundations of Learning and Data (01IS18025A and 01IS18037A) and Software  ... 
doi:10.5555/3430915.3442428 dblp:journals/pvldb/KieferPBM20 fatcat:dxkoqtxm5bb4bb5k2ix5o7r5cq

Particle Mesh Ewald for Molecular Dynamics in OpenCL on an FPGA Cluster [article]

Lawrence C. Stewart and Carlo Pascoe and Brian W. Sherman and Martin Herbordt and Vipin Sachdeva
2021 arXiv   pre-print
The design is fully implemented with OpenCL for flexibility and ease of development and uses 100 Gbps links for direct FPGA-to-FPGA communications without the need for host interaction.  ...  In this paper, we present the design and implementation of a scalable PME algorithm that runs on a cluster of Intel Stratix 10 FPGAs and can handle FFT sizes appropriate to address real-world drug discovery  ...  A larger cluster will also allow us to use BRAM only on FPGAs for 128^3 transform as well.  ... 
arXiv:2009.12617v4 fatcat:a2olheavmjb33psaqmkhqec57u

FPGA-embedded Linearized Bregman Iteration algorithm for trend break detection

Felipe Calliari, Gustavo Castro do Amaral, Michael Lunglmayr
2020 EURASIP Journal on Wireless Communications and Networking  
The hardware is synthesized in different-sized FPGAs, and the percentage of used hardware, as well as the maximum frequency enabled by the design, indicate that an approximately 100 gain factor in processing  ...  In this work, a hardware architecture of the Linearized Bregman Iteration algorithm is presented and tested on a Field Programmable Gate Array (FPGA).  ...  This work has been supported by the COMET-K2 "Center for Symbiotic Mechatronics" of the Linz Center of Mechatronics (LCM) funded by the Austrian federal government and the federal state of Upper Austria  ... 
doi:10.1186/s13638-020-01796-0 fatcat:vexrdgygifb6rdrj56ouf6uyii

An FPGA-Based LDPC Decoder with Ultra-Long Codes for Continuous-Variable Quantum Key Distribution

Shen-Shen Yang, Jian-Qiang Liu, Zhen-Guo Lu, Zeng-Liang Bai, Xu-Yang Wang, Yong-Min Li
2021 IEEE Access  
To reduce implementation complexity and hardware resource consumption, the messages in the iteration process are uniformly quantified and the function (x) is approximated with second-order functions.  ...  The implementation results show that the FPGA-based LDPC decoder can achieve throughputs of 108.64 Mb/s and 70.32 Mb/s at SNR of 1.0 dB when the code length is 262,144 and 349,952, respectively.  ...  The PCM_MEM is used to store the PCM and only need to instantiate a BRAM.  ... 
doi:10.1109/access.2021.3065776 fatcat:xpdeehuqyvg5vjvslypmirjofe

Scaling Binarized Neural Networks on Reconfigurable Logic

Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
2017 Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms - PARMA-DITAM '17  
Based on this technique, we demonstrate numerous experiments to illustrate flexibility and scalability of the approach.  ...  However, FINN was not evaluated on larger topologies due to the size of the chosen FPGA, and exhibited decreased accuracy due to lack of padding.  ...  BRAM Efficiency Since FINN currently focuses on BNNs that fit entirely onto the on-chip memory of a single FPGA, making the most out of the available on-chip memory is essential.  ... 
doi:10.1145/3029580.3029586 dblp:conf/hipeac/FraserUGBLJV17 fatcat:jliterfdmbbp3ao5yly4masuce

A special-purpose compiler for look-up table and code generation for function evaluation

Yuanrui Zhang, Lanping Deng, Praveen Yedlapalli, Sai Prashanth Muralidhara, Hui Zhao, Mahmut Kandemir, Chaitali Chakrabarti, Nikos Pitsianis, Xiaobai Sun
2010 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)  
Elementary functions are extensively used in computer graphics, signal and image processing, and communication systems.  ...  This paper presents a special-purpose compiler that automatically generates customized look-up tables and implementations for elementary functions under user given constraints.  ...  ACKNOWLEDGMENTS This work is supported in part by the DARPA DESA program, the NSF grants CNS #0720645, CCF #0811687, CCF #0702519, CNS #0202007 and CNS #0509251, as well as a grant from Microsoft Corporation  ... 
doi:10.1109/date.2010.5456978 dblp:conf/date/ZhangDYMZKCPS10 fatcat:ebwxtrlhrbbt5e4wlhsj47livi

Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation [chapter]

Sujoy Sinha Roy, Kimmo Järvinen, Frederik Vercauteren, Vassil Dimitrov, Ingrid Verbauwhede
2015 Lecture Notes in Computer Science  
We present a hardware architecture for all building blocks required in polynomial ring based fully homomorphic schemes and use it to instantiate the somewhat homomorphic encryption scheme YASHE.  ...  Our implementation is the first FPGA implementation that is designed for evaluating functions on homomorphically encrypted data (up to a certain multiplicative depth) and we illustrate this capability  ...  The reported FPGA-based architectures [3, 33, 35] of such cryptosystems use BRAM slices to store the polynomials and use arithmetic components made up of DSP multipliers and LUTs.  ... 
doi:10.1007/978-3-662-48324-4_9 fatcat:6dlnxjhaojfntaq7gajnboruhu

Scaling Binarized Neural Networks on Reconfigurable Logic [article]

Nicholas J. Fraser, Yaman Umuroglu, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
2017 arXiv   pre-print
Based on this technique, we demonstrate numerous experiments to illustrate flexibility and scalability of the approach.  ...  However, FINN was not evaluated on larger topologies due to the size of the chosen FPGA, and exhibited decreased accuracy due to lack of padding.  ...  thus avoids potential bottlenecks on external more BRAMs than is available in the FPGA.  ... 
arXiv:1701.03400v2 fatcat:lf52l3zre5dxndh6wd2xy3v4h4

Spiker: an FPGA-optimized Hardware acceleration for Spiking Neural Networks [article]

Alessio Carpegna, Alessandro Savino, Stefano Di Carlo
2022 arXiv   pre-print
The test design consists of a single layer of four-hundred neurons and uses around 40% of available resources on the FPGA.  ...  Spiking Neural Networks (SNN) are an emerging type of biologically plausible and efficient Artificial Neural Network (ANN).  ...  Many FPGAs, like the one used in this paper (see section IV), are equipped with Block RAM (BRAM). A BRAM is a memory, divided into blocks that are directly integrated within the FPGA.  ... 
arXiv:2201.06993v3 fatcat:cejemhueyfcexkiice5v75niti

Explicit design of FPGA-based coprocessors for short-range force computations in molecular dynamics simulations

Yongfeng Gu, Tom VanCourt, Martin C. Herbordt
2008 Parallel Computing  
FPGA-based acceleration of molecular dynamics simulations (MD) has been the subject of several recent studies.  ...  Extensive experimentation was required to optimize precision, interpolation order, interpolation mode, table sizes, and simulation quality.  ...  We also thank the anonymous reviewers for their dozens of helpful comments and suggestions through which this article has been much improved.  ... 
doi:10.1016/j.parco.2008.01.007 pmid:19412319 pmcid:PMC2440579 fatcat:foc7lpzgafbrfmxn2bcr5h3td4

High-level synthesis of dynamic data structures: A case study using Vivado HLS

Felix Winterstein, Samuel Bayliss, George A. Constantinides
2013 2013 International Conference on Field-Programmable Technology (FPT)  
Algorithms which use dynamic, pointer-based data structures, which are common in software, remain difficult to implement well.  ...  High-level synthesis promises a significant shortening of the FPGA design cycle when compared with design entry using register transfer level (RTL) languages.  ...  The interval between the start of two iterations is given by the initiation interval (II). Loop unrolling is used to force parallel instantiations of the loop body.  ... 
doi:10.1109/fpt.2013.6718388 dblp:conf/fpt/WintersteinBC13 fatcat:i5b5i4m435fx7j3ohnsekiseii