Reconfigurable acceleration of 3D image registration

Kuen Hung Tsoi, Daniel Rueckert, Chun Hok Ho, Wayne Luk
2009 2009 5th Southern Conference on Programmable Logic (SPL)  
This paper proposes techniques for accelerating a software based image registration algorithm for 3D medical images targeting a reconfigurable hardware platform.  ...  Based on the reconfigurability of FPGA devices, the system can be extended to swap modules optimized for different parameters, and to adopt more advanced registration algorithms.  ...  Conclusion This paper presents a reconfigurable framework for accelerating registration algorithms for 3D medical images.  ... 
doi:10.1109/spl.2009.4914908 fatcat:fcrvvrhicndv5fctivlcivokay

An Automated Framework for Accelerating Numerical Algorithms on Reconfigurable Platforms Using Algorithmic/Architectural Optimization

Jung Sub Kim, Lanping Deng, P. Mangalagiri, K. Irick, K. Sobti, M. Kandemir, V. Narayanan, C. Chakrabarti, N. Pitsianis, Xiaobai Sun
2009 IEEE transactions on computers  
Subsequently, TANOR automatically generates a configuration bitstream for a target FPGA along with associated drivers and control software necessary to direct the application from a host PC.  ...  This paper describes TANOR, an automated framework for designing hardware accelerators for numerical computation on reconfigurable platforms.  ...  ACKNOWLEDGMENTS This work is supported in part by grants from the US Defense Advanced Research Projects Agency W911NF-05-1-0248 and the US National Science Foundation CAREER 0093085.  ... 
doi:10.1109/tc.2009.78 fatcat:zklqp4ljhngc7jostzhlv2jgpq

Survey on Multigrained Reconfigurable Architecture using Parallel Mapping Method

T. Siva Sankara Phani, B. Ananda Krishna, Ranjan K. Senapati
2017 Indian Journal of Science and Technology  
A new folding tree algorithm (MGRA) with CRGA is proposed to eliminate PEs.  ...  Findings: For better execution characteristics of parallel mapping on MGRA, more PE utilisation rate and less memory access overhead are considered as resulting conditions.  ...  Introduction Spatial computing is a method that usually uses a significant number of simple parallel processing factors, which operate at the same time, to execute one application or application kernel  ... 
doi:10.17485/ijst/2017/v10i6/110837 fatcat:eseoaf2g2vbrfajebekoz5frl4

TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale

Giovanni Agosta, Daniele Cattaneo, William Fornaciari, Andrea Galimberti, Giuseppe Massari, Federico Reghenzani, Federico Terraneo, Davide Zoni, Carlo Brandolese, Massimo Celino, Francesco Iannone, Paolo Palazzari (+39 others)
2021 2021 24th Euromicro Conference on Digital System Design (DSD)  
Acknowledgements This work is supported by the TEXTAROSSA project G.A. n.956831, as part of the EuroHPC initiative.  ...  The Reverse Time Migration application and mini-kernels are used within EPI to co-design the STX Accelerator and have been ported to FPGAs within the EuroEXA project.  ...  (GPUs and FPGAs) by focusing on data/stream locality, efficient algorithms and programming models, tuned libraries and innovative IPs; 3) seamless integration of reconfigurable accelerators by extending  ... 
doi:10.1109/dsd53832.2021.00051 fatcat:tvsivkak5vgphc35ie5ow7kip4

OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures

Konstantinos Krommydas, Wu-chun Feng, Christos D. Antonopoulos, Nikolaos Bellas
2015 Journal of Signal Processing Systems  
Using OpenDwarfs, we characterize a diverse set of modern fixed and reconfigurable parallel platforms: multicore CPUs, discrete and integrated GPUs, Intel Xeon Phi co-processor, as well as an FPGA.  ...  Furthermore, we desire a common programming model for the benchmarks that facilitates code portability across a wide variety of different processors (e.g., CPU, APU, GPU, FPGA, DSP) and computing environments  ...  An FPGA implementation with a single pair of accelerators (one accelerator for each OpenCL kernel) offers performance worse even than that of the single-threaded Opteron 6272 execution (FPGA C1).  ... 
doi:10.1007/s11265-015-1051-z fatcat:ifnbayv26zdttgeovidgjqtoue

K-loops: Loop skewing for Reconfigurable Architectures

Ozana Silvia Dragomir, Koen Bertels
2009 2009 International Conference on Field-Programmable Technology  
In this paper, we propose new techniques for improving the performance of applications running on a reconfigurable platform supporting the Molen programming paradigm.  ...  The first technique presented in this paper improves the application performance by running in parallel on the reconfigurable hardware multiple instances of the kernel.  ...  The contributions of this paper are: a) a technique for parallelizing K-loops with wavefront-like dependencies, running all kernel instances on the reconfigurable hardware; b) a technique for parallelizing  ... 
doi:10.1109/fpt.2009.5377656 fatcat:jgo6gmucmveafasktpgjlnxbry

OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices

Mingjie Lin, Ilia Lebedev, John Wawrzynek
2010 2010 International Conference on Field Programmable Logic and Applications  
The key idea is to expose the FPGA platform as a compiler target for applications expressed in the OpenCL paradigm.  ...  For the well-known Parallel Prefix Sum (Scan) problem, comparing the runtime of the same problem on a GeForce 9400m using the OpenCL SDK from Apple Inc., the OpenRCL machine demonstrates comparable performance  ...  The key objective is to make a large body of existing and new parallel applications available to FPGA acceleration without significant recoding.  ... 
doi:10.1109/fpl.2010.93 dblp:conf/fpl/LinLW10 fatcat:2gqc62hvpbe5jczac44zrtgwum

Data-aware process networks

Christophe Alias, Alexandru Plesco
2021 Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction  
With the emergence of reconfigurable FPGA circuits as a credible alternative to GPUs for HPC acceleration, new compilation paradigms are required to map high-level algorithmic descriptions to a circuit  ...  DPN combines the benefits of a low-level dataflow representation, close to the final circuit, and affine iteration space tiling to explore the parallelization trade-offs (local memory size, communication  ...  Recently, reconfigurable FPGA circuits [8] have appeared to be a competitive alternative to GPU [46] in the race for energy efficiency.  ... 
doi:10.1145/3446804.3446847 fatcat:pyhil53nuzg2hk2dc7pbj7zh6q

Hardware Compilation of Deep Neural Networks: An Overview

Ruizhe Zhao, Shuanglong Liu, Ho-Cheung Ng, Erwei Wang, James J. Davis, Xinyu Niu, Xiwei Wang, Huifeng Shi, George A. Constantinides, Peter Y. K. Cheung, Wayne Luk
2018 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Deploying a deep neural network model on a reconfigurable platform, such as an FPGA, is challenging due to the enormous design spaces of both network models and hardware design.  ...  Design templates for neural network accelerators are studied with a specific focus on their derivation methodologies.  ...  s design has a tunable folding parameter K for each two-dimensional FFT kernel, while parameters T i and T k enabled configurable parallelism.  ... 
doi:10.1109/asap.2018.8445088 dblp:conf/asap/ZhaoLNWDNWSCCL18 fatcat:v5txrrsfifa6bah2oksjdlrsgi

Automatic compilation to a coarse-grained reconfigurable system-on-chip

Girish Venkataramani, Walid Najjar, Fadi Kurdahi, Nader Bagherzadeh, Wim Bohm, Jeff Hammes
2003 ACM Transactions on Embedded Computing Systems  
The Morphosys project proposes an SoC architecture consisting of reconfigurable hardware that supports a data-parallel, SIMD computational model.  ...  The rapid growth of device densities on silicon has made it feasible to deploy reconfigurable hardware as a highly parallel computing platform.  ...  FPGAs have the potential for a very large degree of parallelism as compared to traditional processors.  ... 
doi:10.1145/950162.950167 fatcat:atgwub4vmnfmtekpsaxiot77ju

Mapping a data-flow programming model onto heterogeneous platforms

Alina Sbîrlea, Yi Zou, Zoran Budimlić, Jason Cong, Vivek Sarkar
2012 Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems - LCTES '12  
usage of 0.52× of the power used by CPUs alone, when using accelerators (GPUs and FPGAs) and CPUs.  ...  We demonstrate a working example that maps a pipeline of medical image-processing algorithms onto a prototype heterogeneous platform that includes CPUs, GPUs and FPGAs.  ...  Additional thanks to the Habanero team for their comments and feedback on this work.  ... 
doi:10.1145/2248418.2248428 dblp:conf/lctrts/SbirleaZBCS12 fatcat:pt3s2jlcibehho65hstsw65ahm


X. Shi
2017 ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Furthermore, considering the energy efficiency requirement in general computation, Field Programmable Gate Array (FPGA) may be a better solution for better energy efficiency when the performance of computation  ...  Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may have different behavior on different computing infrastructure  ...  ACKNOWLEDGEMENT This research was partially supported by the National Science Foundation (NSF) through NSF SMA-1416509.  ... 
doi:10.5194/isprs-annals-iv-4-w2-115-2017 fatcat:3iij6pxybjbxno2524jbhbbea4

Performance and energy footprint assessment of FPGAs and GPUs on HPC systems using Astrophysics application [article]

David Goz, Georgios Ieronymakis, Vassilis Papaefstathiou, Nikolaos Dimou, Sara Bertocco, Francesco Simula, Antonio Ragagnin, Luca Tornatore, Igor Coretti, Giuliano Taffoni
2020 arXiv   pre-print
New challenges in Astronomy and Astrophysics (AA) are urging the need for a large number of exceptionally computationally intensive simulations.  ...  Our experience reveals that considering FPGAs for computationally intensive application seems very promising, as their performance is improving to meet the requirements of scientific applications.  ...  We thank Piero Vicini and the INFN APE Roma Group for the support and for the use of INFN computational infrastructure.  ... 
arXiv:2003.03283v2 fatcat:cgsagyvimbhd3pu3cv37q2hfyu

A fully pipelined kernel normalised least mean squares processor for accelerated parameter optimisation

Nicholas J. Fraser, Duncan J.M. Moss, JunKyu Lee, Stephen Tridgell, Craig T. Jin, Philip H.W. Leong
2015 2015 25th International Conference on Field Programmable Logic and Applications (FPL)  
In this paper, we propose the first fully pipelined floating point implementation of the kernel normalised least mean squares algorithm for regression.  ...  KAFs are members of a family of kernel methods which apply an implicit nonlinear mapping of input data to a high dimensional feature space, permitting learning algorithms to be expressed entirely as inner  ...  ACKNOWLEDGMENT This research was supported under the Australian Research Councils Linkage Projects funding scheme (project number LP130101034).  ... 
doi:10.1109/fpl.2015.7293952 dblp:conf/fpl/FraserMLTJL15 fatcat:tr4g4mfgwzhydckeupksctxcae

Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale

Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, Jason Cong
2016 Proceedings of the Seventh ACM Symposium on Cloud Computing - SoCC '16  
In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators.  ...  A straightforward JNI (Java Native Interface) integration of FPGA accelerators can diminish or even degrade the overall performance (up to 1000X slowdown) due to the overwhelming JVM-to-native-to-FPGA  ...  Acknowledgments This work is partially supported by the Center for Domain-Specific Computing under the NSF InTrans Award CCF-1436827, funding from CDSC industrial partners including Baidu, Fujitsu Labs  ... 
doi:10.1145/2987550.2987569 pmid:28317049 pmcid:PMC5351886 dblp:conf/cloud/HuangWYFICC16 fatcat:5f6bnm6xxbfk3k5fv3sgqarftu