1,905 Hits in 3.2 sec

A systems perspective on GPU computing

Naila Farooqui
2016 Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit - GPGPU '16  
To this end, his contributions include novel scheduling and resource management abstractions, runtime specialization, and novel data management techniques to support scalable, distributed GPU frameworks  ...  His vision encompassed the conceptualization, implementation, and demonstration of systems abstractions and runtime methods to elevate GPUs into first-class citizens in today's and future heterogeneous  ...  Section 3 surveys runtime specialization methods for heterogeneous GPU platforms, followed by Section 4, which provides an overview of novel data management techniques to support distributed GPU runtimes  ... 
doi:10.1145/2884045.2884057 dblp:conf/ppopp/Farooqui16 fatcat:lcxhf6nfsvannnbp5lusxudmmu

D4.1 Programming Language And Runtime System: Requirements

Hans Vandierendonck
2016 Zenodo  
The VINEYARD project aims to achieve easy-to-use and transparent acceleration of data analytics.  ...  One of the components in VINEYARD is the programming model and runtime system support, which is developed in Work Package 4.  ...  Data analytics platforms typically specialize in a specific set of workloads, e.g. batch processing, stream processing or graph analytics.  ... 
doi:10.5281/zenodo.898162 fatcat:h4qoibk26vfzdao5badtj6fdie

Big Data Pilot Demo Days: I-BiDaaS Application to the Financial Sector Webinar

Dusan Jakovetic, Ramon Martin De Pozuelo
2020 Zenodo  
The kick-off webinar of the Big Data Pilot Demo Days series under BDV PPP, I-BiDaaS Application to the Financial Sector, was held on May 21st.  ...  The main goal of this big data pilot webinar was to demonstrate in a step by step fashion the I-BiDaaS self-service solution and its application to the banking sector.  ...  analytics; Synergy of CEP and GPU-accelerated analytics for streaming data; Feedback from analytics to data fabrication; Feedback from analytics to problem modelling; Demonstrated on use cases across  ... 
doi:10.5281/zenodo.3865193 fatcat:x2umpjbxkvbmteiglbynsowbhe

An investigation of GPU-based stiff chemical kinetics integration methods

Nicholas J. Curtis, Kyle E. Niemeyer, Chih-Jen Sung
2017 Combustion and Flame  
Finally, future research directions for working towards enabling realistic chemistry in reactive-flow simulations via GPU/SIMD-accelerated stiff chemical kinetic integration were identified.  ...  of 7.11-240.96 times; in comparison, the corresponding slowdowns on the CPU were just 1.39-2.61 times, underscoring the importance of using an analytical Jacobian for efficient GPU integration.  ...  In this work, our efforts to accelerate simulations with chemical kinetics focus on improving the integration strategy itself, via development of new algorithms and using high-performance hardware accelerators  ... 
doi:10.1016/j.combustflame.2017.02.005 fatcat:fvm45pr4qngs5leth2wyqztsqe

Guest Editorial: Special Issue on Computing Frontiers

Antonino Tumeo, Hubertus Franke, Gianluca Palermo, John Feo
2018 International Journal of Parallel Programming  
This special issue collects extended versions of the best papers of the 2016 edition of the ACM International Conference on Computing Frontiers.  ...  regular papers accepted at the conference (out of 96 total submissions) we have invited authors of the seven most representative, in terms of quality and topics, to submit extended versions for this special  ...  The first article, "Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization" by Naila Farooqui, Indrajit Roy, Yuan Chen, Vanish Talwar, Rajkishore Barik, Brian Lewis, Tatiana  ... 
doi:10.1007/s10766-018-0556-z fatcat:wprr6jwxsvganasr3ynat5yz3a

SAVE: Towards Efficient Resource Management in Heterogeneous System Architectures [chapter]

G. Durelli, M. Coppola, K. Djafarian, G. Kornaros, A. Miele, M. Paolino, Oliver Pell, Christian Plessl, M. D. Santambrogio, C. Bolchini
2014 Lecture Notes in Computer Science  
The SAVE project will develop HW/SW/OS components that allow for deciding at runtime the mapping of the computation kernels onto the appropriate type of resource, based on the current system context and  ...  Self-adaptiveness and hardware-assisted virtualization are the two key enabling technologies for this kind of architecture, to allow the efficient exploitation of the available resources based on the  ...  for scientific computing or big data analytics.  ... 
doi:10.1007/978-3-319-05960-0_38 fatcat:txyxyy3oxfgxxlejazrypktzti

Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale

Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, Jason Cong
2016 Proceedings of the Seventh ACM Symposium on Cloud Computing - SoCC '16  
, and iii) share an FPGA platform among multiple accelerators of different functionalities.  ...  Unlike conventional CPU- and GPU-targeted programs, compiling an FPGA program can take several hours, which makes existing runtime systems that use dynamic code generation for CPU-GPU datacenters, such  ...  of the six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA; grants NSF IIS-1302698 and CNS-1351047; and U54EB020404 awarded by NIH Big Data to Knowledge (  ... 
doi:10.1145/2987550.2987569 pmid:28317049 pmcid:PMC5351886 dblp:conf/cloud/HuangWYFICC16 fatcat:5f6bnm6xxbfk3k5fv3sgqarftu

Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms

Nicholas J. Curtis, Kyle E. Niemeyer, Chih-Jen Sung
2018 Combustion and Flame  
Speedups reached up to 17.60x and 45.13x for dense and sparse evaluation on the GPU, and up to 55.11x and 245.63x on the CPU over a first-order finite-difference Jacobian approach.  ...  Further, dense Jacobian evaluation was up to 19.56x and 2.84x faster than a previous version of pyJac on a CPU and GPU, respectively.  ...  In addition to parallel OpenMP evaluation on the CPU, this work enabled the shallow-vectorized evaluation of the chemical-kinetic source terms and analytical Jacobian on both the CPU and GPU via OpenCL  ... 
doi:10.1016/j.combustflame.2018.09.008 fatcat:r526kccqovbbpjqewmjqzzpnjq

A Hardware-Software Blueprint for Flexible Deep Learning Specialization [article]

Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
2019 arXiv   pre-print
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility  ...  Changes in algorithms, models, operators, or numerical systems threaten the viability of specialized hardware accelerators.  ...  Figure 8 shows a performance comparison across these models, comparing VTA-accelerated execution against highly optimized ARM CPU and GPU platforms that rely on industry-strength deep learning libraries  ... 
arXiv:1807.04188v3 fatcat:wpafekkrqzffzfe7vulaa6qnva

Parallel Programming Models for Heterogeneous Many-Cores : A Survey [article]

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 arXiv   pre-print
We conclude with a discussion on open issues in the area and potential research directions.  ...  Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers.  ...  Hand-Crafted Analytical Models Gomez-Luna et al. present performance models for asynchronous data transfers on GPU architectures [98] .  ... 
arXiv:2005.04094v1 fatcat:e2psrdnyajh3hih3znnjjbezae

Applying graphics processor acceleration in a software defined radio prototyping environment

William Plishker, George F. Zaki, Shuvra S. Bhattacharyya, Charles Clancy, John Kuykendall
2011 2011 22nd IEEE International Symposium on Rapid System Prototyping  
SDR applications have different levels of parallelism that can be exploited on multicore platforms, but design and programming difficulties have inhibited the adoption of specialized multicore platforms  ...  The approach gives an SDR developer the ability to prototype a GPU-accelerated application and explore its design space quickly and effectively.  ...  Fig. 2. GRGPU: A GNU Radio integration of GPU-accelerated actors.  ... 
doi:10.1109/rsp.2011.5929977 dblp:conf/rsp/PlishkerZBCK11 fatcat:6twg6hnzyveszawlsbi6amgmz4

Importance of Some Specifications of Heterogeneous Architectures (CPU+GPU) for 3D Cone-Beam-CT Image Reconstruction Using OpenCL

T. Nouioua, A. H. Belbachir
2021 International Journal of Biology and Biomedical Engineering  
For this reason, the use of acceleration methods on GPU becomes a real solution. For the acceleration of the FDK algorithm, we have used the GPU on heterogeneous platforms.  ...  We have found that the number of parallel cores, as well as the memory bandwidth, have no effect on runtime speedup unless the number of work-items is chosen carefully, which represents a real  ...  In the proposed work, we accelerate reconstruction with an analytical method using heterogeneous architectures (CPU+GPU platforms) and OpenCL with C++.  ... 
doi:10.46300/91011.2021.15.33 fatcat:yqcrwdyq2bgrpkv57dsf2qpgim

Energy-efficient computing with heterogeneous multi-cores

Tulika Mitra
2014 2014 International Symposium on Integrated Circuits (ISIC)  
complex micro-architecture) co-exist on the same die.  ...  This paper describes heterogeneous multi-core architectures and the runtime management strategies to leverage the potential of such architectures for improved energy-efficiency.  ...  Similarly, embedded GPUs are ubiquitous today in mobile platforms to enable not only mobile 3D gaming but also general-purpose computing on GPU for data-parallel (DLP) compute-intensive tasks such as voice  ... 
doi:10.1109/isicir.2014.7029584 dblp:conf/isicir/Mitra14 fatcat:oqcovqjrdfbd5jo6pdqm3h2y5u

Parallel programming models for heterogeneous many-cores: a comprehensive survey

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 CCF Transactions on High Performance Computing  
We conclude with a discussion on open issues in the area and potential research directions.  ...  Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers.  ...  The backend compiler and runtime can automatically manage the data mapping and generate OpenCL/CUDA code for GPUs.  ... 
doi:10.1007/s42514-020-00039-4 fatcat:nn56xhjm6rcu7kya6gfnyjg66q

Automatically harnessing sparse acceleration

Philip Ginsbach, Bruce Collie, Michael F. P. O'Boyle
2020 Proceedings of the 29th International Conference on Compiler Construction  
Across heterogeneous platforms, applications and data sets we show speedups of 1.1× to over 10× without user intervention.  ...  We evaluated on large-scale scientific applications written in FORTRAN; standard C/C++ and FORTRAN benchmarks; and C++ graph analytics kernels.  ...  No accelerator library performs well reliably; each harness outperforms any other harness on some combination of data set and platform.  ... 
doi:10.1145/3377555.3377893 dblp:conf/cc/GinsbachCO20 fatcat:wf6utlth6na7ddronimzit5xzq