A systems perspective on GPU computing
2016
Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit - GPGPU '16
To this end, his contributions include novel scheduling and resource management abstractions, runtime specialization, and novel data management techniques to support scalable, distributed GPU frameworks ...
His vision encompassed the conceptualization, implementation, and demonstration of systems abstractions and runtime methods to elevate GPUs into first-class citizens in today's and future heterogeneous ...
Section 3 surveys runtime specialization methods for heterogeneous GPU platforms, followed with Section 4, which provides an overview of novel data management techniques to support distributed GPU runtimes ...
doi:10.1145/2884045.2884057
dblp:conf/ppopp/Farooqui16
fatcat:lcxhf6nfsvannnbp5lusxudmmu
D4.1 Programming Language And Runtime System: Requirements
2016
Zenodo
The VINEYARD project aims to achieve easy-to-use and transparent acceleration of data analytics. ...
One of the components in the VINEYARD is the programming model and runtime system support, which is developed in Work Package 4. ...
Data analytics platforms typically specialize on a specific set of workloads, e.g. batch processing, stream processing or graph analytics. ...
doi:10.5281/zenodo.898162
fatcat:h4qoibk26vfzdao5badtj6fdie
Big Data Pilot Demo Days: I-BiDaaS Application to the Financial Sector Webinar
2020
Zenodo
The kick-off webinar of the Big Data Pilot Demo Days series under BDV PPP, I-BiDaaS Application to the Financial Sector, was held on May 21st. ...
The main goal of this big data pilot webinar was to demonstrate in a step by step fashion the I-BiDaaS self-service solution and its application to the banking sector. ...
analytics; Synergy of CEP and GPU-accelerated analytics for streaming data
Feedback from analytics to data fabrication
Feedback from analytics to problem modelling
Demonstrated on use cases across ...
doi:10.5281/zenodo.3865193
fatcat:x2umpjbxkvbmteiglbynsowbhe
An investigation of GPU-based stiff chemical kinetics integration methods
2017
Combustion and Flame
Finally, future research directions for working towards enabling realistic chemistry in reactive-flow simulations via GPU/SIMD accelerated stiff chemical kinetic integration were identified. ...
of 7.11-240.96 times; in comparison, the corresponding slowdowns on the CPU were just 1.39-2.61 times, underscoring the importance of use of an analytical Jacobian for efficient GPU integration. ...
In this work, our efforts to accelerate simulations with chemical kinetics focus on improving the integration strategy itself, via development of new algorithms and using high-performance hardware accelerators ...
doi:10.1016/j.combustflame.2017.02.005
fatcat:fvm45pr4qngs5leth2wyqztsqe
Guest Editorial: Special Issue on Computing Frontiers
2018
International journal of parallel programming
This special issue collects extended versions of the best papers of the 2016 edition of the ACM International Conference on Computing Frontiers. ...
regular papers accepted at the conference (out of 96 total submissions) we have invited authors of the seven most representative, in terms of quality and topics, to submit extended versions for this special ...
The first article, "Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization" by Naila Farooqui, Indrajit Roy, Yuan Chen, Vanish Talwar, Rajkishore Barik, Brian Lewis, Tatiana ...
doi:10.1007/s10766-018-0556-z
fatcat:wprr6jwxsvganasr3ynat5yz3a
SAVE: Towards Efficient Resource Management in Heterogeneous System Architectures
[chapter]
2014
Lecture Notes in Computer Science
The SAVE project will develop HW/SW/OS components that allow for deciding at runtime the mapping of the computation kernels on the appropriate type of resource, based on the current system context and ...
Self-adaptiveness and hardware-assisted virtualization are the two key-enabling technologies for this kind of architectures, to allow the efficient exploitation of the available resources based on the ...
for scientific computing or big data analytics. ...
doi:10.1007/978-3-319-05960-0_38
fatcat:txyxyy3oxfgxxlejazrypktzti
Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale
2016
Proceedings of the Seventh ACM Symposium on Cloud Computing - SoCC '16
, and iii) share an FPGA platform by multiple accelerators of different functionalities. 3. ...
Unlike conventional CPU and GPU targeted programs, compiling an FPGA program can take several hours, which makes existing runtime systems that use dynamic code generation for CPU-GPU datacenters, such ...
of the six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA; grants NSF IIS-1302698 and CNS-1351047; and U54EB020404 awarded by NIH Big Data to Knowledge ( ...
doi:10.1145/2987550.2987569
pmid:28317049
pmcid:PMC5351886
dblp:conf/cloud/HuangWYFICC16
fatcat:5f6bnm6xxbfk3k5fv3sgqarftu
Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms
2018
Combustion and Flame
Speedups reached up to 17.60x and 45.13x for dense and sparse evaluation on the GPU, and up to 55.11x and 245.63x on the CPU over a first-order finite-difference Jacobian approach. ...
Further, dense Jacobian evaluation was up to 19.56x and 2.84x faster than a previous version of pyJac on a CPU and GPU, respectively. ...
In addition to parallel OpenMP evaluation on the CPU, this work enabled the shallow-vectorized evaluation of the chemical-kinetic source terms and analytical Jacobian on both the CPU and GPU via OpenCL ...
doi:10.1016/j.combustflame.2018.09.008
fatcat:r526kccqovbbpjqewmjqzzpnjq
A Hardware-Software Blueprint for Flexible Deep Learning Specialization
[article]
2019
arXiv
pre-print
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility ...
Changes in algorithms, models, operators, or numerical systems threaten the viability of specialized hardware accelerators. ...
Figure 8 shows a performance comparison across these models, comparing VTA-accelerated execution against a highly optimized ARM CPU and GPU platforms that rely on industry-strength deep learning libraries ...
arXiv:1807.04188v3
fatcat:wpafekkrqzffzfe7vulaa6qnva
Parallel Programming Models for Heterogeneous Many-Cores : A Survey
[article]
2020
arXiv
pre-print
We conclude with a discussion on open issues in the area and potential research directions. ...
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. ...
Hand-Crafted Analytical Models: Gomez-Luna et al. present performance models for asynchronous data transfers on GPU architectures [98]. ...
arXiv:2005.04094v1
fatcat:e2psrdnyajh3hih3znnjjbezae
Applying graphics processor acceleration in a software defined radio prototyping environment
2011
2011 22nd IEEE International Symposium on Rapid System Prototyping
SDR applications have different levels of parallelism that can be exploited on multicore platforms, but design and programming difficulties have inhibited the adoption of specialized multicore platforms ...
The approach gives an SDR developer the ability to prototype a GPU accelerated application and explore its design space fast and effectively. ...
Fig. 2. GRGPU: A GNU Radio integration of GPU accelerated actors. ...
doi:10.1109/rsp.2011.5929977
dblp:conf/rsp/PlishkerZBCK11
fatcat:6twg6hnzyveszawlsbi6amgmz4
Importance of Some Specifications of Heterogeneous Architectures (CPU+GPU) for 3D Cone-Beam-CT Image Reconstruction Using OpenCL
2021
International Journal of Biology and Biomedical Engineering
For this reason, the use of acceleration methods on GPU becomes a real solution. For the acceleration of the FDK algorithm, we have used the GPU on heterogeneous platforms. ...
We have found that the number of parallel cores, as well as the memory bandwidth, have no effect on runtimes speedup without being rough in the choice of the number of work-items, which represents a real ...
In the proposed work, we accelerate reconstruction by an analytical method by using heterogeneous architectures (CPU+GPU platforms) using OpenCL with C++. ...
doi:10.46300/91011.2021.15.33
fatcat:yqcrwdyq2bgrpkv57dsf2qpgim
Energy-efficient computing with heterogeneous multi-cores
2014
2014 International Symposium on Integrated Circuits (ISIC)
complex micro-architecture) co-exist on the same die. ...
This paper describes heterogeneous multi-core architectures and the runtime management strategies to leverage the potential of such architectures for improved energy-efficiency. ...
Similarly, embedded GPUs are ubiquitous today in mobile platforms to enable not only mobile 3D gaming but also general-purpose computing on GPU for data-parallel (DLP) compute-intensive tasks such as voice ...
doi:10.1109/isicir.2014.7029584
dblp:conf/isicir/Mitra14
fatcat:oqcovqjrdfbd5jo6pdqm3h2y5u
Parallel programming models for heterogeneous many-cores: a comprehensive survey
2020
CCF Transactions on High Performance Computing
We conclude with a discussion on open issues in the area and potential research directions. ...
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. ...
The backend compiler and runtime can automatically manage the data mapping and generate OpenCL/CUDA code for GPUs. ...
doi:10.1007/s42514-020-00039-4
fatcat:nn56xhjm6rcu7kya6gfnyjg66q
Automatically harnessing sparse acceleration
2020
Proceedings of the 29th International Conference on Compiler Construction
Across heterogeneous platforms, applications and data sets we show speedups of 1.1× to over 10× without user intervention. ...
We evaluated on large-scale scientific applications written in FORTRAN; standard C/C++ and FORTRAN benchmarks; and C++ graph analytics kernels. ...
No accelerator library performs well reliably; each harness outperforms any other harness on some combination of data set and platform. ...
doi:10.1145/3377555.3377893
dblp:conf/cc/GinsbachCO20
fatcat:wf6utlth6na7ddronimzit5xzq