Velociraptor: a compiler toolkit for array-based languages targeting CPUs and GPUs
2015
Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming - ARRAY 2015
Such applications could include adding functionality to existing projects, such as compiling R [12], or implementing solutions for other domain-specific languages which have an array-based core. ...
Compiler writers can simply generate VRIR for key parts of their input programs, and then use Velociraptor to automatically generate CPU/GPU code for their target architecture. ...
An existing LLC with a mature CPU code generator may choose to utilize Velociraptor for only GPU code generation whereas a new LLC may choose to offload as much work as possible to Velociraptor, including ...
doi:10.1145/2774959.2774967
dblp:conf/pldi/GargJH15
fatcat:jtwblrzqdfcornc7bzd5ah42q4
Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units
2013
Communications in Computational Physics
The performance of the automatically generated code is compared to equivalent purpose-written codes for both single-phase, multiphase, and multicomponent flows. ...
In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code ...
However, there is some loss of performance in the automatically generated code for certain applications. ...
doi:10.4208/cicp.351011.260112s
fatcat:v36zjc5qhbafbozjsc65aw7scy
Evaluation of a Feature Tracking Vision Application on a Heterogeneous Chip
2014
2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing
CPU and the GPU of an Intel i7, respectively. ...
In this paper, we discuss the optimization of a feature tracking application, written in OpenCL, when running on an on-chip heterogeneous platform. ...
We also thank Mert Dikmen for the help on the early stages of this work. ...
doi:10.1109/sbac-pad.2014.45
dblp:conf/sbac-pad/GranSTG14
fatcat:4hsfgilfb5fynaupazelxjlpcy
Early evaluation of directive-based GPU programming models for productive exascale computing
2012
2012 International Conference for High Performance Computing, Networking, Storage and Analysis
Our evaluation shows that directive-based models can achieve reasonable performance, compared to hand-written GPU codes. ...
To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models. ...
This research is sponsored in part by the Office of Advanced Scientific Computing Research in the U.S. Department of Energy and National Science Foundation award OCI-0910735. ...
doi:10.1109/sc.2012.51
dblp:conf/sc/LeeV12
fatcat:dnnmn5dpwnejnn2buf2qulznvy
Benchmarking Data and Compute Intensive Applications on Modern CPU and GPU Architectures
2012
Procedia Computer Science
Thus, the main aim of this paper is to present a comprehensive real performance analysis of selected applications following the complex standard for data compression and coding, JPEG 2000. ...
the CPUs in recent years. ...
Generally, the EBCOT code-block coding is composed of bit-plane coder and arithmetic coder. ...
doi:10.1016/j.procs.2012.04.208
fatcat:bhoeb4zhefeodh7ubazo7xvmny
SESH Framework: A Space Exploration Framework for GPU Application and Hardware Codesign
[chapter]
2014
Lecture Notes in Computer Science
In this paper, we propose SESH framework, a model-driven codesign framework for GPU, that is able to automatically search the design space by simultaneously exploring prospective application and hardware ...
Graphics processing units (GPUs) have become increasingly popular accelerators in supercomputers, and this trend is likely to continue. ...
Even codes for earlier GPU generations may have to be recoded in order to fully exploit new GPU architectures. ...
doi:10.1007/978-3-319-10214-6_9
fatcat:4k5wklfkbjdg5c3yjz7pkpzusi
AAP4All: An Adaptive Auto Parallelization of Serial Code for HPC Systems
2021
Intelligent Automation and Soft Computing
A key advantage of the proposed tool is automatic recognition of the computer system architecture, followed by automatic translation of the input serial C++ code into parallel programming code for that particular detected ...
However, to address these obstacles and achieve massive performance under power consumption limitations, we propose an Adaptive and Automatic Parallel programming tool (AAP4All) for both homogeneous and ...
Acknowledgement: The authors acknowledge with thanks DSR, King Abdulaziz University, Jeddah, Saudi Arabia, for technical and financial support. ...
doi:10.32604/iasc.2021.019044
fatcat:quu46dwokbf2fkq24jrtp7uo7i
Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing
2017
Journal of Computing Science and Engineering
In this paper, we examine GPU cache access behavior and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling programs. ...
Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically have considerably less temporal/spatial locality. ...
ACKNOWLEDGMENTS This study was funded in part by the NSF (Grant No. CNS 1421577). ...
doi:10.5626/jcse.2017.11.2.69
fatcat:pudydgokind6dpxyk6hdsssgmm
Collaborative Computing for Heterogeneous Integrated Systems
2017
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering - ICPE '17
Computing systems today typically employ, in addition to powerful CPUs, various types of specialized devices such as Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs). ...
Compared to traditional use of GPUs and FPGAs as offload accelerators, this tight integration enables close collaboration between processors and devices, which is important for better utilization of system ...
Research (C-FAR), the Huawei Project (YB2015120003), and the IBM Center for Cognitive Computing Systems Research Center at UIUC. ...
doi:10.1145/3030207.3030244
dblp:conf/wosp/ChangGHHCH17
fatcat:vzwb3h5dlva6xnwpxytfqlcq5u
Efficient Mapping of Irregular C++ Applications to Integrated GPUs
2014
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
There is growing interest in using GPUs to accelerate general-purpose computation since they offer the potential of massive parallelism with reduced energy consumption. ...
Using Concord, we ran nine irregular C++ programs on two computer systems containing Intel 4th Generation Core processors. ...
[14] developed an automatic code transformation system that generates parallel CUDA code for regular programs from sequential C code. ...
doi:10.1145/2544137.2544165
fatcat:bjpwxwclfbeoflbz6c5d3f6fnq
Easy, fast, and energy-efficient object detection on heterogeneous on-chip architectures
2013
ACM Transactions on Architecture and Code Optimization (TACO)
We also show how to trade accuracy for energy at runtime. Overall, our application can perform accurate object detection at 40 frames per second (fps) rate, in an energy efficient manner. ...
Dios from University of Malaga (Spain) for their support in the beginning of this project. ...
Checking the generated assembly code of orig-auto, we see that the compiler has generated vector code but it is not as efficient as our manual code since it could not perform our transformation automatically ...
doi:10.1145/2541228.2555302
fatcat:yhmbb4mnkfc2bd5jc5ojhzoizm
Fermilab multicore and GPU-accelerated clusters for lattice QCD
2012
Journal of Physics, Conference Series
In the last several years, GPU acceleration has led to further decreases in price/performance for ported applications. ...
We discuss the design and performance of a GPU-accelerated cluster that Fermilab deployed in January 2012. ...
The Fermi National Accelerator Laboratory is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy. ...
doi:10.1088/1742-6596/396/4/042029
fatcat:3otuoh5q3nctzhat5pov32r7rq
Speeding up the MATLAB complex networks package using graphic processors
2011
Chinese Physics B
The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). ...
The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research. ...
Any function called on GPU data will be executed on the GPU automatically without any extra programming. ...
doi:10.1088/1674-1056/20/9/098901
fatcat:ypbvfcccqvbfhe65uakibjn37e
The SuperCodelet architecture
2022
Proceedings of the 1st International Workshop on Extreme Heterogeneity Solutions
This architecture takes advantage of instruction level parallelism techniques based on dataflow analysis, allowing implicit parallel execution of code. We present the SuperCodelet Architecture. ...
OpenMP is only used to generate code for the GPU and schedule it, but not to speed up computation outside of the Codelet. Intel's OneAPI compiler is used to generate code for the Intel Gen9 GPU. ...
The LLC is shared between the CPU and the GPU, allowing for access to share values between CPU and GPU. ...
doi:10.1145/3529336.3530823
fatcat:3ev3elbyknhr5lxjb4txulqtci
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction
2010
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10
Communication and synchronization operations at multiple levels are generated automatically. The resulting mapping is currently emitted in the CUDA programming language. ...
improvements in the price/performance and energy/performance over general purpose processors. ...
From this point, the code is compiled by a "low-level compiler" (LLC), which performs relatively conventional steps of scalar code generation for the host and PEs. ...
doi:10.1145/1735688.1735698
dblp:conf/asplos/LeungVMBWBL10
fatcat:5vnudjbr6vae5mwktixizhywpy