
Velociraptor: a compiler toolkit for array-based languages targeting CPUs and GPUs

Rahul Garg, Sameer Jagdale, Laurie Hendren
2015 Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming - ARRAY 2015  
Such applications could include adding functionality to existing projects, such as compiling R [12], or implementing solutions for other domain-specific languages which have an array-based core.  ...  Compiler writers can simply generate VRIR for key parts of their input programs, and then use Velociraptor to automatically generate CPU/GPU code for their target architecture.  ...  An existing LLC with a mature CPU code generator may choose to use Velociraptor for GPU code generation only, whereas a new LLC may choose to offload as much work as possible to Velociraptor, including  ... 
doi:10.1145/2774959.2774967 dblp:conf/pldi/GargJH15 fatcat:jtwblrzqdfcornc7bzd5ah42q4

Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units

Stuart D. C. Walsh, Martin O. Saar
2013 Communications in Computational Physics  
The performance of the automatically generated code is compared to equivalent purpose-written codes for single-phase, multiphase, and multicomponent flows.  ...  In particular, it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures. This paper describes a process of automatic code  ...  However, there is some loss of performance in the automatically generated code for certain applications.  ... 
doi:10.4208/cicp.351011.260112s fatcat:v36zjc5qhbafbozjsc65aw7scy

Evaluation of a Feature Tracking Vision Application on a Heterogeneous Chip

Ruben Gran, August Shi, Ehsan Totoni, Maria J. Garzaran
2014 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing  
CPU and the GPU of an Intel i7, respectively.  ...  In this paper, we discuss the optimization of a feature tracking application, written in OpenCL, when running on an on-chip heterogeneous platform.  ...  We also thank Mert Dikmen for the help on the early stages of this work.  ... 
doi:10.1109/sbac-pad.2014.45 dblp:conf/sbac-pad/GranSTG14 fatcat:4hsfgilfb5fynaupazelxjlpcy

Early evaluation of directive-based GPU programming models for productive exascale computing

Seyong Lee, Jeffrey S. Vetter
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
Our evaluation shows that directive-based models can achieve reasonable performance, compared to hand-written GPU codes.  ...  To provide better abstractions for programming GPU architectures, researchers and vendors have proposed several directive-based GPU programming models.  ...  This research is sponsored in part by the Office of Advanced Scientific Computing Research in the U.S. Department of Energy and National Science Foundation award OCI-0910735.  ... 
doi:10.1109/sc.2012.51 dblp:conf/sc/LeeV12 fatcat:dnnmn5dpwnejnn2buf2qulznvy

Benchmarking Data and Compute Intensive Applications on Modern CPU and GPU Architectures

Miłosz Ciżnicki, Michał Kierzynka, Piotr Kopta, Krzysztof Kurowski, Paweł Gepner
2012 Procedia Computer Science  
Thus, the main aim of this paper is to present a comprehensive real-world performance analysis of selected applications following the complex standard for data compression and coding, JPEG 2000.  ...  the CPUs in recent years.  ...  Generally, the EBCOT code-block coding is composed of a bit-plane coder and an arithmetic coder.  ... 
doi:10.1016/j.procs.2012.04.208 fatcat:bhoeb4zhefeodh7ubazo7xvmny

SESH Framework: A Space Exploration Framework for GPU Application and Hardware Codesign [chapter]

Joo Hwan Lee, Jiayuan Meng, Hyesoon Kim
2014 Lecture Notes in Computer Science  
In this paper, we propose the SESH framework, a model-driven codesign framework for GPUs that automatically searches the design space by simultaneously exploring prospective application and hardware  ...  Graphics processing units (GPUs) have become increasingly popular accelerators in supercomputers, and this trend is likely to continue.  ...  Even codes written for earlier GPU generations may have to be recoded in order to fully exploit new GPU architectures.  ... 
doi:10.1007/978-3-319-10214-6_9 fatcat:4k5wklfkbjdg5c3yjz7pkpzusi

AAP4All: An Adaptive Auto Parallelization of Serial Code for HPC Systems

M. Usman Ashraf, Fathy Alburaei Eassa, Leon J. Osterweil, Aiiad Ahmad Albeshri, Abdullah Algarni, Iqra Ilyas
2021 Intelligent Automation and Soft Computing  
A key advantage of the proposed tool is its automatic recognition of the computer system architecture; it then automatically translates the input serial C++ code into parallel programming code for that particular detected  ...  However, to address these obstacles and achieve massive performance under power-consumption limitations, we propose an Adaptive and Automatic Parallel programming tool (AAP4All) for both homogeneous and  ...  Acknowledgement: The authors acknowledge with thanks DSR, King Abdulaziz University, Jeddah, Saudi Arabia for technical and financial support.  ... 
doi:10.32604/iasc.2021.019044 fatcat:quu46dwokbf2fkq24jrtp7uo7i

Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing

Yijie Huangfu, Wei Zhang
2017 Journal of Computing Science and Engineering  
In this paper, we examine GPU cache access behavior and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling programs.  ...  Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically have considerably less temporal/spatial locality.  ...  ACKNOWLEDGMENTS This study was funded in part by the NSF (Grant No. CNS 1421577).  ... 
doi:10.5626/jcse.2017.11.2.69 fatcat:pudydgokind6dpxyk6hdsssgmm

Collaborative Computing for Heterogeneous Integrated Systems

Li-Wen Chang, Juan Gómez-Luna, Izzat El Hajj, Sitao Huang, Deming Chen, Wen-mei Hwu
2017 Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering - ICPE '17  
Computing systems today typically employ, in addition to powerful CPUs, various types of specialized devices such as Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs).  ...  Compared to traditional use of GPUs and FPGAs as offload accelerators, this tight integration enables close collaboration between processors and devices, which is important for better utilization of system  ...  Research (C-FAR), the Huawei Project (YB2015120003), and the IBM Center for Cognitive Computing Systems Research at UIUC.  ... 
doi:10.1145/3030207.3030244 dblp:conf/wosp/ChangGHHCH17 fatcat:vzwb3h5dlva6xnwpxytfqlcq5u

Efficient Mapping of Irregular C++ Applications to Integrated GPUs

Rajkishore Barik, Rashid Kaleem, Deepak Majeti, Brian T. Lewis, Tatiana Shpeisman, Chunling Hu, Yang Ni, Ali-Reza Adl-Tabatabai
2014 Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization  
There is growing interest in using GPUs to accelerate general-purpose computation since they offer the potential of massive parallelism with reduced energy consumption.  ...  Using Concord, we ran nine irregular C++ programs on two computer systems containing Intel 4th Generation Core processors.  ...  [14] developed an automatic code transformation system that generates parallel CUDA code for regular programs from sequential C code.  ... 
doi:10.1145/2544137.2544165 fatcat:bjpwxwclfbeoflbz6c5d3f6fnq

Easy, fast, and energy-efficient object detection on heterogeneous on-chip architectures

Ehsan Totoni, Mert Dikmen, María Jesús Garzarán
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
We also show how to trade accuracy for energy at runtime. Overall, our application can perform accurate object detection at a rate of 40 frames per second (fps) in an energy-efficient manner.  ...  Dios from the University of Malaga (Spain) for their support in the beginning of this project.  ...  Checking the generated assembly code of orig-auto, we see that the compiler has generated vector code, but it is not as efficient as our manual code since it could not perform our transformation automatically  ... 
doi:10.1145/2541228.2555302 fatcat:yhmbb4mnkfc2bd5jc5ojhzoizm

Fermilab multicore and GPU-accelerated clusters for lattice QCD

D Holmgren, N Seenu, J Simone, A Singh
2012 Journal of Physics, Conference Series  
In the last several years, GPU acceleration has led to further decreases in price/performance for ported applications.  ...  We discuss the design and performance of a GPU-accelerated cluster that Fermilab deployed in January 2012.  ...  The Fermi National Accelerator Laboratory is operated by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the United States Department of Energy.  ... 
doi:10.1088/1742-6596/396/4/042029 fatcat:3otuoh5q3nctzhat5pov32r7rq

Speeding up the MATLAB complex networks package using graphic processors

Bai-Da Zhang, Yu-Hua Tang, Jun-Jie Wu, Xin Li
2011 Chinese Physics B  
The performance of the MATLAB codes can be further improved by using graphics processing units (GPUs).  ...  The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.  ...  Any function called on GPU data will be executed on the GPU automatically without any extra programming.  ... 
doi:10.1088/1674-1056/20/9/098901 fatcat:ypbvfcccqvbfhe65uakibjn37e

The SuperCodelet architecture

Jose M Monsalve Diaz, Kevin Harms, Rafael A. Herrera Guaitero, Diego A. Roa Perdomo, Kalyan Kumaran, Guang R. Gao
2022 Proceedings of the 1st International Workshop on Extreme Heterogeneity Solutions  
This architecture takes advantage of instruction-level parallelism techniques based on dataflow analysis, allowing implicit parallel execution of code. We present the SuperCodelet Architecture.  ...  OpenMP is only used to generate code for the GPU and schedule it, but not to speed up computation outside of the Codelet. Intel's OneAPI compiler is used to generate code for the Intel Gen9 GPU.  ...  The LLC is shared between the CPU and the GPU, allowing shared values to be accessed by both the CPU and the GPU.  ... 
doi:10.1145/3529336.3530823 fatcat:3ev3elbyknhr5lxjb4txulqtci

A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Allen Leung, Nicolas Vasilache, Benoît Meister, Muthu Baskaran, David Wohlford, Cédric Bastoul, Richard Lethin
2010 Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10  
Communication and synchronization operations at multiple levels are generated automatically. The resulting mapping is currently emitted in the CUDA programming language.  ...  improvements in price/performance and energy/performance over general-purpose processors.  ...  From this point, the code is compiled by a "low-level compiler" (LLC), which performs relatively conventional steps of scalar code generation for the host and PEs.  ... 
doi:10.1145/1735688.1735698 dblp:conf/asplos/LeungVMBWBL10 fatcat:5vnudjbr6vae5mwktixizhywpy