6,830 Hits in 4.7 sec

Efficient Mapping of Irregular C++ Applications to Integrated GPUs

Rajkishore Barik, Rashid Kaleem, Deepak Majeti, Brian T. Lewis, Tatiana Shpeisman, Chunling Hu, Yang Ni, Ali-Reza Adl-Tabatabai
2014 Proceedings of the Annual IEEE/ACM International Symposium on Code Generation and Optimization
It also includes compiler optimizations to improve irregular application performance on GPUs.  ...  However, while the majority of effort has focused on GPU acceleration of regular applications, relatively little is known about the behavior of irregular applications on GPUs.  ...  Using nine realistic irregular C++ applications, we demonstrate that C++ applications using pointers and other object-oriented features can be automatically mapped to the GPU.  ... 
doi:10.1145/2544137.2544165 fatcat:bjpwxwclfbeoflbz6c5d3f6fnq
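
The snippet above does not show the mechanism, so the following sketch only illustrates what sharing a pointer-based C++ structure between CPU and GPU can look like. It uses CUDA unified memory (cudaMallocManaged) as a stand-in for the shared address space of an integrated GPU; the paper itself relies on its own C++-to-GPU compiler and an Intel integrated-GPU runtime rather than CUDA, so every name below is an illustrative assumption.

```cuda
// Minimal sketch: a pointer-linked structure traversed on the GPU.
// CUDA unified memory stands in for the shared CPU/GPU address space
// of an integrated GPU; this is NOT the paper's compiler-based approach.
#include <cstdio>
#include <cuda_runtime.h>

struct Node {
    float value;
    Node* next;   // raw pointer, valid on host and device with managed memory
};

__global__ void sum_list(Node* head, float* out) {
    // A single thread walks the list: irregular, pointer-chasing access.
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        float s = 0.0f;
        for (Node* n = head; n != nullptr; n = n->next) s += n->value;
        *out = s;
    }
}

int main() {
    const int N = 8;
    Node* nodes;
    float* result;
    cudaMallocManaged(&nodes, N * sizeof(Node));
    cudaMallocManaged(&result, sizeof(float));

    // Build the list on the CPU using ordinary pointers.
    for (int i = 0; i < N; ++i) {
        nodes[i].value = float(i);
        nodes[i].next = (i + 1 < N) ? &nodes[i + 1] : nullptr;
    }

    sum_list<<<1, 1>>>(&nodes[0], result);
    cudaDeviceSynchronize();
    printf("sum = %f\n", *result);  // expect 28.0

    cudaFree(nodes);
    cudaFree(result);
    return 0;
}
```

The point of the sketch is that the same Node* values are dereferenced on both sides without marshalling, which is the kind of capability the abstract refers to for irregular, pointer-based workloads.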

On-the-fly elimination of dynamic irregularities for GPU computing

Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, Xipeng Shen
2011 Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '11
Its optimization overhead is largely transparent to GPU kernel executions, so it does not jeopardize the basic efficiency of the GPU application.  ...  Finally, it is robust to the presence of various complexities in GPU applications.  ...  Acknowledgments We thank Mary Hall for her help during the preparation of the final version of the paper.  ...
doi:10.1145/1950365.1950408 dblp:conf/asplos/ZhangJGTS11 fatcat:eikdeewbynhxbffsx5hpp437x4
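
The abstract does not spell out the transformation, but a common way to eliminate one class of dynamic irregularity, non-coalesced memory access through an index array, is to reorder the data on the fly so that threads in a warp end up reading consecutive addresses. The sketch below shows that generic idea only; the kernel and function names are assumptions, and it is not the paper's actual runtime (which also has to keep the reordering overhead off the kernel's critical path, as the abstract notes).

```cuda
// Generic illustration of removing an indirect, potentially divergent
// memory reference by reordering data before the kernel runs. Names and
// structure are illustrative, not the paper's runtime.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Irregular: each thread dereferences idx[i], so reads from `data`
// may be scattered and poorly coalesced within a warp.
__global__ void gather_irregular(const float* data, const int* idx,
                                 float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = data[idx[i]] * 2.0f;
}

// Regular: `reordered[i]` already equals data[idx[i]], so consecutive
// threads read consecutive addresses and the accesses coalesce.
__global__ void gather_regular(const float* reordered, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = reordered[i] * 2.0f;
}

int main() {
    const int N = 1 << 16;
    std::vector<float> h_data(N), h_reordered(N);
    std::vector<int>   h_idx(N);
    for (int i = 0; i < N; ++i) { h_data[i] = float(i); h_idx[i] = (i * 7919) % N; }

    // The reordering step; an on-the-fly scheme must hide or amortize
    // this cost so it does not erase the kernel-side gains.
    for (int i = 0; i < N; ++i) h_reordered[i] = h_data[h_idx[i]];

    float *d_data, *d_reordered, *d_out;
    int* d_idx;
    cudaMalloc(&d_data, N * sizeof(float));
    cudaMalloc(&d_reordered, N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMalloc(&d_idx, N * sizeof(int));
    cudaMemcpy(d_data, h_data.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_reordered, h_reordered.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), N * sizeof(int), cudaMemcpyHostToDevice);

    gather_irregular<<<(N + 255) / 256, 256>>>(d_data, d_idx, d_out, N);
    gather_regular<<<(N + 255) / 256, 256>>>(d_reordered, d_out, N);
    cudaDeviceSynchronize();
    printf("both kernels compute the same result; only access regularity differs\n");

    cudaFree(d_data); cudaFree(d_reordered); cudaFree(d_out); cudaFree(d_idx);
    return 0;
}
```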

DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing

Venkatraman Govindaraju, Chen-Han Ho, Tony Nowatzki, Jatin Chhugani, Nadathur Satish, Karthikeyan Sankaralingam, Changkyu Kim
2012 IEEE Micro  
Specialization is a promising direction for improving processor energy efficiency. With functionality specialization, hardware is designed for application-specific units of computation.  ...  Our full-system FPGA prototype of DySER integrated into OpenSPARC demonstrates that an implementation is practical.  ...  Irregular memory access due to histogramming. Causes branch divergence. Performs the histogram with reductions. Similar because the GPU parallelizes the histogram.  ...
doi:10.1109/mm.2012.51 fatcat:vhuwzkylqzh7bhwyecd7k2bree
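
The last snippet is concrete enough to illustrate: histogramming produces data-dependent writes, which is the irregular access pattern being discussed. Below is a plain CUDA histogram that performs the per-bin reduction with atomicAdd; it is a generic example, not code from the DySER paper or its benchmarks.

```cuda
// Illustrative GPU histogram: each thread's write target depends on its
// input value, so the memory accesses are data-dependent and irregular.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void histogram(const unsigned char* data, int n,
                          unsigned int* bins /* 256 bins */) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // atomicAdd performs the "reduction" into the shared bin array.
        atomicAdd(&bins[data[i]], 1u);
    }
}

int main() {
    const int N = 1 << 20;
    unsigned char* d_data;
    unsigned int* d_bins;
    cudaMalloc(&d_data, N);
    cudaMalloc(&d_bins, 256 * sizeof(unsigned int));
    cudaMemset(d_data, 7, N);                     // every byte = 7, just to have data
    cudaMemset(d_bins, 0, 256 * sizeof(unsigned int));

    histogram<<<(N + 255) / 256, 256>>>(d_data, N, d_bins);

    unsigned int h_bins[256];
    cudaMemcpy(h_bins, d_bins, sizeof(h_bins), cudaMemcpyDeviceToHost);
    printf("bin[7] = %u (expect %d)\n", h_bins[7], N);

    cudaFree(d_data);
    cudaFree(d_bins);
    return 0;
}
```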

A systems perspective on GPU computing

Naila Farooqui
2016 Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit - GPGPU '16  
His vision encompassed the conceptualization, implementation, and demonstration of systems abstractions and runtime methods to elevate GPUs into first-class citizens in today's and future heterogeneous  ...  In this paper, we summarize his legacy of key research contributions in general-purpose GPU computing.  ...  Acknowledgments We would like to thank Professor Sudhakar Yalamanchili, Ada Gavrilovska, Vishakha Gupta, Sudarsun Kannan, Alexander Merritt, and Dipanjan Sengupta for their feedback and assistance with  ... 
doi:10.1145/2884045.2884057 dblp:conf/ppopp/Farooqui16 fatcat:lcxhf6nfsvannnbp5lusxudmmu

Irregular Accesses Reorder Unit: Improving GPGPU Memory Coalescing for Graph-Based Workloads [article]

Albert Segura, Jose-Maria Arnau, Antonio Gonzalez
2022 arXiv   pre-print
However, irregular applications struggle to fully realize GPGPU performance as a result of control flow divergence and memory divergence due to irregular memory access patterns.  ...  Additionally, the IRU is capable of filtering and merging duplicated irregular accesses, which further improves graph-based irregular applications.  ...  Acknowledgment This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00  ...
arXiv:2007.07131v2 fatcat:zfh7mpzeanbs3msbfsapfn37ca
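
As a software analogue of the filtering and merging of duplicated accesses described above, the sketch below deduplicates a list of gather indices with Thrust before any loads are issued, so each distinct address is fetched once. This only illustrates the idea on the programmable cores; the IRU performs it in hardware inside the GPU memory pipeline.

```cuda
// Software analogue of filtering duplicated irregular accesses: sort the
// gather indices and drop duplicates so each distinct element is loaded
// once. Illustration only; the IRU does this in hardware.
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/unique.h>

int main() {
    // Neighbour lists in graph workloads often repeat vertex ids.
    int h_idx[] = {4, 7, 4, 2, 7, 7, 9, 2};
    thrust::device_vector<int> idx(h_idx, h_idx + 8);

    thrust::sort(idx.begin(), idx.end());
    auto last = thrust::unique(idx.begin(), idx.end());
    idx.erase(last, idx.end());

    printf("distinct indices to fetch: %zu of 8\n", idx.size());  // 4 of 8
    return 0;
}
```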

Affinity-aware work-stealing for integrated CPU-GPU processors

Naila Farooqui, Rajkishore Barik, Brian T. Lewis, Tatiana Shpeisman, Karsten Schwan
2016 Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '16  
This paper describes a preliminary implementation of our work-stealing scheduler, Libra, which includes techniques to deal with these architectural differences in integrated CPU-GPU processors.  ...  We show preliminary results using a diverse set of nine regular and irregular workloads running on an Intel Broadwell Core-M processor.  ...  Our Approach The goal of our Libra work-stealing runtime is to efficiently and dynamically balance data-parallel computation across the CPU and GPU cores.  ...
doi:10.1145/2851141.2851194 dblp:conf/ppopp/FarooquiBLSS16 fatcat:orske5g5e5ayzehyhiqmqmk6ii
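
The snippet states the goal, dynamic balancing of data-parallel work across CPU and GPU, but not the mechanism. One toy way to get dynamic balancing is to let both devices claim fixed-size chunks of the iteration space from a shared atomic counter until it is exhausted, which the sketch below does with one CPU worker thread and a loop of GPU launches. The chunking scheme, memory choice, and synchronization are all simplifying assumptions, not the Libra scheduler.

```cuda
// Toy dynamic CPU/GPU partitioning: both sides repeatedly claim chunks of
// the iteration space from a shared atomic counter until the work is gone.
// A stand-in for work-stealing style balancing, not the Libra runtime.
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

const int N = 1 << 20, CHUNK = 1 << 16;

__global__ void scale_gpu(float* x, int begin, int end) {
    int i = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end) x[i] *= 2.0f;
}

void scale_cpu(float* x, int begin, int end) {
    for (int i = begin; i < end; ++i) x[i] *= 2.0f;
}

int main() {
    // Pinned, zero-copy memory so CPU and GPU can touch disjoint chunks
    // concurrently (a simplification; a real runtime manages placement).
    float* x;
    cudaMallocHost(&x, N * sizeof(float));
    for (int i = 0; i < N; ++i) x[i] = 1.0f;

    std::atomic<int> next(0);
    int cpu_chunks = 0, gpu_chunks = 0;

    std::thread cpu_worker([&] {
        for (int b; (b = next.fetch_add(CHUNK)) < N; ++cpu_chunks)
            scale_cpu(x, b, std::min(b + CHUNK, N));
    });
    for (int b; (b = next.fetch_add(CHUNK)) < N; ++gpu_chunks) {
        int end = std::min(b + CHUNK, N);
        scale_gpu<<<(end - b + 255) / 256, 256>>>(x, b, end);
        cudaDeviceSynchronize();  // simplistic: finish before claiming more
    }
    cpu_worker.join();

    printf("chunks: cpu=%d gpu=%d  x[0]=%.1f\n", cpu_chunks, gpu_chunks, x[0]);
    cudaFreeHost(x);
    return 0;
}
```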

Strategies for Efficient Executions of Irregular Message-Driven Parallel Applications on GPU Systems [article]

Vasudevan Rengasamy, Sathish Vadhiyar
2020 arXiv   pre-print
Supporting efficient execution of such message-driven irregular applications on GPU systems presents a number of challenges related to irregular data accesses and computations.  ...  We have integrated these runtime strategies into our G-Charm framework for efficient execution of message-driven parallel applications on hybrid GPU systems.  ...  Supporting efficient executions of message-driven irregular applications on GPU systems presents a number of challenges to our G-Charm runtime system.  ...
arXiv:2008.05712v1 fatcat:ja7z6jtmobhodnivh4d4focfma

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration [article]

Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo
2020 arXiv   pre-print
The key to SMA is the temporal integration of the systolic execution model with the GPU-like SIMD execution model.  ...  Integrating a general-purpose processor such as a CPU or a GPU incurs significant data movement overhead and leads to resource under-utilization on the DNN accelerators.  ...  Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.  ... 
arXiv:2002.08326v2 fatcat:3asj3sqruncz7czbtkxbookt2u

Adapting Irregular Computations to Large CPU-GPU Clusters in the MADNESS Framework

Vlad Slavici, Raghu Varier, Gene Cooperman, Robert J. Harrison
2012 2012 IEEE International Conference on Cluster Computing  
Most MADNESS applications use operators that involve many small tensor computations, resulting in a less regular organization of computations on GPUs.  ...  A single GPU kernel may have to multiply by hundreds of small square matrices (with fixed dimension ranging from 10 to 28).  ...
doi:10.1109/cluster.2012.42 dblp:conf/cluster/SlaviciVCH12 fatcat:ibnwwgwor5djhmvg226qpskggy
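
To make the workload shape concrete, the sketch below multiplies a batch of small fixed-size square matrices with one thread block per matrix, so a single kernel launch covers hundreds of small multiplications, the pattern the snippet describes. The dimension, batch size, and kernel are illustrative assumptions, not MADNESS code; in production a batched BLAS routine such as cublasSgemmBatched plays a similar role.

```cuda
// One thread block multiplies one small DIM x DIM matrix pair; a single
// kernel launch covers the whole batch. Illustrative of the "hundreds of
// small square matrices per kernel" workload, not MADNESS source code.
#include <cstdio>
#include <cuda_runtime.h>

const int DIM = 16;        // the paper cites fixed dimensions of 10 to 28
const int BATCH = 256;

__global__ void small_gemm_batched(const float* A, const float* B, float* C) {
    const float* a = A + blockIdx.x * DIM * DIM;   // this block's operands
    const float* b = B + blockIdx.x * DIM * DIM;
    float* c = C + blockIdx.x * DIM * DIM;
    int row = threadIdx.y, col = threadIdx.x;      // one thread per output element
    float acc = 0.0f;
    for (int k = 0; k < DIM; ++k)
        acc += a[row * DIM + k] * b[k * DIM + col];
    c[row * DIM + col] = acc;
}

int main() {
    size_t bytes = size_t(BATCH) * DIM * DIM * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (size_t i = 0; i < size_t(BATCH) * DIM * DIM; ++i) { A[i] = 1.0f; B[i] = 1.0f; }

    dim3 threads(DIM, DIM);                 // 16 x 16 = 256 threads per block
    small_gemm_batched<<<BATCH, threads>>>(A, B, C);
    cudaDeviceSynchronize();
    printf("C[0] = %.1f (expect %d)\n", C[0], DIM);

    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```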

Implications of Integrated CPU-GPU Processors on Thermal and Power Management Techniques [article]

Kapil Dev, Indrani Paul, Wei Huang, Yasuko Eckert, Wayne Burleson, and Sherief Reda
2018 arXiv   pre-print
The findings presented in the paper can be used to improve the thermal and power efficiency of heterogeneous CPU-GPU processors.  ...  Heterogeneous processors with architecturally different cores (CPU and GPU) integrated on the same die lead to new challenges and opportunities for thermal and power management techniques because of shared  ...  Further, from Figure 1, we observe that application-based scheduling on a CPU-GPU processor leads to a larger range of peak temperature (84.2 °C and 67.8 °C) and total runtime (130 s and  ...
arXiv:1808.09651v1 fatcat:jj3gg5ndxbgrxgql36bp4aoofe

Efficient Probabilistic and Geometric Anatomical Mapping Using Particle Mesh Approximation on GPUs

Linh Ha, Marcel Prastawa, Guido Gerig, John H. Gilmore, Cláudio T. Silva, Sarang Joshi
2011 International Journal of Biomedical Imaging  
We also achieve a speedup of three orders of magnitude compared to a CPU reference implementation, making it possible to use the technique in time-critical applications.  ...  and an implementation of the algorithm using particle mesh approximation on Graphical Processing Units (GPUs) to fulfill the computational requirements.  ...  (iii) Apply a parallel segmented prefix sum scan [40] to integrate all node values. All of these steps are implemented efficiently in parallel on the GPU.  ...
doi:10.1155/2011/572187 pmid:21941523 pmcid:PMC3166611 fatcat:a4ciuqeq3za4bpmfaibdjxgrfi
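
Step (iii) quoted above, a parallel segmented prefix sum, has a compact expression in Thrust: scanning values keyed by segment id restarts the running total at every key change. The sketch below shows that primitive in isolation with made-up data; it is not the paper's implementation, which embeds the scan in its particle-mesh pipeline.

```cuda
// Segmented (per-key) inclusive prefix sum: the running total restarts
// whenever the key changes. Shown in isolation, not the paper's code.
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/scan.h>

int main() {
    int   h_keys[] = {0, 0, 0, 1, 1, 2, 2, 2};    // segment ids (sorted)
    float h_vals[] = {1, 2, 3, 4, 5, 6, 7, 8};

    thrust::device_vector<int>   keys(h_keys, h_keys + 8);
    thrust::device_vector<float> vals(h_vals, h_vals + 8);
    thrust::device_vector<float> out(8);

    thrust::inclusive_scan_by_key(keys.begin(), keys.end(),
                                  vals.begin(), out.begin());

    for (int i = 0; i < 8; ++i) printf("%.0f ", (float)out[i]);
    printf("\n");   // 1 3 6 | 4 9 | 6 13 21
    return 0;
}
```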

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System [chapter]

Sanjay Chatterjee, Max Grossman, Alina Sbîrlea, Vivek Sarkar
2013 Lecture Notes in Computer Science  
We introduce a finish-async style API to GPU device programming with the aim of executing irregular applications efficiently across multiple shared multiprocessors (SM) in a GPU device without sacrificing  ...  So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason.  ...  Experiments To test the performance of our GPU work stealing runtime, we tried to find examples of applications that are challenging to implement efficiently on graphics hardware and data parallel applications  ... 
doi:10.1007/978-3-642-36036-7_14 fatcat:ahl3s7fuqfej3bg7vov66hybsm
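
The finish-async construct mentioned above is a device-side API in the paper, backed by work stealing across SMs. As a loose host-side analogue only, the sketch below launches independent kernel "tasks" asynchronously on separate CUDA streams and treats the final synchronization as the enclosing finish; the stream-based framing is an assumption made for illustration and is not the paper's API.

```cuda
// Loose host-side analogue of finish/async: "async" launches independent
// kernels on their own streams, and the enclosing "finish" waits for all
// of them. The paper's construct lives on the device with work stealing
// across SMs; this stream-based framing is only an illustration.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void task(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const int TASKS = 4, N = 1 << 16;
    float* x;
    cudaMallocManaged(&x, TASKS * N * sizeof(float));
    cudaMemset(x, 0, TASKS * N * sizeof(float));

    cudaStream_t streams[TASKS];
    for (int t = 0; t < TASKS; ++t) {               // "async" each task
        cudaStreamCreate(&streams[t]);
        task<<<(N + 255) / 256, 256, 0, streams[t]>>>(x + t * N, N);
    }
    cudaDeviceSynchronize();                        // the "finish" scope ends
    printf("x[0]=%.1f x[last]=%.1f\n", x[0], x[TASKS * N - 1]);

    for (int t = 0; t < TASKS; ++t) cudaStreamDestroy(streams[t]);
    cudaFree(x);
    return 0;
}
```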