Exploring the Heterogeneous Design Space for both Performance and Reliability

Rafael Ubal, Dana Schaa, Perhaad Mistry, Xiang Gong, Yash Ukidave, Zhongliang Chen, Gunar Schirner, David Kaeli
2014 Proceedings of the 51st Annual Design Automation Conference - DAC '14
We describe the design of a framework that supports a range of heterogeneous devices to be evaluated based on different performance/reliability criteria.  ...  As we move into a new era of heterogeneous multi-core systems, our ability to tune the performance and understand the reliability of both hardware and software becomes more challenging.  ...  The OpenCL application on the CPU interacts with the GPU through the AXI bus. This allows for detailed analysis of the CPU/GPU traffic, as well as analyzing the GPU induced traffic.  ...
doi:10.1145/2593069.2596680 dblp:conf/dac/UbalSMGUCSK14 fatcat:i4b2nezub5abbgmmtz32v7mkry

Analyzing memory management methods on integrated CPU-GPU systems

Mohammad Dashti, Alexandra Fedorova
2017 SIGPLAN Notices
In this study, we analyze some of the common memory management methods of the most widely used software frameworks for heterogeneous systems: CUDA, OpenCL 1.2, OpenCL 2.0, and HSA, on NVIDIA and AMD hardware  ...  Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous.  ...  Related Work Prior to our work, Hestness, Keckler and Wood [9] analyzed the CPU and GPU memory system behavior on a simulator.  ... 
doi:10.1145/3156685.3092256 fatcat:2klkumxvuredza4v4wtln53hg4

Analyzing memory management methods on integrated CPU-GPU systems

Mohammad Dashti, Alexandra Fedorova
2017 Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management - ISMM 2017  
In this study, we analyze some of the common memory management methods of the most widely used software frameworks for heterogeneous systems: CUDA, OpenCL 1.2, OpenCL 2.0, and HSA, on NVIDIA and AMD hardware  ...  Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous.  ...  Related Work Prior to our work, Hestness, Keckler and Wood [9] analyzed the CPU and GPU memory system behavior on a simulator.  ... 
doi:10.1145/3092255.3092256 dblp:conf/iwmm/DashtiF17 fatcat:cfyj6uqohzcwvcxou2c7ukhbve

Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU

Ivan Grasso, Petar Radojkovic, Nikola Rajovic, Isaac Gelado, Alex Ramirez
2014 2014 IEEE 28th International Parallel and Distributed Processing Symposium  
In this paper, we analyze performance and energy advantages of embedded GPUs for HPC. In particular, we analyze the ARM Mali-T604 GPU, the first embedded GPU with OpenCL Full Profile support.  ...  Secondly, embedded GPUs did not provide support for parallel programming languages such as OpenCL and CUDA.  ...  Aware of this upcoming trend, we analyze the use of the ARM Mali GPU Compute Architecture for HPC workloads.  ...
doi:10.1109/ipdps.2014.24 dblp:conf/ipps/GrassoRRGR14 fatcat:cazacs6k7ndshkxjrdv7haqeqq

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

Ashwin Mandayam Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng
2015 2015 IEEE International Conference on Cluster Computing  
OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices.  ...  Our case studies include the SNU-NPB OpenCL benchmark suite and a real-world seismology simulation.  ...  We thank Kaixi Hou for his efforts in porting the seismology simulation code to OpenCL.  ... 
doi:10.1109/cluster.2015.15 dblp:conf/cluster/AjiPBF15 fatcat:cyr4h6zujvhcdey76qcthq5xry

NUPAR

Yash Ukidave, David Kaeli, Fanny Nina Paravecino, Leiming Yu, Charu Kalra, Amir Momeni, Zhongliang Chen, Nick Materise, Brett Daley, Perhaad Mistry
2015 Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering - ICPE '15  
Heterogeneous systems consisting of multi-core CPUs, Graphics Processing Units (GPUs) and many-core accelerators have gained widespread use by application developers and data-center platform developers  ...  In this paper, we focus our discussion on applications developed in CUDA and OpenCL, and focus on high-end server class GPUs.  ...  Acknowledgments This work was supported in part by an NSF CISE award CSR-1319501. We would like to thank the HSA Foundation for their gift to support this work.  ...
doi:10.1145/2668930.2688046 dblp:conf/wosp/UkidavePYKMCMDMK15 fatcat:y23aq6dyk5d6phhzw3kaifnhli

SnuCL

Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee
2012 Proceedings of the 26th ACM international conference on Supercomputing - ICS '12  
In this paper, we propose SnuCL, an OpenCL framework for heterogeneous CPU/GPU clusters.  ...  SnuCL provides a system image running a single operating system instance for heterogeneous CPU/GPU clusters to the user.  ...  In this paper, we propose an OpenCL framework called SnuCL and show that OpenCL can be a unified programming model for heterogeneous CPU/GPU clusters.  ... 
doi:10.1145/2304576.2304623 dblp:conf/ics/KimSLNJL12 fatcat:27qtdiidyrbeffcwtj7edlpeou

Matchmaking Applications and Partitioning Strategies for Efficient Execution on Heterogeneous Platforms

Jie Shen, Ana Lucia Varbanescu, Xavier Martorell, Henk Sips
2015 2015 44th International Conference on Parallel Processing  
Heterogeneous platforms are mixes of different processing units. The key factor to their efficient usage is workload partitioning.  ...  In this paper, we propose an application-driven method to select the best partitioning strategy for a given workload.  ...  The CPU implementation is the sequential implementation, and the GPU implementation is the kernel in OpenCL or CUDA (we use OpenCL in this work).  ...
doi:10.1109/icpp.2015.65 dblp:conf/icpp/ShenVMS15 fatcat:pakk7nz6eneipknsws4bi3nm2a
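
To make "workload partitioning" concrete, here is a minimal sketch of the simplest static strategy: split N independent work items between the CPU and the GPU in proportion to their relative throughputs, so both devices finish at about the same time. This is not the application-driven selection method the paper proposes; the function name and the throughput figures are hypothetical, and it assumes throughput is constant across items.

```python
def partition_workload(n_items: int, cpu_throughput: float, gpu_throughput: float) -> tuple[int, int]:
    """Split n_items between CPU and GPU in proportion to their (assumed) throughputs.

    Returns (cpu_items, gpu_items); the two always sum to n_items.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if cpu_throughput < 0 or gpu_throughput < 0 or cpu_throughput + gpu_throughput == 0:
        raise ValueError("throughputs must be non-negative and not both zero")
    # Fraction of the work the GPU should take so both devices finish together.
    gpu_share = gpu_throughput / (cpu_throughput + gpu_throughput)
    gpu_items = round(n_items * gpu_share)
    # Give the remainder to the CPU so no item is lost to rounding.
    return n_items - gpu_items, gpu_items


# Hypothetical example: the GPU is assumed to be 3x faster than the CPU on this kernel.
cpu_part, gpu_part = partition_workload(1_000_000, cpu_throughput=1.0, gpu_throughput=3.0)
print(cpu_part, gpu_part)  # 250000 750000
```

With these assumed rates, the CPU processes 250000 items at rate 1 and the GPU 750000 at rate 3, so both take the same time. Real workloads rarely have constant per-item cost, which is one reason more elaborate partitioning strategies exist.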

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL

Ashwin M. Aji, Antonio J. Peña, Pavan Balaji, Wu-chun Feng
2016 Parallel Computing  
Our case studies include the SNU benchmark suite and a real-world seismology simulation.  ...  The OpenCL specification tightly binds a command queue to a specific device.  ...  We thank Kaixi Hou for his efforts in porting the seismology simulation code to OpenCL.  ... 
doi:10.1016/j.parco.2016.05.006 fatcat:2q4ri3l36vgzfocevg6psnlqcq

Towards Co-execution on Commodity Heterogeneous Systems: Optimizations for Time-Constrained Scenarios

Raul Nozal, Jose Luis Bosque, Ramon Beivide
2019 2019 International Conference on High Performance Computing & Simulation (HPCS)  
Due to the heterogeneity, some efforts have been made to reduce the programming effort and preserve performance portability, but these systems pose a set of challenges.  ...  The ubiquity of these architectures in both desktop systems and medium-sized service servers allows enough variability to exploit a wide range of problems, such as multimedia workloads, video encoding,  ...  Desktop computers usually have an integrated heterogeneous system, composed of CPU cores, together with GPU compute units in a single chip. Along with them, it is common to find discrete GPUs.  ...
doi:10.1109/hpcs48598.2019.9188188 dblp:conf/ieeehpcs/NozalBB19 fatcat:xg733fxv3rdflmn6hzy7htgkeu

Revisiting co-processing for hash joins on the coupled CPU-GPU architecture

Jiong He, Mian Lu, Bingsheng He
2013 Proceedings of the VLDB Endowment  
Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the GPU integrated into a single chip.  ...  CPU-GPU co-processing, respectively.  ...  This work is partly supported by a MoE AcRF Tier  ... 
doi:10.14778/2536206.2536216 fatcat:3w2kzqxjqfh2zoohdaisn2hd3q

Easy, fast, and energy-efficient object detection on heterogeneous on-chip architectures

Ehsan Totoni, Mert Dikmen, María Jesús Garzarán
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
We optimize a visual object detection application (that uses Vision Video Library kernels) and show that OpenCL is a unified programming paradigm that can provide high performance when running on the Ivy Bridge heterogeneous on-chip architecture.  ...  Section 3 evaluates and analyzes different optimizations for our kernels using OpenCL for the CPU and the GPU.  ...
doi:10.1145/2541228.2555302 fatcat:yhmbb4mnkfc2bd5jc5ojhzoizm

Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture [article]

Jiong He, Mian Lu, Bingsheng He
2013 arXiv pre-print
Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the GPU integrated into a single chip.  ...  CPU-GPU co-processing, respectively.  ...  This work is partly supported by a MoE AcRF Tier  ... 
arXiv:1307.1955v1 fatcat:ern2gy5rbvfttcwovybd3owy7e

An experimental study of group-by and aggregation on CPU-GPU processors

Hua Luan, Lei Chang
2022 Journal of Engineering and Applied Science (Cairo) (Online)  
We conduct an extensive experimental study and analysis on the single CPU, the coupled GPU, and both processors.  ...  Hash-based group-by and aggregation is a fundamental operator in database systems. Modern discrete GPUs (graphics processing units) have been considered to accelerate the performance.  ...  In order to build the hash table by all the work items of the CPU and the GPU, the methods are implemented using OpenCL 2.0 with the Shared Virtual Memory (SVM) feature and the corresponding experiments  ... 
doi:10.1186/s44147-022-00108-1 doaj:7bf36a88efbb47b48e7c017fb3516408 fatcat:gmkpfzv7svg7xkm4gh6czyg424
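
Hash-based group-by builds a hash table keyed by the grouping column and updates a running aggregate in each key's entry as rows stream through. The sketch below is a minimal, single-threaded Python illustration under that description, with a plain dict standing in for the hash table and SUM/COUNT as the example aggregates. It is not the paper's OpenCL 2.0 implementation, which builds one table shared through Shared Virtual Memory by work-items on both the CPU and the GPU; the function name is hypothetical.

```python
from collections.abc import Iterable


def hash_group_by_sum(rows: Iterable[tuple[str, float]]) -> dict[str, tuple[float, int]]:
    """Group (key, value) rows by key, computing SUM(value) and COUNT(*) per group.

    A Python dict serves as the hash table: each row hashes its key to find
    (or create) its group entry, then updates that entry's running aggregates.
    """
    table: dict[str, tuple[float, int]] = {}
    for key, value in rows:
        total, count = table.get(key, (0.0, 0))
        table[key] = (total + value, count + 1)
    return table


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0)]
print(hash_group_by_sum(rows))  # {'a': (4.0, 2), 'b': (2.0, 1)}
```

When many CPU and GPU work-items insert into one shared table concurrently, each entry update has to be atomic (or otherwise synchronized); this sequential sketch sidesteps that concern entirely.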

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms

Alecio P. D. Binotto, Carlos E. Pereira, Arjan Kuijper, Andre Stork, Dieter W. Fellner
2011 2011 IEEE International Conference on High Performance Computing and Communications  
It has been a significant research and personal challenge and it is one of the most important steps in my career.  ...  To reach this goal, personal, technical, and financial support were all needed; without any of them I could not have developed this work.  ...  In the GPU, using OpenCL or CUDA, a context is analogous to a CPU process.  ...
doi:10.1109/hpcc.2011.20 dblp:conf/hpcc/BinottoPKSF11 fatcat:bjdij42z5fe7dmfykjjj3n7p74
Showing results 1-15 of 260.