Exploring the Heterogeneous Design Space for both Performance and Reliability
2014
Proceedings of the 51st Annual Design Automation Conference - DAC '14
We describe the design of a framework that supports a range of heterogeneous devices to be evaluated based on different performance/reliability criteria. ...
As we move into a new era of heterogeneous multi-core systems, our ability to tune the performance and understand the reliability of both hardware and software becomes more challenging. ...
The OpenCL application on the CPU interacts with the GPU through the AXI bus. This allows for detailed analysis of the CPU/GPU traffic, as well as analysis of the GPU-induced traffic. ...
doi:10.1145/2593069.2596680
dblp:conf/dac/UbalSMGUCSK14
fatcat:i4b2nezub5abbgmmtz32v7mkry
Analyzing memory management methods on integrated CPU-GPU systems
2017
SIGPLAN notices
In this study, we analyze some of the common memory management methods of the most widely used software frameworks for heterogeneous systems: CUDA, OpenCL 1.2, OpenCL 2.0, and HSA, on NVIDIA and AMD hardware ...
Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous. ...
Related Work Prior to our work, Hestness, Keckler and Wood [9] analyzed the CPU and GPU memory system behavior on a simulator. ...
doi:10.1145/3156685.3092256
fatcat:2klkumxvuredza4v4wtln53hg4
Analyzing memory management methods on integrated CPU-GPU systems
2017
Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management - ISMM 2017
In this study, we analyze some of the common memory management methods of the most widely used software frameworks for heterogeneous systems: CUDA, OpenCL 1.2, OpenCL 2.0, and HSA, on NVIDIA and AMD hardware ...
Heterogeneous systems that integrate a multicore CPU and a GPU on the same die are ubiquitous. ...
Related Work Prior to our work, Hestness, Keckler and Wood [9] analyzed the CPU and GPU memory system behavior on a simulator. ...
doi:10.1145/3092255.3092256
dblp:conf/iwmm/DashtiF17
fatcat:cfyj6uqohzcwvcxou2c7ukhbve
Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU
2014
2014 IEEE 28th International Parallel and Distributed Processing Symposium
In this paper, we analyze the performance and energy advantages of embedded GPUs for HPC. In particular, we analyze the ARM Mali-T604 GPU, the first embedded GPU with OpenCL Full Profile support. ...
Secondly, embedded GPUs did not provide support for parallel programming languages such as OpenCL and CUDA. ...
Aware of this upcoming trend, we analyze the use of the ARM Mali GPU Compute Architecture for HPC workloads. ...
doi:10.1109/ipdps.2014.24
dblp:conf/ipps/GrassoRRGR14
fatcat:cazacs6k7ndshkxjrdv7haqeqq
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL
2015
2015 IEEE International Conference on Cluster Computing
OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. ...
Our case studies include the SNU-NPB OpenCL benchmark suite and a real-world seismology simulation. ...
We thank Kaixi Hou for his efforts in porting the seismology simulation code to OpenCL. ...
doi:10.1109/cluster.2015.15
dblp:conf/cluster/AjiPBF15
fatcat:cyr4h6zujvhcdey76qcthq5xry
NUPAR
2015
Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering - ICPE '15
Heterogeneous systems consisting of multi-core CPUs, Graphics Processing Units (GPUs) and many-core accelerators have gained widespread use by application developers and data-center platform developers ...
In this paper, we focus our discussion on applications developed in CUDA and OpenCL, and focus on high-end server class GPUs. ...
Acknowledgments This work was supported in part by an NSF CISE award CSR-1319501. We would like to thank the HSA foundation for their gift to support this work. ...
doi:10.1145/2668930.2688046
dblp:conf/wosp/UkidavePYKMCMDMK15
fatcat:y23aq6dyk5d6phhzw3kaifnhli
In this paper, we propose SnuCL, an OpenCL framework for heterogeneous CPU/GPU clusters. ...
SnuCL provides a system image running a single operating system instance for heterogeneous CPU/GPU clusters to the user. ...
In this paper, we propose an OpenCL framework called SnuCL and show that OpenCL can be a unified programming model for heterogeneous CPU/GPU clusters. ...
doi:10.1145/2304576.2304623
dblp:conf/ics/KimSLNJL12
fatcat:27qtdiidyrbeffcwtj7edlpeou
Matchmaking Applications and Partitioning Strategies for Efficient Execution on Heterogeneous Platforms
2015
2015 44th International Conference on Parallel Processing
Heterogeneous platforms are mixes of different processing units. The key factor to their efficient usage is workload partitioning. ...
In this paper, we propose an application-driven method to select the best partitioning strategy for a given workload. ...
The CPU implementation is the sequential implementation, and the GPU implementation is the kernel in OpenCL or CUDA (we use OpenCL in this work). ...
doi:10.1109/icpp.2015.65
dblp:conf/icpp/ShenVMS15
fatcat:pakk7nz6eneipknsws4bi3nm2a
MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL
2016
Parallel Computing
Our case studies include the SNU benchmark suite and a real-world seismology simulation. ...
The OpenCL specification tightly binds a command queue to a specific device. ...
We thank Kaixi Hou for his efforts in porting the seismology simulation code to OpenCL. ...
doi:10.1016/j.parco.2016.05.006
fatcat:2q4ri3l36vgzfocevg6psnlqcq
Towards Co-execution on Commodity Heterogeneous Systems: Optimizations for Time-Constrained Scenarios
2019
2019 International Conference on High Performance Computing & Simulation (HPCS)
Due to the heterogeneity, some efforts have been made to reduce the programming effort and preserve performance portability, but these systems include a set of challenges. ...
The ubiquity of these architectures in both desktop systems and medium-sized service servers allows enough variability to exploit a wide range of problems, such as multimedia workloads, video encoding, ...
Desktop computers usually have an integrated heterogeneous system, composed of CPU cores together with GPU compute units in a single chip. Along with them, it is common to find discrete GPUs. ...
doi:10.1109/hpcs48598.2019.9188188
dblp:conf/ieeehpcs/NozalBB19
fatcat:xg733fxv3rdflmn6hzy7htgkeu
Revisiting co-processing for hash joins on the coupled CPU-GPU architecture
2013
Proceedings of the VLDB Endowment
Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the GPU integrated into a single chip. ...
CPU-GPU co-processing, respectively. ...
This work is partly supported by a MoE AcRF Tier ...
doi:10.14778/2536206.2536216
fatcat:3w2kzqxjqfh2zoohdaisn2hd3q
Easy, fast, and energy-efficient object detection on heterogeneous on-chip architectures
2013
ACM Transactions on Architecture and Code Optimization (TACO)
We optimize a visual object detection application (that uses Vision Video Library kernels) and show that OpenCL is a unified programming paradigm that can provide high performance when running on the Ivy ...
Bridge heterogeneous on-chip architecture. ...
Section 3 evaluates and analyzes different optimizations for our kernels using OpenCL for the CPU and the GPU. ...
doi:10.1145/2541228.2555302
fatcat:yhmbb4mnkfc2bd5jc5ojhzoizm
Revisiting Co-Processing for Hash Joins on the Coupled CPU-GPU Architecture
[article]
2013
arXiv
pre-print
Recently, coupled CPU-GPU architectures have received a lot of attention, e.g. AMD APUs with the CPU and the GPU integrated into a single chip. ...
CPU-GPU co-processing, respectively. ...
This work is partly supported by a MoE AcRF Tier ...
arXiv:1307.1955v1
fatcat:ern2gy5rbvfttcwovybd3owy7e
An experimental study of group-by and aggregation on CPU-GPU processors
2022
Journal of Engineering and Applied Science (Cairo) (Online)
We conduct an extensive experimental study and analysis on the single CPU, the coupled GPU, and both processors. ...
Hash-based group-by and aggregation is a fundamental operator in database systems. Modern discrete GPUs (graphics processing units) have been considered to accelerate the performance. ...
In order to build the hash table by all the work items of the CPU and the GPU, the methods are implemented using OpenCL 2.0 with the Shared Virtual Memory (SVM) feature and the corresponding experiments ...
doi:10.1186/s44147-022-00108-1
doaj:7bf36a88efbb47b48e7c017fb3516408
fatcat:gmkpfzv7svg7xkm4gh6czyg424
An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms
2011
2011 IEEE International Conference on High Performance Computing and Communications
It has been a significant research and personal challenge and it is one of the most important steps in my career. ...
To reach this goal, personal, technical, and financial support were all needed; without any of them I could not have developed this work. ...
In the GPU, using OpenCL or CUDA, a context is analogous to a CPU process. ...
doi:10.1109/hpcc.2011.20
dblp:conf/hpcc/BinottoPKSF11
fatcat:bjdij42z5fe7dmfykjjj3n7p74