Optimizing energy consumption and parallel performance for static and dynamic betweenness centrality using GPUs
2014
2014 IEEE High Performance Extreme Computing Conference (HPEC)
• Dynamic on the Kayla Platform (GPU): times are averaged over 100 edge insertions for the graphs delaunay_n12 and kron_g500-logn16, at exact and approximate ( = 256) solution quality; static times of 12.63 s and 5.63 s ...
• GPU on the Kayla Platform (Dynamic): times are averaged over 100 edge insertions; CPU times of 35.44 s and 33.79 s versus GPU ...
doi:10.1109/hpec.2014.7040980
dblp:conf/hpec/McLaughlinRB14
fatcat:yblweij5ffdqrgqurafsf4qgd4
SwitchFlow
2021
Proceedings of the 22nd International Middleware Conference
Our study shows that GPU kernels, spawned from computation graphs, can barely execute simultaneously on a single GPU and time slicing may lead to low GPU utilization. ...
... which are typically represented as computation graphs, heavily optimized by underlying DL libraries, and run on a complex pipeline spanning CPU and GPU. ...
ACKNOWLEDGEMENTS We are grateful to the anonymous reviewers for their comments on this paper. This work was supported in part by U.S. NSF grants CCF-1845706 and IIS-1852606. ...
doi:10.1145/3464298.3493391
fatcat:2fsjulz7gjahfe6n2533lcpgzi
Mapping a data-flow programming model onto heterogeneous platforms
2012
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems - LCTES '12
However, programming these heterogeneous platforms poses a serious challenge for application developers. ...
We demonstrate a working example that maps a pipeline of medical image-processing algorithms onto a prototype heterogeneous platform that includes CPUs, GPUs and FPGAs. ...
Additional thanks to the Habanero team for their comments and feedback on this work. ...
doi:10.1145/2248418.2248428
dblp:conf/lctrts/SbirleaZBCS12
fatcat:pt3s2jlcibehho65hstsw65ahm
A GPU Task-Parallel Model with Dependency Resolution
2012
Computer
We present two dependency-aware scheduling schemes-static and dynamic-and analyze their behavior using a synthetic workload. ...
We present a task-parallel programming model for the GPU. Our task model is robust enough to handle irregular workloads that contain dependencies. ...
His main research interests are task-parallel scheduling, graphics, and GPU computing. He obtained his bachelor's and master's degrees at Columbia University. He can be reached via email at stzeng@ucdavis.edu. ...
doi:10.1109/mc.2012.255
fatcat:bzm74dafengdvm57bwjz4bdko4
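The dependency-aware scheduling this entry describes can be illustrated with a minimal CPU-side sketch (illustrative only, not the paper's implementation; the task names and the "wave" grouping are assumptions): a static scheme can precompute waves of tasks whose dependencies are already satisfied, so each wave could be launched as one parallel batch.

```python
from collections import defaultdict

def schedule_waves(deps):
    """Group tasks into 'waves': every task in a wave has all of its
    dependencies satisfied by earlier waves, so a wave's tasks could
    be launched concurrently (e.g., as one batch of GPU work)."""
    indegree = {t: len(d) for t, d in deps.items()}
    dependents = defaultdict(list)
    for task, ds in deps.items():
        for dep in ds:
            dependents[dep].append(task)
    ready = [t for t, n in indegree.items() if n == 0]
    waves = []
    while ready:
        waves.append(sorted(ready))
        nxt = []
        for t in ready:
            for u in dependents[t]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    nxt.append(u)
        ready = nxt
    if sum(len(w) for w in waves) != len(deps):
        raise ValueError("dependency cycle detected")
    return waves

# A has no deps; B and C depend on A; D depends on B and C.
print(schedule_waves({"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}))
# [['A'], ['B', 'C'], ['D']]
```

A dynamic scheme would instead resolve dependencies at run time as tasks complete, trading precomputation for adaptivity to irregular task durations.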
Cpp-Taskflow v2: A General-purpose Parallel and Heterogeneous Task Programming System at Scale
[article]
2020
arXiv
pre-print
Our programming model empowers users with both static and dynamic task graph constructions to incorporate a broad range of computational patterns including hybrid CPU-GPU computing, dynamic control flow ...
40 CPUs and 4 GPUs. ...
A cudaFlow spawns a GPU task graph at its execution context for stateful parameter capture and offloads GPU operations to one or many GPUs. ...
arXiv:2004.10908v2
fatcat:snwlszx6bnhnflbpmddx5ileyi
Stochastic gradient descent on GPUs
2015
Proceedings of the 8th Workshop on General Purpose Processing using GPUs - GPGPU 2015
Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. ...
Furthermore, scheduling for scale-free graphs is challenging. This work examines several synchronization strategies for SGD, ranging from simple locking to conflict-free scheduling. ...
We believe these scheduling insights will be useful when implementing other algorithms on scale-free graphs on the GPU. ...
doi:10.1145/2716282.2716289
dblp:conf/ppopp/KaleemPP15
fatcat:qn3m5iy7uzbppetekzkbgkenxi
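The conflict-free scheduling strategy mentioned in this entry can be sketched on the CPU (a toy illustration under assumed semantics, not the paper's code): in matrix-factorization SGD, each update touches one row vertex and one column vertex, so updates that share no vertex can run in parallel without locks. A greedy pass groups the edges into vertex-disjoint batches:

```python
def conflict_free_batches(edges):
    """Greedily partition edges into batches in which no two edges
    share an endpoint; each batch's SGD updates can then run in
    parallel without locking shared parameters."""
    remaining = list(edges)
    batches = []
    while remaining:
        used, batch, rest = set(), [], []
        for u, v in remaining:
            if u in used or v in used:
                rest.append((u, v))   # conflicts with this batch; defer
            else:
                batch.append((u, v))
                used.update((u, v))
        batches.append(batch)
        remaining = rest
    return batches

# Four ratings between two users and two items.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2")]
for batch in conflict_free_batches(edges):
    print(batch)  # two batches, each with two vertex-disjoint updates
```

On scale-free graphs the high-degree vertices limit how large each batch can be, which is one reason the paper examines a range of strategies rather than a single scheme.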
BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
[article]
2021
arXiv
pre-print
Nonetheless, existing systems are inefficient to train large graphs with billions of nodes and edges with GPUs. ...
Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average. ...
It places graph structure data on each GPU with static caching on node features, leading to much faster data preprocessing. ...
arXiv:2112.08541v1
fatcat:kzel63n3ircqdpcuie2d4jd7y4
Towards Enhancing Coding Productivity for GPU Programming Using Static Graphs
2022
Electronics
In the first test case (Conjugate Gradient) we focus on the integration of Static Graphs with CUDA. ...
The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. ...
From this point on, we focus on the comparison between the OpenACC implementation and the use of GPU static graphs (CUDA Graph) as part of the OpenACC specification. ...
doi:10.3390/electronics11091307
fatcat:o3ci4jvekbee3op23hutttoenu
The lightspeed automatic interactive lighting preview system
2007
ACM Transactions on Graphics
Static code analysis is challenging and tends to be conservative. ...
Wexler, et al. implemented high-quality supersampling on the GPU [30] , but they focus on final rendering, while we optimize for static visibility, resulting in a different data structure. ...
doi:10.1145/1276377.1276409
fatcat:u32slungb5hrzpja7sgjf5ihtq
Astra: Exploiting Predictability to Optimize Deep Learning
2019
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '19
Astra also significantly outperforms static compilation frameworks such as Tensorflow XLA both in performance and robustness. ...
Instead of treating the computation as a generic data flow graph, Astra exploits domain knowledge about deep learning to adopt a custom approach to compiler optimization. ...
GPU VMs. ...
doi:10.1145/3297858.3304072
dblp:conf/asplos/SivathanuCSZ19
fatcat:mfngxrztxngvbmvmntivgrqmde
An overview of Medusa
2012
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12
To address those difficulties, we propose a programming framework named Medusa to simplify graph processing on GPUs. ...
The Medusa runtime system automatically executes the developer defined APIs in parallel on the GPU, with a series of graph-centric optimizations. ...
Medusa programs written for one GPU can run on multiple GPUs without modification. We deliver Medusa with static libraries and source code templates. ...
doi:10.1145/2145816.2145855
dblp:conf/ppopp/ZhongH12
fatcat:4vgoghgoarhdhac4n3tpj7yuz4
An overview of Medusa
2012
SIGPLAN notices
doi:10.1145/2370036.2145855
fatcat:zbufnoctojekhlszpx7op23q7q
Special issue on programming models and applications for multicores and manycores 2019–2020
2021
Concurrency and Computation
This special issue focuses on all aspects of parallel programming on multicore and manycore architectures, such as programming models and systems, applications and algorithms, performance analysis, and ...
It includes selected articles from the 2019 and 2020 editions of the International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2019 and PMAM 2020). ...
editors would like to thank all the reviewers and the Concurrency and Computation Practice and Experience Wiley office staff who worked hard to make this high-quality journal issue happen, while facing challenges ...
doi:10.1002/cpe.6677
fatcat:nxunsbdzargllcbylm35si5jvm
A Survey of Multi-Tenant Deep Learning Inference on GPU
[article]
2022
arXiv
pre-print
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU. ...
However, achieving efficient multi-tenant DL inference is challenging which requires thorough full-stack system optimization. ...
Graph and Runtime-level Scheduling Graph and runtime-level scheduling could help address one of the aforementioned challenge of coarse-grained granularity by enabling more fine-grained scheduling, e.g. ...
arXiv:2203.09040v3
fatcat:utvpoyvvajfhfghgpf45nxnbne
Communication-aware mapping of stream graphs for multi-GPU platforms
2016
Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016
... poses great challenges for stream graph mapping. ...
Our experimental results on a real GPU platform demonstrate that our techniques can generate significantly better performance than the current state of the art, in both single GPU and multi-GPU cases. ...
Another challenge with multi-GPU mapping is graph partitioning. Finding the right set of partitions is important, since it makes a permanent impact on the ensuing mapping step. ...
doi:10.1145/2854038.2854055
dblp:conf/cgo/NguyenL16
fatcat:hxx543ni4jf5dajfk5qjpleavm
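The partitioning challenge this entry raises can be made concrete with a toy sketch (illustrative only; the paper's mapping algorithm is more sophisticated, and the actor names and rates here are made up): given a stream graph with per-edge data rates and an assignment of actors to GPUs, the communication cost is the total rate crossing devices, and a greedy pass tries to shrink it by moving one actor at a time.

```python
def cut_cost(edges, placement):
    """Total data rate on edges whose endpoints sit on different GPUs."""
    return sum(rate for u, v, rate in edges if placement[u] != placement[v])

def greedy_remap(edges, placement, num_gpus=2):
    """One greedy pass: move each actor to the GPU that minimizes the cut."""
    for node in list(placement):
        best = min(range(num_gpus),
                   key=lambda g: cut_cost(edges, {**placement, node: g}))
        placement[node] = best
    return placement

# A tiny pipeline src -> f -> g -> sink with one heavy internal edge.
edges = [("src", "f", 1), ("f", "g", 10), ("g", "sink", 1)]
placement = {"src": 0, "f": 0, "g": 1, "sink": 1}
print(cut_cost(edges, placement))   # 10: the heavy f->g edge crosses GPUs
placement = greedy_remap(edges, placement)
print(cut_cost(edges, placement))   # 1: only the light src->f edge crosses
```

A real mapper must also balance compute load across GPUs; this sketch optimizes communication alone, which is why the greedy pass happily piles three of the four actors onto one device.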