
Optimizing energy consumption and parallel performance for static and dynamic betweenness centrality using GPUs

Adam McLaughlin, Jason Riedy, David A. Bader
2014 IEEE High Performance Extreme Computing Conference (HPEC)  
Dynamic on the Kayla Platform (GPU) • Times are averaged for 100 edge insertions Graph delaunay_n12 kron_g500-logn16 Solution Quality Exact Approximate ( = 256) Static Time (s) 12.63 5.63  ...  GPU on the Kayla Platform (Dynamic) • Times are averaged for 100 edge insertions Graph delaunay_n12 kron_g500-logn16 Solution Quality Exact Approximate ( = 256) CPU Time (s) 35.44 33.79 GPU  ... 
doi:10.1109/hpec.2014.7040980 dblp:conf/hpec/McLaughlinRB14 fatcat:yblweij5ffdqrgqurafsf4qgd4

SwitchFlow

Xiaofeng Wu, Jia Rao, Wei Chen, Hang Huang, Chris Ding, Heng Huang
2021 Proceedings of the 22nd International Middleware Conference  
Our study shows that GPU kernels, spawned from computation graphs, can barely execute simultaneously on a single GPU and time slicing may lead to low GPU utilization.  ...  , which are typically represented as computation graphs, heavily optimized by underlying DL libraries, and run on a complex pipeline spanning CPU and GPU.  ...  ACKNOWLEDGEMENTS We are grateful to the anonymous reviewers for their comments on this paper. This work was supported in part by U.S. NSF grants CCF-1845706 and IIS-1852606.  ... 
doi:10.1145/3464298.3493391 fatcat:2fsjulz7gjahfe6n2533lcpgzi

Mapping a data-flow programming model onto heterogeneous platforms

Alina Sbîrlea, Yi Zou, Zoran Budimlić, Jason Cong, Vivek Sarkar
2012 Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems - LCTES '12  
However, programming these heterogeneous platforms poses a serious challenge for application developers.  ...  We demonstrate a working example that maps a pipeline of medical image-processing algorithms onto a prototype heterogeneous platform that includes CPUs, GPUs and FPGAs.  ...  Additional thanks to the Habanero team for their comments and feedback on this work.  ... 
doi:10.1145/2248418.2248428 dblp:conf/lctrts/SbirleaZBCS12 fatcat:pt3s2jlcibehho65hstsw65ahm

A GPU Task-Parallel Model with Dependency Resolution

Stanley Tzeng, Brandon Lloyd, John D. Owens
2012 Computer  
We present two dependency-aware scheduling schemes (static and dynamic) and analyze their behavior using a synthetic workload.  ...  We present a task-parallel programming model for the GPU. Our task model is robust enough to handle irregular workloads that contain dependencies.  ...  His main research interests are task-parallel scheduling, graphics, and GPU computing. He obtained his bachelor's and master's degrees at Columbia University. He can be reached via email at stzeng@ucdavis.edu.  ... 
doi:10.1109/mc.2012.255 fatcat:bzm74dafengdvm57bwjz4bdko4
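
The Tzeng, Lloyd, and Owens entry above centers on resolving task dependencies at run time. Below is a minimal, host-side sketch of the general dependency-counting idea: each task tracks how many predecessors remain unfinished, and completing a task releases any successor whose count drops to zero. This is an illustration only, not the authors' GPU scheduler; all names are hypothetical, and a parallel implementation would decrement the counters atomically.

```cpp
#include <functional>
#include <queue>
#include <vector>

// One node of a task graph: a body, its successors, and a count of
// predecessors that have not finished yet.
struct Task {
  std::function<void()> work;
  std::vector<int> successors;
  int unmet_deps = 0;   // a GPU/parallel scheduler would use an atomic counter
};

// Execute a task graph by dependency counting: run whatever is ready,
// then release successors whose last dependency just completed.
void run_graph(std::vector<Task>& tasks) {
  std::queue<int> ready;
  for (int i = 0; i < static_cast<int>(tasks.size()); ++i)
    if (tasks[i].unmet_deps == 0) ready.push(i);          // roots start ready

  while (!ready.empty()) {
    int t = ready.front(); ready.pop();
    tasks[t].work();                                      // run the task body
    for (int s : tasks[t].successors)
      if (--tasks[s].unmet_deps == 0) ready.push(s);      // dependency resolved
  }
}
```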

Cpp-Taskflow v2: A General-purpose Parallel and Heterogeneous Task Programming System at Scale [article]

Tsung-Wei Huang, Dian-Lun Lin, Yibo Lin, Chun-Xun Lin
2020 arXiv   pre-print
Our programming model empowers users with both static and dynamic task graph constructions to incorporate a broad range of computational patterns including hybrid CPU-GPU computing, dynamic control flow  ...  40 CPUs and 4 GPUs.  ...  A cudaFlow spawns a GPU task graph at its execution context for stateful parameter capture and offloads GPU operations to one or many GPUs.  ... 
arXiv:2004.10908v2 fatcat:snwlszx6bnhnflbpmddx5ileyi
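
For the Cpp-Taskflow entry above, the static task-graph construction described in the abstract looks roughly like the following sketch, based on the publicly documented Taskflow API. The task bodies are placeholders, and the cudaFlow comment is only a pointer to where GPU work would go rather than a full example.

```cpp
#include <taskflow/taskflow.hpp>  // Taskflow (formerly Cpp-Taskflow)

int main() {
  tf::Executor executor;
  tf::Taskflow taskflow;

  // Statically declare four tasks; bodies here are placeholders.
  auto [A, B, C, D] = taskflow.emplace(
    [] { /* load data   */ },
    [] { /* compute B   */ },
    [] { /* compute C   */ },
    [] { /* reduce/save */ }
  );

  A.precede(B, C);   // A must finish before B and C
  D.succeed(B, C);   // D runs once both B and C finish

  // In Taskflow v2, GPU work is expressed similarly through a cudaFlow task,
  // e.g. taskflow.emplace([](tf::cudaFlow& cf){ /* kernels, copies */ });

  executor.run(taskflow).wait();
  return 0;
}
```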

Stochastic gradient descent on GPUs

Rashid Kaleem, Sreepathi Pai, Keshav Pingali
2015 Proceedings of the 8th Workshop on General Purpose Processing using GPUs - GPGPU 2015  
Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs.  ...  Furthermore, scheduling for scale-free graphs is challenging. This work examines several synchronization strategies for SGD, ranging from simple locking to conflict-free scheduling.  ...  We believe these scheduling insights will be useful when implementing other algorithms on scale-free graphs on the GPU.  ... 
doi:10.1145/2716282.2716289 dblp:conf/ppopp/KaleemPP15 fatcat:qn3m5iy7uzbppetekzkbgkenxi
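
The Kaleem, Pai, and Pingali entry above compares synchronization strategies for SGD on GPUs. As a point of reference for the lock-free end of that spectrum, here is a generic Hogwild-style update kernel for matrix factorization, in which conflicting writes to shared factor vectors are handled with atomics instead of locks. It is a sketch under assumed data layouts, not the paper's implementation.

```cpp
// One thread per rating; P and Q are row-major user/item factor matrices of
// width k. Conflicting updates to a shared row are resolved with atomicAdd.
__global__ void sgd_update(const int* user, const int* item, const float* rating,
                           float* P, float* Q, int k,
                           float lr, float reg, int num_ratings) {
  int r = blockIdx.x * blockDim.x + threadIdx.x;
  if (r >= num_ratings) return;

  float* p = &P[user[r] * k];
  float* q = &Q[item[r] * k];

  float pred = 0.0f;
  for (int j = 0; j < k; ++j) pred += p[j] * q[j];
  float err = rating[r] - pred;

  for (int j = 0; j < k; ++j) {
    float pj = p[j], qj = q[j];
    atomicAdd(&p[j], lr * (err * qj - reg * pj));   // lock-free, may conflict
    atomicAdd(&q[j], lr * (err * pj - reg * qj));
  }
}
```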

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing [article]

Tianfeng Liu
2021 arXiv   pre-print
Nonetheless, existing systems are inefficient at training large graphs with billions of nodes and edges on GPUs.  ...  Extensive experiments on various GNN models and large graph datasets show that BGL significantly outperforms existing GNN training systems by 20.68x on average.  ...  It places graph structure data on each GPU, with static caching of node features, leading to much faster data preprocessing.  ... 
arXiv:2112.08541v1 fatcat:kzel63n3ircqdpcuie2d4jd7y4

Towards Enhancing Coding Productivity for GPU Programming Using Static Graphs

Leonel Toledo, Pedro Valero-Lara, Jeffrey S. Vetter, Antonio J. Peña
2022 Electronics  
In the first test case (Conjugate Gradient) we focus on the integration of Static Graphs with CUDA.  ...  The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs.  ...  From this point on, we focus on the comparison between the OpenACC implementation and the use of GPU static graphs (CUDA Graph) as part of the OpenACC specification.  ... 
doi:10.3390/electronics11091307 fatcat:o3ci4jvekbee3op23hutttoenu
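
The Toledo et al. entry above builds on CUDA Graphs, i.e. recording GPU work once as a static graph and replaying it. A minimal stream-capture sketch of that mechanism follows; the kernel, sizes, and iteration count are hypothetical, and error checking is omitted for brevity.

```cpp
#include <cuda_runtime.h>

__global__ void axpy(float a, const float* x, float* y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

void run_as_static_graph(float* d_x, float* d_y, int n, int iters) {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Record the work once into a graph instead of launching it eagerly.
  cudaGraph_t graph;
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  axpy<<<(n + 255) / 256, 256, 0, stream>>>(2.0f, d_x, d_y, n);
  cudaStreamEndCapture(stream, &graph);

  // Instantiate the executable graph, then replay it; per-launch CPU
  // overhead is paid once at instantiation rather than on every launch.
  cudaGraphExec_t exec;
  cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
  for (int i = 0; i < iters; ++i) cudaGraphLaunch(exec, stream);
  cudaStreamSynchronize(stream);

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
  cudaStreamDestroy(stream);
}
```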

The lightspeed automatic interactive lighting preview system

Jonathan Ragan-Kelley, Charlie Kilpatrick, Brian W. Smith, Doug Epps, Paul Green, Christophe Hery, Frédo Durand
2007 ACM Transactions on Graphics  
Static code analysis is challenging and tends to be conservative.  ...  Wexler, et al. implemented high-quality supersampling on the GPU [30] , but they focus on final rendering, while we optimize for static visibility, resulting in a different data structure.  ... 
doi:10.1145/1276377.1276409 fatcat:u32slungb5hrzpja7sgjf5ihtq

Astra

Muthian Sivathanu, Tapan Chugh, Sanjay S. Singapuram, Lidong Zhou
2019 Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '19  
Astra also significantly outperforms static compilation frameworks such as Tensorflow XLA both in performance and robustness.  ...  Instead of treating the computation as a generic data flow graph, Astra exploits domain knowledge about deep learning to adopt a custom approach to compiler optimization.  ...  GPU VMs.  ... 
doi:10.1145/3297858.3304072 dblp:conf/asplos/SivathanuCSZ19 fatcat:mfngxrztxngvbmvmntivgrqmde

An overview of Medusa

Jianlong Zhong, Bingsheng He
2012 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12  
To address those difficulties, we propose a programming framework named Medusa to simplify graph processing on GPUs.  ...  The Medusa runtime system automatically executes the developer-defined APIs in parallel on the GPU, with a series of graph-centric optimizations.  ...  Medusa programs written for one GPU can run on multiple GPUs without modification. We deliver Medusa with static libraries and source code templates.  ... 
doi:10.1145/2145816.2145855 dblp:conf/ppopp/ZhongH12 fatcat:4vgoghgoarhdhac4n3tpj7yuz4
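
The Medusa entry above describes running developer-defined graph operations in parallel on the GPU. The sketch below is not Medusa's API; it only illustrates the underlying edge-parallel pattern such frameworks generate, with one thread per edge pushing a contribution to that edge's destination vertex.

```cpp
// Generic edge-parallel kernel over an edge list (src[e] -> dst[e]).
// Each thread handles one edge and pushes a PageRank-style contribution
// to the destination vertex; concurrent pushes use atomicAdd.
__global__ void edge_push(const int* src, const int* dst,
                          const float* rank, const int* out_degree,
                          float* next_rank, int num_edges) {
  int e = blockIdx.x * blockDim.x + threadIdx.x;
  if (e < num_edges) {
    float contrib = rank[src[e]] / out_degree[src[e]];
    atomicAdd(&next_rank[dst[e]], contrib);
  }
}
```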

An overview of Medusa

Jianlong Zhong, Bingsheng He
2012 SIGPLAN notices  
To address those difficulties, we propose a programming framework named Medusa to simplify graph processing on GPUs.  ...  The Medusa runtime system automatically executes the developer-defined APIs in parallel on the GPU, with a series of graph-centric optimizations.  ...  Medusa programs written for one GPU can run on multiple GPUs without modification. We deliver Medusa with static libraries and source code templates.  ... 
doi:10.1145/2370036.2145855 fatcat:zbufnoctojekhlszpx7op23q7q

Special issue on programming models and applications for multicores and manycores 2019–2020

Min Si, Quan Chen, Zhiyi Huang
2021 Concurrency and Computation  
This special issue focuses on all aspects of parallel programming on multicore and manycore architectures, such as programming models and systems, applications and algorithms, performance analysis, and  ...  It includes selected articles from the 2019 and 2020 editions of the International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2019 and PMAM 2020).  ...  editors would like to thank all the reviewers and the Concurrency and Computation Practice and Experience Wiley office staff who worked hard to make this high-quality journal issue happen, while facing challenges  ... 
doi:10.1002/cpe.6677 fatcat:nxunsbdzargllcbylm35si5jvm

A Survey of Multi-Tenant Deep Learning Inference on GPU [article]

Fuxun Yu, Di Wang, Longfei Shangguan, Minjia Zhang, Chenchen Liu, Xiang Chen
2022 arXiv   pre-print
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU.  ...  However, achieving efficient multi-tenant DL inference is challenging, as it requires thorough full-stack system optimization.  ...  Graph and Runtime-level Scheduling: Graph and runtime-level scheduling could help address one of the aforementioned challenges, coarse-grained granularity, by enabling more fine-grained scheduling, e.g.
arXiv:2203.09040v3 fatcat:utvpoyvvajfhfghgpf45nxnbne

Communication-aware mapping of stream graphs for multi-GPU platforms

Dong Nguyen, Jongeun Lee
2016 Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016  
poses great challenges for stream graph mapping.  ...  Our experimental results on a real GPU platform demonstrate that our techniques can generate significantly better performance than the current state of the art, in both single GPU and multi-GPU cases.  ...  Another challenge with multi-GPU mapping is graph partitioning. Finding the right set of partitions is important, since it makes a permanent impact on the ensuing mapping step.  ... 
doi:10.1145/2854038.2854055 dblp:conf/cgo/NguyenL16 fatcat:hxx543ni4jf5dajfk5qjpleavm
Showing results 1 — 15 out of 14,798 results