
Communication Library to Overlap Computation and Communication for OpenCL Application

Toshiya Komoda, Shinobu Miwa, Hiroshi Nakamura
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum  
Thus, it is necessary to develop a communication system that is highly abstracted but still capable of optimization. For this purpose, this paper proposes an OpenCL-based communication library.  ...  We have implemented a prototype system on an OpenCL platform and applied it to several image processing applications.  ...  ACKNOWLEDGMENT This work was supported by the Grant-in-Aid for JSPS Fellows (23 8062).  ... 
doi:10.1109/ipdpsw.2012.68 dblp:conf/ipps/KomodaMN12 fatcat:6zqtojdjmnhnlnoooaemvomsr4
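
The overlap technique this entry targets can be illustrated with plain OpenCL host code rather than the paper's library API (which is not reproduced here): a non-blocking write on one command queue proceeds while a kernel runs on a second queue. A minimal sketch, assuming ctx, dev, kernel, buf_in and host_chunk already exist, with error checks omitted:

/* Non-blocking host-to-device write on q_copy overlapped with a kernel
 * on q_exec; the kernel works on previously transferred data. */
#include <CL/cl.h>

void overlap_transfer_and_compute(cl_context ctx, cl_device_id dev,
                                  cl_kernel kernel, cl_mem buf_in,
                                  const void *host_chunk, size_t bytes,
                                  size_t global_size)
{
    cl_command_queue q_copy = clCreateCommandQueue(ctx, dev, 0, NULL);
    cl_command_queue q_exec = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_event copy_done;
    /* CL_FALSE: returns immediately, the transfer proceeds in the background. */
    clEnqueueWriteBuffer(q_copy, buf_in, CL_FALSE, 0, bytes,
                         host_chunk, 0, NULL, &copy_done);

    /* This kernel may execute while the new chunk is still in flight. */
    clEnqueueNDRangeKernel(q_exec, kernel, 1, NULL, &global_size,
                           NULL, 0, NULL, NULL);

    clWaitForEvents(1, &copy_done);   /* join before reusing host_chunk */
    clFinish(q_exec);
    clReleaseEvent(copy_done);
    clReleaseCommandQueue(q_copy);
    clReleaseCommandQueue(q_exec);
}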

clMAGMA

Chongxiao Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov
2014 Proceedings of the International Workshop on OpenCL 2013 & 2014 - IWOCL '14  
Further, we give an overview of the clMAGMA library, an open source, high performance OpenCL library that incorporates the developments presented, and in general provides to heterogeneous architectures  ...  High performance is obtained through use of the high-performance OpenCL BLAS, hardware and OpenCL-specific tuning, and a hybridization methodology where we split the algorithm into computational tasks  ...  Acknowledgments The authors would like to thank the National Science Foundation (award #0910735), the Department of Energy, and AMD for supporting this research effort.  ... 
doi:10.1145/2664666.2664667 dblp:conf/iwocl/CaoDDGLT14 fatcat:ghu4z4pjgvhmzm7u25pxsjywky

Optimized Data Transfers Based on the OpenCL Event Management Mechanism

Hiroyuki Takizawa, Shoichi Hirasawa, Makoto Sugawara, Isaac Gelado, Hiroaki Kobayashi, Wen-mei W. Hwu
2015 Scientific Programming  
Hence, an application can easily exploit opportunities for overlapping the parallel activities of hosts and compute devices.  ...  Since compute devices are dedicated to kernel computation, only hosts can execute several kinds of data transfers such as internode communication and file access.  ...  The authors would also like to thank the RIKEN Integrated Cluster of Clusters (RICC) at RIKEN for the user supports and the computer resources used for the performance evaluation.  ... 
doi:10.1155/2015/576498 fatcat:ufogpvi2hbfyfmvwu6uwwz46r4
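
The host/device overlap described in this entry relies on OpenCL's event machinery. The following is a minimal sketch, not the paper's system: it registers a host callback on the event of a non-blocking device-to-host read via clSetEventCallback, so the host can start its own activity (file output, internode communication) as soon as the data arrives instead of blocking; q, d_buf and host_dst are assumed to exist.

#include <CL/cl.h>

static void CL_CALLBACK on_read_complete(cl_event ev, cl_int status, void *user_data)
{
    (void)ev;
    if (status == CL_COMPLETE) {
        /* Host-only activity can start here: write user_data to a file,
         * hand it to an MPI send, and so on. */
    }
    (void)user_data;
}

void read_then_notify(cl_command_queue q, cl_mem d_buf, void *host_dst, size_t bytes)
{
    cl_event read_done;
    clEnqueueReadBuffer(q, d_buf, CL_FALSE, 0, bytes, host_dst, 0, NULL, &read_done);
    clSetEventCallback(read_done, CL_COMPLETE, on_read_complete, host_dst);
    clFlush(q);                   /* make sure the read is submitted */
    clReleaseEvent(read_done);
    /* The calling host thread is now free to overlap other work. */
}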

A uniform approach for programming distributed heterogeneous computing systems

Ivan Grasso, Simone Pellegrini, Biagio Cosenza, Thomas Fahringer
2014 Journal of Parallel and Distributed Computing  
In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple  ...  The Open Computing Language (OpenCL) is a partial solution to the problem.  ...  We would also like to thank the Barcelona Supercomputing Center for the availability of the MinoTauro GPU cluster.  ... 
doi:10.1016/j.jpdc.2014.08.002 pmid:25844015 pmcid:PMC4375632 fatcat:cisjmepk6vc2nervqlpnxwna4y

MPI-ACC: Accelerator-Aware MPI for Scientific Applications

Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Karthik Murthy, Milind Chabbi, Pavan Balaji, Keith R. Bisset, James Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma, Rajeev Thakur
2016 IEEE Transactions on Parallel and Distributed Systems  
We describe how MPI-ACC can be used to design new communication-computation patterns in scientific applications from domains such as epidemiology simulation and seismology modeling, and we discuss the  ...  followed by MPI communication between the host CPUs (Figures 1a and 1b).  ...  Thus, the application has to move large wavefield data between the CPU and the GPU for data marshaling and MPI communication after every stress and velocity computation phase in every iteration.  ... 
doi:10.1109/tpds.2015.2446479 fatcat:rhrwdmnoobeqldmuhhwu2wmelm
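
The baseline pattern the snippet describes, staging device data on the host before MPI communication in every iteration, looks roughly as follows in OpenCL plus MPI. MPI-ACC's own interface (passing accelerator buffers to MPI directly) is not reproduced here, and the buffer names are illustrative.

#include <CL/cl.h>
#include <mpi.h>

void exchange_wavefield(cl_command_queue q, cl_mem d_wavefield,
                        float *h_wavefield, size_t n, int neighbor)
{
    /* 1. Data marshaling: blocking device-to-host copy. */
    clEnqueueReadBuffer(q, d_wavefield, CL_TRUE, 0, n * sizeof(float),
                        h_wavefield, 0, NULL, NULL);

    /* 2. MPI communication between the host CPUs. */
    MPI_Sendrecv_replace(h_wavefield, (int)n, MPI_FLOAT,
                         neighbor, 0, neighbor, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* 3. Copy the received halo back to the device for the next phase. */
    clEnqueueWriteBuffer(q, d_wavefield, CL_TRUE, 0, n * sizeof(float),
                         h_wavefield, 0, NULL, NULL);
}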

Parallel Programming Models for Heterogeneous Many-Cores : A Survey [article]

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 arXiv   pre-print
We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices.  ...  While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  In this example, we can exploit temporal sharing to overlap the host-device communication and computation stages to achieve better runtime compared with executing every stage sequentially.  ... 
arXiv:2005.04094v1 fatcat:e2psrdnyajh3hih3znnjjbezae
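
The "temporal sharing" example mentioned in the snippet amounts to software pipelining over data chunks. A sketch under assumed names (two command queues, two device buffers, error checks omitted): while chunk i is being computed, the upload of chunk i+1 is already in flight.

#include <CL/cl.h>

void pipelined_chunks(cl_command_queue q_copy, cl_command_queue q_exec,
                      cl_kernel kernel, cl_mem d_buf[2],
                      const char *host_data, size_t chunk_bytes,
                      size_t global_size, int num_chunks)
{
    cl_event kernel_done[2] = { NULL, NULL };

    for (int i = 0; i < num_chunks; ++i) {
        int slot = i % 2;             /* double buffering across two device buffers */
        cl_event uploaded;

        /* Re-writing a slot must wait until the kernel that last used it finished. */
        clEnqueueWriteBuffer(q_copy, d_buf[slot], CL_FALSE, 0, chunk_bytes,
                             host_data + (size_t)i * chunk_bytes,
                             kernel_done[slot] ? 1 : 0,
                             kernel_done[slot] ? &kernel_done[slot] : NULL,
                             &uploaded);
        if (kernel_done[slot]) clReleaseEvent(kernel_done[slot]);

        clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_buf[slot]);
        /* The kernel for chunk i waits only on its own upload, so it can run
         * while the upload of chunk i+1 proceeds on q_copy. */
        clEnqueueNDRangeKernel(q_exec, kernel, 1, NULL, &global_size, NULL,
                               1, &uploaded, &kernel_done[slot]);
        clReleaseEvent(uploaded);
    }
    clFinish(q_exec);
    for (int s = 0; s < 2; ++s)
        if (kernel_done[s]) clReleaseEvent(kernel_done[s]);
}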

Extending OpenSHMEM for GPU Computing

S. Potluri, D. Bureddy, H. Wang, H. Subramoni, D.K. Panda
2013 2013 IEEE 27th International Symposium on Parallel and Distributed Processing  
Can the extensions be interoperable with both CUDA and OpenCL for wider acceptance in the GPU computing community?  ...  Do one-sided communication, flexible synchronization, and lower synchronization and communication overheads fit the requirements for GPU computing?  ... 
doi:10.1109/ipdps.2013.104 dblp:conf/ipps/PotluriBWSP13 fatcat:vrtjamksdvd7fa2c7lu4j4syju
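
For context, the baseline OpenSHMEM model the paper extends uses one-sided puts into symmetric memory. The sketch below shows only the standard host-memory API, not the proposed GPU-buffer extensions.

#include <shmem.h>

#define N 1024
static float halo[N];          /* symmetric: same address on every PE */

int main(void)
{
    shmem_init();
    int me    = shmem_my_pe();
    int npes  = shmem_n_pes();
    int right = (me + 1) % npes;

    float local[N];
    for (int i = 0; i < N; ++i) local[i] = (float)me;

    /* One-sided put: no matching receive is posted on the target PE. */
    shmem_putmem(halo, local, N * sizeof(float), right);

    shmem_barrier_all();       /* ensure all puts are complete and visible */
    shmem_finalize();
    return 0;
}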

Parallel programming models for heterogeneous many-cores: a comprehensive survey

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 CCF Transactions on High Performance Computing  
We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices.  ...  While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  In this example, we can exploit temporal sharing to overlap the host-device communication and computation stages to achieve better runtime compared with executing every stage sequentially.  ... 
doi:10.1007/s42514-020-00039-4 fatcat:nn56xhjm6rcu7kya6gfnyjg66q

Algorithmic Skeleton Framework for the Orchestration of GPU Computations [chapter]

Ricardo Marques, Hervé Paulino, Fernando Alexandre, Pedro D. Medeiros
2013 Lecture Notes in Computer Science  
Table 3.2: Execution pattern of OpenCL and the proposed skeletons.  ...  The previous simple OpenCL application pattern does not introduce overlap between communication and computation.  ...  known as overlap between communication and computation.  ... 
doi:10.1007/978-3-642-40047-6_86 fatcat:rsrjtgynnzfedjmtejym3khija
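
The "simple OpenCL application pattern" referred to in the table caption is the blocking upload/compute/download sequence on a single in-order queue, which by construction gives no communication/computation overlap. A minimal sketch with illustrative names:

#include <CL/cl.h>

void simple_pattern(cl_command_queue q, cl_kernel kernel,
                    cl_mem d_in, cl_mem d_out,
                    const float *h_in, float *h_out,
                    size_t n, size_t global_size)
{
    clEnqueueWriteBuffer(q, d_in, CL_TRUE, 0, n * sizeof(float),
                         h_in, 0, NULL, NULL);          /* upload   */
    clEnqueueNDRangeKernel(q, kernel, 1, NULL, &global_size,
                           NULL, 0, NULL, NULL);        /* compute  */
    clEnqueueReadBuffer(q, d_out, CL_TRUE, 0, n * sizeof(float),
                        h_out, 0, NULL, NULL);          /* download */
}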

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

Shucai Xiao, Wu-chun Feng
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum  
When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes.  ...  As such, programming GPUs in large-scale computing systems is more challenging than programming local GPUs, since local and remote GPUs have to be dealt with separately.  ...  The VOCL library exposes the OpenCL API to applications and is responsible for sending information about OpenCL calls to the VOCL proxy using MPI, and returning the proxy responses to the application.  ... 
doi:10.1109/ipdpsw.2012.325 dblp:conf/ipps/XiaoF12 fatcat:7stovln5pnfgzerg6xfb2fjwgi
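
The forwarding idea described in the snippet can be sketched as an intercepted OpenCL call whose arguments are shipped over MPI to a proxy. This is a hypothetical illustration only; the message tag, struct and function names below are invented and do not reflect VOCL's actual protocol.

#include <mpi.h>
#include <stddef.h>

#define TAG_WRITE_BUFFER 101        /* invented message tag */

struct write_buffer_msg {           /* invented argument bundle */
    unsigned long remote_buffer_id;
    size_t offset;
    size_t size;
};

/* Client side: stands in for a local clEnqueueWriteBuffer call. */
int forward_write_buffer(MPI_Comm proxy_comm, int proxy_rank,
                         unsigned long remote_buffer_id,
                         size_t offset, size_t size, const void *host_ptr)
{
    struct write_buffer_msg msg = { remote_buffer_id, offset, size };
    MPI_Send(&msg, sizeof(msg), MPI_BYTE, proxy_rank,
             TAG_WRITE_BUFFER, proxy_comm);            /* the call's arguments */
    MPI_Send(host_ptr, (int)size, MPI_BYTE, proxy_rank,
             TAG_WRITE_BUFFER, proxy_comm);            /* the payload          */

    int status;                                        /* proxy's return code  */
    MPI_Recv(&status, 1, MPI_INT, proxy_rank, TAG_WRITE_BUFFER,
             proxy_comm, MPI_STATUS_IGNORE);
    return status;
}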

Using the SkelCL Library for High-Level GPU Programming of 2D Applications [chapter]

Michel Steuwer, Sergei Gorlatch, Matthias Buß, Stefan Breuer
2013 Lecture Notes in Computer Science  
The SkelCL library offers pre-implemented recurring computation and communication patterns (skeletons) which greatly simplify programming for single-and multi-GPU systems.  ...  Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches -CUDA and OpenCL -are intrinsically low-level and offer no special support for systems  ...  In this paper, we first briefly describe our SkelCL library [9] for high-level single-and multi-GPU computing.  ... 
doi:10.1007/978-3-642-36949-0_41 fatcat:ynsy7ttzzfhbxiop2tj2kbye5i

MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems

Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng, Keith R. Bisset, Rajeev Thakur
2012 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems  
This query is expensive relative to extremely low-latency communication times and can add significant overhead to host-to-host communication operations.  ...  We examine the impact of MPI-ACC on communication performance and evaluate application-level benefits on a large-scale epidemiology simulation.  ...  GPU Programming Models: CUDA and OpenCL CUDA [3] and OpenCL [4] are the most commonly used parallel programming models for GPU computing.  ... 
doi:10.1109/hpcc.2012.92 dblp:conf/hpcc/AjiDBBFBT12 fatcat:eneekatrqzepdixjhq2hzecakq
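
The "query" referred to in the snippet is the kind of runtime check an MPI library can perform under CUDA unified virtual addressing to learn whether a buffer resides in device or host memory before choosing a transfer path. A sketch (field names follow recent CUDA toolkits; older toolkits used memoryType instead of type):

#include <cuda_runtime.h>

int buffer_is_on_device(const void *buf)
{
    struct cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, buf) != cudaSuccess)
        return 0;                        /* unknown pointer: treat as host memory */
    return attr.type == cudaMemoryTypeDevice;
}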

Transparent Accelerator Migration in a Virtualized GPU Environment

Shucai Xiao, Pavan Balaji, James Dinan, Qian Zhu, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, Wu-chun Feng
2012 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)  
Through transparent load balancing, our system provides a speedup of 1.7 to 1.9 for three of the four application kernels.  ...  Techniques to increase responsiveness and reduce migration overhead are explored. The system is evaluated by using four application kernels and is demonstrated to provide low migration overheads.  ...  When a VOCL handle is used, the library first translates it to the corresponding OpenCL handle and then generates the corresponding information for data communication (if there is any).  ... 
doi:10.1109/ccgrid.2012.26 dblp:conf/ccgrid/XiaoBDZTCLWHF12 fatcat:gqa34oaxa5htph4jbz7hsymnka

SkelCL: a high-level extension of OpenCL for multi-GPU systems

Michel Steuwer, Sergei Gorlatch
2014 Journal of Supercomputing  
We present SkelCL -a high-level programming approach for systems with multiple GPUs and its implementation as a library on top of OpenCL.  ...  Application development for modern high-performance systems with Graphics Processing Units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy  ...  Acknowledgments This work is partially supported by the OFERTIE (FP7) and MONICA projects. We would like to thank NVIDIA for their generous hardware donation.  ... 
doi:10.1007/s11227-014-1213-y fatcat:pcdedtcvtvekhom26qr3mp2ldy

Heterogeneous Parallel Implementation of Large-Scale Numerical Simulation of Saint-Venant Equations

Yongmeng Qi, Qiang Li, Zhigang Zhao, Jiahua Zhang, Lingyun Gao, Wu Yuan, Zhonghua Lu, Ningming Nie, Xiaomin Shang, Shunan Tao
2022 Applied Sciences  
On this basis, we applied communication/calculation overlapping and local memory acceleration to optimize performance.  ...  We use the two-dimensional Saint-Venant equations as an example for high-performance computing in modelling flood behavior.  ... 
doi:10.3390/app12115671 fatcat:4ykmjp7zmfdvzfcytmuf6whqny
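
The "local memory acceleration" mentioned above is the standard OpenCL tiling idiom for 2-D stencils: each work-group stages its tile plus a halo into __local memory so neighbouring reads hit fast on-chip memory. The kernel below is a generic sketch, not the paper's code; it assumes 16x16 work-groups, grid dimensions that are multiples of 16, and uses an illustrative 5-point average in place of the actual Saint-Venant flux update.

__kernel void stencil_local(__global const float *h_in,
                            __global float *h_out,
                            const int width, const int height)
{
    __local float tile[18][18];                 /* 16x16 work-group + 1-cell halo */

    const int gx = get_global_id(0), gy = get_global_id(1);
    const int lx = get_local_id(0) + 1, ly = get_local_id(1) + 1;

    /* Stage the centre cell; edge work-items also load their halo cells. */
    tile[ly][lx] = h_in[gy * width + gx];
    if (get_local_id(0) == 0  && gx > 0)          tile[ly][0]  = h_in[gy * width + gx - 1];
    if (get_local_id(0) == 15 && gx < width - 1)  tile[ly][17] = h_in[gy * width + gx + 1];
    if (get_local_id(1) == 0  && gy > 0)          tile[0][lx]  = h_in[(gy - 1) * width + gx];
    if (get_local_id(1) == 15 && gy < height - 1) tile[17][lx] = h_in[(gy + 1) * width + gx];
    barrier(CLK_LOCAL_MEM_FENCE);               /* tile fully loaded */

    if (gx > 0 && gx < width - 1 && gy > 0 && gy < height - 1)
        h_out[gy * width + gx] = 0.25f * (tile[ly][lx - 1] + tile[ly][lx + 1] +
                                          tile[ly - 1][lx] + tile[ly + 1][lx]);
}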
Showing results 1 — 15 out of 1,881 results