Overlapping Communication and Computation with High Level Communication Routines

Torsten Hoefler, Andrew Lumsdaine
2008 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)  
A real-world quantum-mechanical application is used as a deployment and evaluation vehicle for our approach. 1 or lower level BLAS operations  ...  We use a well-understood network model to found our theoretical analyses and we realize our communication operations as a portable library layered on MPI.  ...  All approaches are either using overlapping techniques for point-to-point messages or optimize their codes with high-level communication routines.  ... 
doi:10.1109/ccgrid.2008.15 dblp:conf/ccgrid/HoeflerL08 fatcat:5awb5i3w5vba5luutl4p475nkq

Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers

K. Kandalla, U. Yang, J. Keasler, T. Kolev, A. Moody, H. Subramoni, K. Tomko, J. Vienne, Bronis R. de Supinski, Dhabaleswar K. Panda
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium  
Our designs scale beyond 512 processes and we achieve near perfect communication/computation overlap.  ...  Hypre is a high performance, scalable software library that offers several optimized linear solver routines and pre-conditioners.  ...  Despite the fact that Flat scheme has very high communication latency, if there is enough compute to overlap, it could potentially deliver better overlap.  ... 
doi:10.1109/ipdps.2012.106 dblp:conf/ipps/KandallaYKKMSTVSP12 fatcat:knzkqeevsjcuppyk47p4jwrj5a

Asynchronous communication in spectral-element and discontinuous Galerkin methods for atmospheric dynamics – a case study using the High-Order Methods Modeling Environment (HOMME-homme_dg_branch)

Benjamin F. Jamroz, Robert Klöfkorn
2016 Geoscientific Model Development  
This allows the overlap of computation with communication, effectively hiding some of the costs of communication.  ...  We implement non-blocking asynchronous communication in the High-Order Methods Modeling Environment for the time integration of the hydrostatic fluid equations using both the spectral-element and discontinuous  ...  We would like to acknowledge high-performance computing support from Yellowstone (Computational and Information Systems Laboratory, 2012)  ... 
doi:10.5194/gmd-9-2881-2016 fatcat:getoyf6fdra5dnsggtfcvtpvqu

LogGPO: An accurate communication model for performance prediction of MPI programs

WenGuang Chen, JiDong Zhai, Jin Zhang, WeiMin Zheng
2009 Science in China Series F Information Sciences  
and there is a maximum overlap degree between computation and communication.  ...  However, most contemporary MPI implementations are not able to provide true overlap between computation and communication even with nonblocking message passing interface.  ...  We define the overlap ratio of computation to communication as R_o = comp / (comp + O). Besides, the gap (g) has little effect on the communication cost of high-level communication routines.  ... 
doi:10.1007/s11432-009-0161-2 fatcat:tzrhi7lgkzcgxeyfoinxeag32q

NEMO-Med: Optimization and Improvement of Scalability

Italo Epicoco, Silvia Mocavero, Aloisio Giovanni
2011 Social Science Research Network  
The NEMO oceanic model is widely used among the climate community. It is used with different configurations in more than 50 research projects for both long and short-term simulations.  ...  Computational requirements of the model and its implementation limit the exploitation of the emerging computational infrastructure at peta and exascale.  ...  At each iteration, communication and computation could be overlapped.  ... 
doi:10.2139/ssrn.1959924 fatcat:q2udfjavjncclouly5hzvbol3q

Portable, MPI-interoperable coarray fortran

Chaoran Yang, Wesley Bland, John Mellor-Crummey, Pavan Balaji
2014 Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14  
• Build AM on top of MPI's send and receive routines! • hurt performance - cannot overlap communication with AM handlers! • hurt interoperability - could cause deadlock!  ... 
doi:10.1145/2555243.2555270 dblp:conf/ppopp/YangBMB14 fatcat:23cyifjo3nc3lpxftnuzp4tdhu

Portable, MPI-interoperable coarray fortran

Chaoran Yang, Wesley Bland, John Mellor-Crummey, Pavan Balaji
2014 SIGPLAN notices  
• Build AM on top of MPI's send and receive routines! • hurt performance - cannot overlap communication with AM handlers! • hurt interoperability - could cause deadlock!  ... 
doi:10.1145/2692916.2555270 fatcat:dd5puu447nanrd3fbwm34o6w6q

Optimizing Metacomputing with Communication-Computation Overlap [chapter]

Françoise Baude, Denis Caromel, Nathalie Furmento, David Sagnol
2001 Lecture Notes in Computer Science  
In this way, programmers will be able to express, at a very high level, opportunities to introduce an overlapping of communications with computation operations.  ...  LOCCS [8], a library for communication routines and computation.  ...  A basic idea is to overlap communication with computation, thus yielding to a pipeline effect regarding messages transmission.  ... 
doi:10.1007/3-540-44743-1_19 fatcat:asnocez43fbtzl7twwsjkzxfwq

Integrating State of the Art Compute, Communication, and Autotuning Strategies to Multiply the Performance of the Application Programm CPMD for Ab Initio Molecular Dynamics Simulations [article]

Tobias Klöffel, Gerald Mathias, Bernd Meyer
2020 arXiv   pre-print
MPI+OpenMP parallelization now overlaps computation and communication.  ...  Following the internal instrumentation of CPMD, all time critical routines have been revised to maximize the computational throughput and to minimize the communication overhead for optimal performance.  ...  The authors gratefully acknowledge the compute resources and support provided by the Erlangen Regional Computing Center (RRZE).  ... 
arXiv:2003.08477v1 fatcat:arngdqddszfpvobj7gxjsmshde

Optimizing user-level communication patterns on the Fujitsu AP3000

J. Dawson, P. Strazdins
1999 ICWC 99. IEEE Computer Society International Workshop on Cluster Computing  
In this paper, we present techniques and algorithms to improve the performance of various communication patterns on message-passing platforms where, for reasons of safety, user-level communications must  ...  These algorithms can not only minimize message copying but overlap the copying to/from the special memory with the actual transfer, enabling full bandwidth to be achieved.  ...  It has communication networks with characteristics shared by most other state-of-the-art distributed memory computers, that is, high communication costs relative to floating point speed, and row or column  ... 
doi:10.1109/iwcc.1999.810814 dblp:conf/iwcc/DawsonS99 fatcat:4u3b572fmve4rg3tbhsdv6dpgm

Asynchronous Communication in Spectral Element and Discontinuous Galerkin Methods for Atmospheric Dynamics

Benjamin F. Jamroz, Robert Klöfkorn
2016 Geoscientific Model Development Discussions  
This allows the overlap of computation with communication, effectively hiding some of the costs of communication.  ...  The scalability of computational applications on current and next generation supercomputers is increasingly limited by the cost of inter-process communication.  ...  To implement the overlap of pack/unpack routines with the communication itself we generated the following mapping.  ... 
doi:10.5194/gmd-2016-23 fatcat:4yfpvf5uhnc7jnxbkx5qbg7hkm


Chongxiao Cao, Jack Dongarra, Peng Du, Mark Gates, Piotr Luszczek, Stanimire Tomov
2014 Proceedings of the International Workshop on OpenCL 2013 & 2014 - IWOCL '14  
High performance is obtained through use of the high-performance OpenCL BLAS, hardware and OpenCL-specific tuning, and a hybridization methodology where we split the algorithm into computational tasks  ...  The LAPACK-compliance and use of OpenCL simplify the use of clMAGMA in applications, while providing them with portably performant DLA.  ...  Acknowledgments The authors would like to thank the National Science Foundation (award #0910735), the Department of Energy, and AMD for supporting this research effort.  ... 
doi:10.1145/2664666.2664667 dblp:conf/iwocl/CaoDDGLT14 fatcat:ghu4z4pjgvhmzm7u25pxsjywky

Overlapping Communication and Computation with OpenMP and MPI

Timothy H. Kaiser, Scott B. Baden
2001 Scientific Programming  
We show how coarse grain OpenMP parallelism can also be used to facilitate overlapping MPI communication and computation for stencil-based grid programs such as a program performing Gauss-Seidel iteration  ...  Machines comprised of a distributed collection of shared memory or SMP nodes are becoming common for parallel computing. OpenMP can be combined with MPI on many such machines.  ...  Acknowledgments This work was funded by the National Science Foundation's National Partnership for Advanced Computational Infrastructure (NPACI) program.  ... 
doi:10.1155/2001/712152 fatcat:vbzbllebrfcv5c6sqi4cxj4poe

Automatic translation of MPI source into a latency-tolerant, data-driven form

Tan Nguyen, Pietro Cicotti, Eric Bylaska, Dan Quinlan, Scott Baden
2017 Journal of Parallel and Distributed Computing  
• Bamboo supports both point-to-point and collective communication. • Bamboo supports GPUs, hiding communication among GPUs and between hosts and GPUs. • Bamboo speeds up applications containing elaborate  ...  data and control structures.  ...  An olap-region is a section of code containing communication to be overlapped with computation.  ... 
doi:10.1016/j.jpdc.2017.02.009 fatcat:ddyuex5tkjewfp46zeao4njtpa

Exact Dependence Analysis for Increased Communication Overlap [chapter]

Simone Pellegrini, Torsten Hoefler, Thomas Fahringer
2012 Lecture Notes in Computer Science  
In this paper we revive the use of compiler analysis techniques to automatically unveil opportunities for communication/computation overlap using the result of exact data dependence analysis provided by  ...  However, for large applications, this is often not practical and expensive tracing tools and post-mortem analysis are employed to guide the tuning efforts finding hot-spots and performance bottlenecks.  ...  This research has been partially funded by the Austrian Research Promotion Agency under contract nr. 824925 (OpenCore) and under contract 834307 (AutoCore). Acknowledgments.  ... 
doi:10.1007/978-3-642-33518-1_14 fatcat:c7scn6yjtndezoybq264qpnqcq