Automatic Transformation for Overlapping Communication and Computation [chapter]

Changjun Hu, Yewei Shao, Jue Wang, Jianjiang Li
2008 Lecture Notes in Computer Science  
RDMA networks like InfiniBand and Myrinet reduce communication overhead by overlapping communication with computation.  ...  For the overlap to be more effective, we propose a source-to-source transformation scheme by automatically restructuring message-passing codes.  ...  . 11 Communication and computation available for overlapping in class A and class B problem size (IS benchmark)  ...  transformation algorithm achieves good performance only if the time taken in communication  ... 
doi:10.1007/978-3-540-88140-7_19 fatcat:7tqkwyvtmnbrxc7dpuayj6bb2e

Automatic MPI application transformation with ASPhALT

Anthony Danalis, Lori Pollock, Martin Swany
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
This tool is able to automatically apply a "prepushing" transformation that causes MPI programs to aggressively send data as soon as it is available, thus improving communication-computation overlap and  ...  In this paper we present asphalt transformer; the Open64-based component of our framework, ASPhALT, responsible for automatically performing the prepushing transformation.  ...  Acknowledgments We would like to thank the University of Tennessee and professor Dionisios G. Vlachos at the University of Delaware for providing us with access to their clusters.  ... 
doi:10.1109/ipdps.2007.370486 dblp:conf/ipps/DanalisPS07 fatcat:ll36g25sbravjhf3x2qxhpqr7q

Enhancing Performance Portability of MPI Applications through Annotation-Based Transformations

Md. Ziaul Haque, Qing Yi, James Dinan, Pavan Balaji
2013 2013 42nd International Conference on Parallel Processing  
overlap; and selection of the appropriate communication operators based on the cache-coherence support of the underlying platform.  ...  We use our annotation-based approach to optimize several benchmark kernels, and we demonstrate that the framework is effective at automatically improving performance portability for MPI applications.  ...  ACKNOWLEDGMENTS This work was supported through resource grants from the Argonne Leadership Computing Facility, the Argonne Laboratory Computing Resource Center, the Oak Ridge National Center for Computational  ... 
doi:10.1109/icpp.2013.77 dblp:conf/icpp/HaqueYDB13 fatcat:t432ks4ahba25byi2sshbz777y

MPI-aware compiler optimizations for improving communication-computation overlap

Anthony Danalis, Lori Pollock, Martin Swany, John Cavazos
2009 Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09  
Several existing compiler transformations can help improve communication-computation overlap in MPI applications.  ...  to achieve improved communication-computation overlap.  ...  the corresponding calls to libraries specialized for communication-computation overlap [10].  ... 
doi:10.1145/1542275.1542321 dblp:conf/ics/DanalisPSC09 fatcat:sgro5qzqevdfzk72vomlvwaz34

Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads [article]

Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Sarikivi
2022 arXiv   pre-print
Providing both computation and communication as first class constructs allows users to work on a high-level abstraction and apply powerful optimizations, such as fusion or overlapping of communication  ...  Manually applying these optimizations needs modifications in underlying computation and communication libraries for each scenario, which is time consuming and error-prone.  ...  Overlapping Computation and Communication CoCoNet provides overlap transformation to overlap a series of producer-consumer operations.  ... 
arXiv:2105.05720v5 fatcat:qg5o27bgljbi3eyvthreygrzgu

An automated approach to improve communication-computation overlap in clusters

L. Fishgold, A. Danalis, L. Pollock, M. Swany
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
Unfortunately, the trade-off between maintainability and performance often leads to a structure that prevents exploiting the potential for communication-computation overlapping.  ...  This paper describes a source-to-source optimizing transformation that can be performed by an automatic (or semi-automatic) system in order to restructure MPI codes towards maximizing communication-computation  ...  communication-computation overlap.  ... 
doi:10.1109/ipdps.2006.1639590 dblp:conf/ipps/FishgoldDPS06 fatcat:c32yae6amffnbhel4crzqwihki

Leveraging non-blocking collective communication in high-performance applications

Torsten Hoefler, Peter Gottschling, Andrew Lumsdaine
2008 Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures - SPAA '08  
Common communication and computation patterns in iterative SPMD computations are used to motivate the transformations we present.  ...  Although overlapping communication with computation is an important mechanism for achieving high performance in parallel programs, developing applications that actually achieve good overlap can be difficult  ...  Acknowledgments The authors thank Douglas Gregor for many helpful comments.  ... 
doi:10.1145/1378533.1378554 dblp:conf/spaa/HoeflerGL08 fatcat:v4lgfa6j6rhrra7dr77jap5ali

An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs

Ana Balevic, Bart Kienhuis
2011 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing  
The move towards heterogeneous parallel computing is underway as witnessed by the emergence of novel computing platforms combining architecturally diverse components such as CPUs, GPUs and special function  ...  In this paper, we present an approach for exploiting coarse-grain pipeline parallelism exposed by a dataflow graph and describe its mapping onto CPU-GPU architecture.  ...  An alternative approach to obtain computation and communication overlap is to exploit dataflow models of computation.  ... 
doi:10.1109/dfm.2011.10 fatcat:uj75oy3rs5ftjagajms5ujmlrq

Towards effective automatic parallelization for multicore systems

Uday Bondhugula, Muthu Baskaran, Albert Hartono, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
In this paper we describe our recent efforts towards developing an effective automatic parallelization system that uses a polyhedral model for data dependences and program transformations.  ...  The ubiquity of multicore processors in commodity computing systems has raised a significant programming challenge for their effective use.  ...  Acknowledgments We would like to acknowledge Cédric Bastoul and other contributors to the CLooG code generator and Martin Griebl and team for the LooPo infrastructure.  ... 
doi:10.1109/ipdps.2008.4536401 dblp:conf/ipps/BondhugulaBHKRRS08 fatcat:gv2yaercm5dp7awfy7buz2pvte

Scaling Data-Intensive Applications on Heterogeneous Platforms with Accelerators

Ana Balevic, Bart Kienhuis
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum  
Tiling + Streaming = TStream; Stage I: Compiler transforms for data partitioning; Tiling in polyhedral model; I/O tile bounds + footprint computation; Stage II: Support for tile streaming; Communication  ...  Parallelization Approaches: Asynchronous producer-transformer-consumer processes, implemented by helper threads executing on CPU and GPU; Transformer process (GPU) executes (automatically) parallelized  ... 
doi:10.1109/ipdpsw.2012.230 dblp:conf/ipps/BalevicK12 fatcat:w46lyu4cf5gpfj6eeym5xhm57y

Toward adjoinable MPI

Jean Utke, Laurent Hascoet, Patrick Heimbach, Chris Hill, Paul Hovland, Uwe Naumann
2009 2009 IEEE International Symposium on Parallel & Distributed Processing  
into the index regions (labeled “east overlap” and “west overlap”).  ...  Automatic differentiation is a technique for computing the analytic derivatives of numerical functions given as computer programs.  ... 
doi:10.1109/ipdps.2009.5161165 dblp:conf/ipps/UtkeHHHHN09 fatcat:hdwf743q3bgonfwxm4yytpef2a

KelpIO: a telescope-ready domain-specific I/O library for irregular block-structured applications

Bradley Broom, Rob Fowler, Ken Kennedy
2002 Future generations computer systems  
, and describes how a high-level domain-specific optimizer for applying these transformations could be constructed using the telescoping languages framework.  ...  The paper describes a domain-specific I/O library for irregular block-structured applications based on the KeLP library, describes high-level transformations of the library primitives for improving performance  ...  , using file system write-behind to overlap I/O latency and computation.  ... 
doi:10.1016/s0167-739x(01)00072-3 fatcat:mnmwrqio2bfdxjpepl4pztxvjm

Overlapping Communication and Computation with High Level Communication Routines

Torsten Hoefler, Andrew Lumsdaine
2008 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)  
A real-world quantum-mechanical application is used as a deployment and evaluation vehicle for our approach.  ...  or lower level BLAS operations  ...  We use a well-understood network model to found our theoretical analyses and we realize our communication operations as a portable library layered on MPI.  ...  Automatic and semi-automatic transformations to parallel codes to enable overlapping of point-to-point communication have been proposed in many studies.  ... 
doi:10.1109/ccgrid.2008.15 dblp:conf/ccgrid/HoeflerL08 fatcat:5awb5i3w5vba5luutl4p475nkq

Gravel: A Communication Library to Fast Path MPI [chapter]

Anthony Danalis, Aaron Brown, Lori Pollock, Martin Swany, John Cavazos
2008 Lecture Notes in Computer Science  
This capability enables communication-computation overlapping, which is highly desirable for addressing the costly communication overhead in cluster computing.  ...  Gravel works in concert with MPI to achieve increased communication-computation overlap by separating the meta-data exchange from the application data exchange, thus allowing different communication protocols  ...  To exploit the RDMA for communication-computation overlap, the communication library must provide support for one-sided communication and two-sided communication with low-overhead rendezvous protocols,  ... 
doi:10.1007/978-3-540-87475-1_19 fatcat:adnqh2tc3ffzdnmtv247tck75i

Exact Dependence Analysis for Increased Communication Overlap [chapter]

Simone Pellegrini, Torsten Hoefler, Thomas Fahringer
2012 Lecture Notes in Computer Science  
In this paper we revive the use of compiler analysis techniques to automatically unveil opportunities for communication/computation overlap using the result of exact data dependence analysis provided by  ...  However, for large applications, this is often not practical and expensive tracing tools and post-mortem analysis are employed to guide the tuning efforts finding hot-spots and performance bottlenecks.  ...  This research has been partially funded by the Austrian Research Promotion Agency under contract nr. 824925 (OpenCore) and under contract 834307 (AutoCore). Acknowledgments.  ... 
doi:10.1007/978-3-642-33518-1_14 fatcat:c7scn6yjtndezoybq264qpnqcq