Automatic Transformation for Overlapping Communication and Computation
[chapter]
2008
Lecture Notes in Computer Science
RDMA networks like InfiniBand and Myrinet reduce communication overhead by overlapping communication with computation. ...
For the overlap to be more effective, we propose a source-to-source transformation scheme by automatically restructuring message-passing codes. ...
... 11 Communication and computation available for overlapping in class A and class B problem size (IS benchmark) ... transformation algorithm achieves good performance only if the time taken in communication ...
doi:10.1007/978-3-540-88140-7_19
fatcat:7tqkwyvtmnbrxc7dpuayj6bb2e
Automatic MPI application transformation with ASPhALT
2007
2007 IEEE International Parallel and Distributed Processing Symposium
This tool is able to automatically apply a "prepushing" transformation that causes MPI programs to aggressively send data as soon as it is available, thus improving communication-computation overlap and ...
In this paper we present asphalt transformer, the Open64-based component of our framework, ASPhALT, responsible for automatically performing the prepushing transformation. ...
Acknowledgments We would like to thank the University of Tennessee and Professor Dionisios G. Vlachos at the University of Delaware for providing us with access to their clusters. ...
doi:10.1109/ipdps.2007.370486
dblp:conf/ipps/DanalisPS07
fatcat:ll36g25sbravjhf3x2qxhpqr7q
Enhancing Performance Portability of MPI Applications through Annotation-Based Transformations
2013
2013 42nd International Conference on Parallel Processing
overlap; and selection of the appropriate communication operators based on the cache-coherence support of the underlying platform. ...
We use our annotation-based approach to optimize several benchmark kernels, and we demonstrate that the framework is effective at automatically improving performance portability for MPI applications. ...
ACKNOWLEDGMENTS This work was supported through resource grants from the Argonne Leadership Computing Facility, the Argonne Laboratory Computing Resource Center, the Oak Ridge National Center for Computational ...
doi:10.1109/icpp.2013.77
dblp:conf/icpp/HaqueYDB13
fatcat:t432ks4ahba25byi2sshbz777y
MPI-aware compiler optimizations for improving communication-computation overlap
2009
Proceedings of the 23rd International Conference on Supercomputing - ICS '09
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. ...
to achieve improved communication-computation overlap. ...
the corresponding calls to libraries specialized for communication-computation overlap [10]. ...
doi:10.1145/1542275.1542321
dblp:conf/ics/DanalisPSC09
fatcat:sgro5qzqevdfzk72vomlvwaz34
Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads
[article]
2022
arXiv
pre-print
Providing both computation and communication as first class constructs allows users to work on a high-level abstraction and apply powerful optimizations, such as fusion or overlapping of communication ...
Manually applying these optimizations needs modifications in underlying computation and communication libraries for each scenario, which is time consuming and error-prone. ...
Overlapping Computation and Communication CoCoNet provides overlap transformation to overlap a series of producer-consumer operations. ...
arXiv:2105.05720v5
fatcat:qg5o27bgljbi3eyvthreygrzgu
An automated approach to improve communication-computation overlap in clusters
2006
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium
Unfortunately, the trade-off between maintainability and performance often leads to a structure that prevents exploiting the potential for communication-computation overlapping. ...
This paper describes a source-to-source optimizing transformation that can be performed by an automatic (or semi-automatic) system in order to restructure MPI codes towards maximizing communication-computation ...
communication-computation overlap. ...
doi:10.1109/ipdps.2006.1639590
dblp:conf/ipps/FishgoldDPS06
fatcat:c32yae6amffnbhel4crzqwihki
Leveraging non-blocking collective communication in high-performance applications
2008
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures - SPAA '08
Common communication and computation patterns in iterative SPMD computations are used to motivate the transformations we present. ...
Although overlapping communication with computation is an important mechanism for achieving high performance in parallel programs, developing applications that actually achieve good overlap can be difficult ...
Acknowledgments The authors thank Douglas Gregor for many helpful comments. ...
doi:10.1145/1378533.1378554
dblp:conf/spaa/HoeflerGL08
fatcat:v4lgfa6j6rhrra7dr77jap5ali
An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs
2011
2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing
The move towards heterogeneous parallel computing is underway as witnessed by the emergence of novel computing platforms combining architecturally diverse components such as CPUs, GPUs and special function ...
In this paper, we present an approach for exploiting coarse-grain pipeline parallelism exposed by a dataflow graph and describe its mapping onto CPU-GPU architecture. ...
An alternative approach to obtain computation and communication overlap is to exploit dataflow models of computation. ...
doi:10.1109/dfm.2011.10
fatcat:uj75oy3rs5ftjagajms5ujmlrq
Towards effective automatic parallelization for multicore systems
2008
Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)
In this paper we describe our recent efforts towards developing an effective automatic parallelization system that uses a polyhedral model for data dependences and program transformations. ...
The ubiquity of multicore processors in commodity computing systems has raised a significant programming challenge for their effective use. ...
Acknowledgments We would like to acknowledge Cédric Bastoul and other contributors to the CLooG code generator and Martin Griebl and team for the LooPo infrastructure. ...
doi:10.1109/ipdps.2008.4536401
dblp:conf/ipps/BondhugulaBHKRRS08
fatcat:gv2yaercm5dp7awfy7buz2pvte
Scaling Data-Intensive Applications on Heterogeneous Platforms with Accelerators
2012
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Tiling + Streaming = TStream - Stage I: Compiler transforms for data partitioning - Tiling in polyhedral model - I/O tile bounds + footprint computation - Stage II: Support for tile streaming - Communication ...
Parallelization Approaches - Asynchronous producer-transformer-consumer processes, implemented by helper threads executing on CPU and GPU - Transformer process (GPU) executes (automatically) parallelized ...
doi:10.1109/ipdpsw.2012.230
dblp:conf/ipps/BalevicK12
fatcat:w46lyu4cf5gpfj6eeym5xhm57y
Toward adjoinable MPI
2009
2009 IEEE International Symposium on Parallel & Distributed Processing
into the index regions (labeled “east overlap” and “west overlap”). ...
Automatic differentiation is a technique for computing the analytic derivatives of numerical functions given as computer programs. ...
doi:10.1109/ipdps.2009.5161165
dblp:conf/ipps/UtkeHHHHN09
fatcat:hdwf743q3bgonfwxm4yytpef2a
KelpIO: a telescope-ready domain-specific I/O library for irregular block-structured applications
2002
Future Generation Computer Systems
, and describes how a high-level domain-specific optimizer for applying these transformations could be constructed using the telescoping languages framework. ...
The paper describes a domain-specific I/O library for irregular block-structured applications based on the KeLP library, describes high-level transformations of the library primitives for improving performance ...
, using file system write-behind to overlap I/O latency and computation. ...
doi:10.1016/s0167-739x(01)00072-3
fatcat:mnmwrqio2bfdxjpepl4pztxvjm
Overlapping Communication and Computation with High Level Communication Routines
2008
2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID)
A real-world quantum-mechanical application is used as a deployment and evaluation vehicle for our approach. ... or lower level BLAS operations ...
We use a well-understood network model to found our theoretical analyses and we realize our communication operations as a portable library layered on MPI. ...
Automatic and semi-automatic transformations to parallel codes to enable overlapping of point-to-point communication have been proposed in many studies. ...
doi:10.1109/ccgrid.2008.15
dblp:conf/ccgrid/HoeflerL08
fatcat:5awb5i3w5vba5luutl4p475nkq
Gravel: A Communication Library to Fast Path MPI
[chapter]
2008
Lecture Notes in Computer Science
This capability enables communication-computation overlapping, which is highly desirable for addressing the costly communication overhead in cluster computing. ...
Gravel works in concert with MPI to achieve increased communication-computation overlap by separating the meta-data exchange from the application data exchange, thus allowing different communication protocols ...
To exploit the RDMA for communication-computation overlap, the communication library must provide support for one-sided communication and two-sided communication with low-overhead rendezvous protocols, ...
doi:10.1007/978-3-540-87475-1_19
fatcat:adnqh2tc3ffzdnmtv247tck75i
Exact Dependence Analysis for Increased Communication Overlap
[chapter]
2012
Lecture Notes in Computer Science
In this paper we revive the use of compiler analysis techniques to automatically unveil opportunities for communication/computation overlap using the result of exact data dependence analysis provided by ...
However, for large applications, this is often not practical and expensive tracing tools and post-mortem analysis are employed to guide the tuning efforts finding hot-spots and performance bottlenecks. ...
This research has been partially funded by the Austrian Research Promotion Agency under contract nr. 824925 (OpenCore) and under contract 834307 (AutoCore). Acknowledgments. ...
doi:10.1007/978-3-642-33518-1_14
fatcat:c7scn6yjtndezoybq264qpnqcq