1,080 Hits in 5.8 sec

Analyzing the Impact of Overlap, Offload, and Independent Progress for Message Passing Interface Applications

Ron Brightwell, Rolf Riesen, Keith D. Underwood
2005 The International Journal of High Performance Computing Applications
Similarly, the ability of the Message Passing Interface (MPI) to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed  ...  The overlap of computation and communication has long been considered to be a significant performance benefit for applications.  ...  His research interests include high performance, scalable communication interfaces and protocols for system area networks, operating systems for massively parallel processing machines, and parallel program  ... 
doi:10.1177/1094342005054257 fatcat:rvl462jpp5bmtewtoig6aspdhy

An analysis of the impact of MPI overlap and independent progress

Ron Brightwell, Keith D. Underwood
2004 Proceedings of the 18th annual international conference on Supercomputing - ICS '04  
The overlap of computation and communication has long been considered to be a significant performance benefit for applications.  ...  Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have been poorly studied  ...  BACKGROUND This paper seeks to quantify the impact of overlap, independent progress, and offload from an application perspective.  ... 
doi:10.1145/1006209.1006251 dblp:conf/ics/BrightwellU04 fatcat:7zhzezv7svcjxpplsfapeludji

Assessing the Ability of Computation/Communication Overlap and Communication Progress in Modern Interconnects

Mohammad J. Rashti, Ahmad Afsahi
2007 15th Annual IEEE Symposium on High-Performance Interconnects (HOTI 2007)  
Independent progress support in messaging layer, network interface offload capability and application usage of non-blocking communications are believed to increase overlap and yield performance benefits  ...  On the other hand, in most cases, transferring large messages does not make progress independently, decreasing the chances of overlap in applications.  ...  This research is supported by the Natural Sciences and Engineering Research Council of Canada through grant RGPIN/238964-2005, Canada Foundation for Innovation's grant #7154, and Ontario Innovation Trust's  ... 
doi:10.1109/hoti.2007.12 dblp:conf/hoti/RashtiA07 fatcat:jmsj2gm2nbdgtnsg5lvu2nnrxy
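
The overlap these papers quantify follows one pattern: post a non-blocking operation, compute, then complete it with MPI_Wait. Whether the transfer actually advances during the compute phase is exactly the "independent progress" question. A toy cost model (function name and numbers are illustrative, not taken from the papers) shows what is at stake:

```python
def total_time(t_comm, t_comp, overlap_fraction):
    """Completion time when a fraction of the communication can be
    hidden behind computation (0 = fully serialized, 1 = full overlap)."""
    hidden = min(t_comm * overlap_fraction, t_comp)
    return t_comm + t_comp - hidden

# No independent progress: transfer and computation simply add up.
serial = total_time(t_comm=4.0, t_comp=6.0, overlap_fraction=0.0)
# Full independent progress: the 4-unit transfer hides behind 6 units of work.
overlapped = total_time(t_comm=4.0, t_comp=6.0, overlap_fraction=1.0)
print(serial, overlapped)  # 10.0 6.0
```

With full independent progress the communication cost disappears behind computation; without it, an application pays both costs in sequence, which is the gap these microbenchmarks measure.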

Communication-Aware Hardware-Assisted MPI Overlap Engine [chapter]

Mohammadreza Bayatpour, Jahanzeb Hashmi Maqbool, Sourav Chakraborty, Kaushik Kandadi Suresh, Seyedeh Mahdieh Ghazimirsaeed, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda
2020 Lecture Notes in Computer Science  
Overlap of computation and communication is critical for good application-level performance.  ...  We evaluate the proposed designs against state-of-the-art MPI libraries and show up to 41% and 22% reduction in latency for collective operations and stencil-based application kernels on 1024 and 128 nodes  ...  The Message Passing Interface (MPI) [18] has been the de facto programming model for developing high-performance parallel applications for the last couple of decades.  ... 
doi:10.1007/978-3-030-50743-5_26 fatcat:tsudc4aw7jhvbd77tsh42zxu54

A preliminary analysis of the InfiniPath and XD1 network interfaces

R. Brightwell, D. Doerfler, K.D. Underwood
2006 Proceedings 20th IEEE International Parallel & Distributed Processing Symposium  
This approach stands in stark contrast to the current direction of most high-performance networking activities, which is to offload as much protocol processing as possible to the network interface.  ...  Another fundamental difference between these networks and other modern network adapters is that much of the processing needed for the network protocol stack is performed on the host processor(s) rather  ...  The XD1 machine is a resource of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S.  ... 
doi:10.1109/ipdps.2006.1639568 dblp:conf/ipps/BrightwellDU06 fatcat:bewvypammnehfdlinqtwnn2gfu

The impact of MPI queue usage on message latency

K.D. Underwood, R. Brightwell
2004 International Conference on Parallel Processing, 2004. ICPP 2004.  
For example, traditional MPI latency benchmarks time a ping-pong communication with one send and one receive on each of two nodes.  ...  The time to post the receive is never counted as part of the latency. This scenario is not even marginally representative of most applications.  ...  This allows MPI to customize the queue traversal and matching operations to match its needs. The penalty is that features such as independent progress and matching offload are lost.  ... 
doi:10.1109/icpp.2004.1327915 dblp:conf/icpp/UnderwoodB04 fatcat:bolzea6b65gn3acuthtxyopl3e
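
The snippet's criticism of ping-pong benchmarks can be made concrete with a toy model of MPI's posted-receive queue: a message whose matching receive sits behind many earlier entries pays a traversal and matching cost that the benchmark's single pre-posted receive never shows (all constants here are hypothetical, not measurements from the paper):

```python
def observed_latency(base_latency_us, queue_depth, per_entry_us=0.05):
    """Toy model: latency seen by a message whose matching receive sits
    behind `queue_depth` earlier entries in the posted-receive queue."""
    return base_latency_us + queue_depth * per_entry_us

# A ping-pong benchmark pre-posts its only receive (depth 0)...
print(observed_latency(1.5, 0))    # 1.5
# ...while an application with 200 outstanding receives pays the traversal.
print(observed_latency(1.5, 200))  # 11.5
```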

Improving Communication Progress and Overlap in MPI Rendezvous Protocol over RDMA-enabled Interconnects

Mohammad J. Rashti, Ahmad Afsahi
2008 International Symposium on High Performance Computing Systems and Applications (HPCS 2008)
MPI is a widely used message passing standard for high performance computing.  ...  In this paper, we address some of the communication progress shortcomings in the current polling and RDMA Read based Rendezvous protocol used for transferring large messages in MPI.  ...  Most scientific applications running on clusters are written on top of Message Passing Interface (MPI) [12] .  ... 
doi:10.1109/hpcs.2008.10 dblp:conf/hpcs/RashtiA08 fatcat:g7uqtxv6pjhnhoqude4gjf4heq

Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers

K. Kandalla, U. Yang, J. Keasler, T. Kolev, A. Moody, H. Subramoni, K. Tomko, J. Vienne, Bronis R. de Supinski, Dhabaleswar K. Panda
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium  
The latest InfiniBand adapter from Mellanox, ConnectX-2, enables offloading of generalized lists of communication operations to the network interface.  ...  In this paper, we design fully functional, scalable algorithms for the MPI Iallreduce operation, based on the network offload technology.  ...  The Message Passing Interface (MPI) [1] has been a popular programming model for High Performance Computing applications for the last couple of decades.  ... 
doi:10.1109/ipdps.2012.106 dblp:conf/ipps/KandallaYKKMSTVSP12 fatcat:knzkqeevsjcuppyk47p4jwrj5a

LogGPO: An accurate communication model for performance prediction of MPI programs

WenGuang Chen, JiDong Zhai, Jin Zhang, WeiMin Zheng
2009 Science in China Series F Information Sciences  
Message passing interface (MPI) is the de facto standard in writing parallel scientific applications on distributed memory systems.  ...  The potential overlap degree of computation and communication can have great impact on communication performance of MPI programs.  ...  Introduction Message passing interface (MPI) [1] is the de facto standard in writing parallel scientific applications on distributed memory systems.  ... 
doi:10.1007/s11432-009-0161-2 fatcat:tzrhi7lgkzcgxeyfoinxeag32q
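
LogGPO extends the classic LogGP model with an overlap component. For context, the standard LogGP estimate for a k-byte point-to-point message is L + 2o + (k-1)G: wire latency, send-plus-receive overhead, and a per-byte gap. A minimal sketch with made-up parameter values:

```python
def loggp_send_time(k_bytes, L, o, G):
    """LogGP point-to-point time for a k-byte message:
    latency L + sender/receiver overhead 2o + per-byte gap (k-1)G."""
    return L + 2 * o + (k_bytes - 1) * G

# Hypothetical network: L = 5 us, o = 1 us per side, G = 0.002 us/byte.
print(loggp_send_time(1, L=5.0, o=1.0, G=0.002))      # 7.0
print(loggp_send_time(10001, L=5.0, o=1.0, G=0.002))  # 27.0
```

In LogGP the overheads o are CPU time and cannot be hidden, while the L and G terms can overlap computation; modeling how much of them actually overlaps is the refinement LogGPO adds.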

Implications of application usage characteristics for collective communication offload

Ron Brightwell, Sue P. Goudy, Arun Rodrigues, Keith D. Underwood
2006 International Journal of High Performance Computing and Networking  
We analyze network resource usage data in order to guide the design of collective offload engines and their associated programming interfaces.  ...  In this paper, we describe several characteristics of applications and application benchmarks that impact collective communication performance.  ...  In general, the NPB suite proves to be a completely unsuitable environment for developing offload engines and analyzing collectives.  ... 
doi:10.1504/ijhpcn.2006.010633 fatcat:yahgbeholzg7bk3xnzi2npuxya

Kernel-Based Offload of Collective Operations – Implementation, Evaluation and Lessons Learned [chapter]

Timo Schneider, Sven Eckelmann, Torsten Hoefler, Wolfgang Rehm
2011 Lecture Notes in Computer Science  
Introduction The Message Passing Interface (MPI) standard [12] is the de-facto standard for implementing today's large-scale high-performance applications.  ...  Hoefler and Lumsdaine analyzed in [5] different schemes to progress the communication subsystem.  ...  Acknowledgments This work was supported in part by the DOE Office of Science, Advanced Scientific Computing Research X-Stack Program.  ... 
doi:10.1007/978-3-642-23397-5_26 fatcat:yo2hvna3ybaedobe7f3k4rjhwm

ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures

P. Lai, P. Balaji, R. Thakur, D. K. Panda
2009 Computer Science - Research and Development  
in terms of overlap of computation and communication and improved application performance.  ...  However, such hardware units are expensive, and their manufacturing complexity increases exponentially depending on the number and complexity of tasks they offload.  ...  Brightwell et al. analyzed the impact of communication and computation overlap [15] providing theoretical insights into this problem.  ... 
doi:10.1007/s00450-009-0090-8 fatcat:j7u4d6luyre4vhl3hz32zvt25y

Assessing MPI Performance on QsNet II [chapter]

Pablo E. García, Juan Fernández, Fabrizio Petrini, José M. García
2005 Lecture Notes in Computer Science  
In this case, while Myrinet, Infiniband and Quadrics are the preferred choices for the cluster interconnect, MPI [10] is the de facto standard communication library for message-passing, and Linux is the  ...  In particular, we study the raw network performance, the ability of MPI to overlap computation and communication, and the appropriateness of the local operating systems to support parallel processing.  ...  Future work include scalability analysis for larger configurations, performance analysis for different traffic patterns, and study of different scenarios to offload protocol processing to the Elan4 NIC  ... 
doi:10.1007/11557265_51 fatcat:n3f63izprfau3be3j3dh662mma

Asynchronous MPI for the Masses [article]

Markus Wittmann and Georg Hager and Thomas Zeiser and Gerhard Wellein
2013 arXiv pre-print
It utilizes the MPI profiling interface (PMPI) and the MPI_THREAD_MULTIPLE thread compatibility level, and works with current versions of Intel MPI, Open MPI, MPICH2, MVAPICH2, Cray MPI, and IBM MPI.  ...  We present a simple library which equips MPI implementations with truly asynchronous non-blocking point-to-point operations, and which is independent of the underlying communication infrastructure.  ...  In [5] HOEFLER et al. analyze the impact of progress threads for non-blocking collectives inside their own reduced MPI implementation.  ... 
arXiv:1302.4280v1 fatcat:m24cftmp7zefbefws42geefxwy
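
The library described here hooks MPI through the PMPI profiling interface: it exports a wrapper under the standard MPI_ name that does its own work and then forwards to the real routine via the PMPI_ name. That is C-level link interposition, but the control flow can be sketched generically (all names below are illustrative stand-ins, not the mpi4py or PMPI API):

```python
import functools

def pmpi_style_wrapper(real_fn, before_hook):
    """Sketch of PMPI-style interposition: run library-side work first
    (e.g. hand the message to a progress thread), then call the real
    implementation -- the analogue of MPI_Isend forwarding to PMPI_Isend."""
    @functools.wraps(real_fn)
    def wrapper(*args, **kwargs):
        before_hook(args)
        return real_fn(*args, **kwargs)
    return wrapper

log = []
# Stand-in for the real PMPI_Isend: returns a dummy "request".
pmpi_isend = lambda buf, dest: ("request", dest)
mpi_isend = pmpi_style_wrapper(pmpi_isend, log.append)

print(mpi_isend("payload", 3))  # ('request', 3)
```

Because the wrapper keeps the original name and signature, applications link against it unchanged, which is why such a library works across MPI implementations.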
Showing results 1 — 15 out of 1,080 results