915 hits in 3.6 sec

High performance RDMA-based MPI implementation over InfiniBand

Jiuxing Liu, Jiesheng Wu, Sushmitha P. Kini, Pete Wyckoff, Dhabaleswar K. Panda
2003 Proceedings of the 17th annual international conference on Supercomputing - ICS '03  
Performance evaluation at the MPI level shows that for small messages, our RDMA-based design can reduce the latency by 24%, increase the bandwidth by over 104%, and reduce the host overhead by up to 22%  ...  Our RDMA-based MPI implementation currently delivers a latency of 6.8 microseconds for small messages and a peak bandwidth of 871 Million Bytes (831 Mega Bytes) per second.  ...  Our appreciation is also extended to Jeff Kirk, Ezra Silvera and Kevin Deierling from Mellanox Technologies for their insight and technical support on their InfiniBand hardware and software.  ... 
doi:10.1145/782814.782855 dblp:conf/ics/LiuWKWP03 fatcat:7swpfrsn6jafvgqxyn2vr7kdae
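
The latency and bandwidth figures quoted above (6.8 microseconds, 871 million bytes per second) are the kind of numbers typically obtained with an MPI-level ping-pong microbenchmark between two processes. The following C sketch illustrates how such measurements are usually taken; it is a generic illustration, not the benchmark code used in the paper, and it assumes exactly two MPI ranks.

/* Minimal MPI ping-pong sketch (illustrative only, not the paper's benchmark).
 * Latency is reported as half the small-message round-trip time; bandwidth is
 * a rough one-way estimate for 1 MiB messages. Assumes exactly two ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    const int small = 4;             /* 4-byte message for latency */
    const int large = 1 << 20;       /* 1 MiB message for bandwidth */
    char *buf = malloc(large);

    /* Latency: ping-pong a small message and halve the round-trip time. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, small, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, small, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, small, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, small, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("latency: %.2f us\n", (MPI_Wtime() - t0) * 1e6 / (2.0 * iters));

    /* Bandwidth: stream large messages one way and divide bytes by time. */
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0)
            MPI_Send(buf, large, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, large, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    if (rank == 0)
        printf("bandwidth: %.1f MB/s\n",
               (double)large * iters / (MPI_Wtime() - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

Run with two ranks, e.g. mpirun -np 2 ./pingpong.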

High Performance RDMA-Based MPI Implementation over InfiniBand

Jiuxing Liu, Jiesheng Wu, Dhabaleswar K. Panda
2004 International Journal of Parallel Programming  
Performance evaluation at the MPI level shows that for small messages, our RDMA-based design can reduce the latency by 24%, increase the bandwidth by over 104%, and reduce the host overhead by up to 22%  ...  Our RDMA-based MPI implementation currently delivers a latency of 6.8 microseconds for small messages and a peak bandwidth of 871 Million Bytes (831 Mega Bytes) per second.  ...  Our appreciation is also extended to Jeff Kirk, Ezra Silvera and Kevin Deierling from Mellanox Technologies for their insight and technical support on their InfiniBand hardware and software.  ... 
doi:10.1023/b:ijpp.0000029272.69895.c1 fatcat:zbez3q6p7zczpkhsrr6qq6avya

Design and Implementation of MPICH2 over InfiniBand with RDMA Support [article]

Jiuxing Liu, Weihang Jiang, Pete Wyckoff, Dhabaleswar K. Panda, David Ashton, Darius Buntinas, William Gropp, Brian Toonen
2003 arXiv preprint
To the best of our knowledge, this is the first high-performance design and implementation of MPICH2 on InfiniBand using RDMA support.  ...  In this paper, we present our experiences designing and implementing MPICH2 over InfiniBand.  ...  Designing and Optimizing MPICH2 over InfiniBand In this section, we present several different designs of MPICH2 over InfiniBand based on the RDMA Channel interface.  ... 
arXiv:cs/0310059v1 fatcat:tkaemmkasjbcteqgxl7veggily

Design Alternatives and Performance Trade-Offs for Implementing MPI-2 over InfiniBand [chapter]

Wei Huang, Gopalakrishnan Santhanaraman, Hyun-Wook Jin, Dhabaleswar K. Panda
2005 Lecture Notes in Computer Science  
Compares MPI-2 designs over InfiniBand implemented at three layers of the MPICH2 stack: the ADI3 layer, which is capable of accessing more performance-oriented features; the CH3 layer, a more complex channel device that is responsible for making communication progress and is more flexible than the RDMA Channel; and the RDMA Channel itself.  ...  Designed and implemented MPI-2 over InfiniBand based on the RDMA Channel, CH3, and ADI3 interfaces.  ... 
doi:10.1007/11557265_27 fatcat:fv6je2ubqfbctlahbdamczfmq4

High-Performance and Scalable MPI over InfiniBand with Reduced Memory Usage: An In-Depth Performance Analysis

Sayantan Sur, Matthew J. Koop, Dhabaleswar K. Panda
2006 ACM/IEEE SC 2006 Conference (SC'06)  
Design and implementation of high-performance collectives using RDMA, including a hybrid mechanism for an efficient RDMA-based Alltoall and Allgather over InfiniBand.  ...  "High Performance Broadcast Support in LA-MPI over Quadrics", International Journal of High Performance Computing Applications, 2005.  ... 
doi:10.1109/sc.2006.34 fatcat:p2ctyxk4rjg73f43qelc4ndjsm

Efficient and scalable all-to-all personalized exchange for InfiniBand-based clusters

S. Sur, H.-W. Jin, D.K. Panda
2004 International Conference on Parallel Processing (ICPP 2004)  
It offers very low latency, high bandwidth, and one-sided operations such as RDMA write.  ...  Performance evaluation of our design and implementation reveals that it is able to reduce the All-to-All communication time by up to a factor of 3.07 for 32-byte messages on a 16-node InfiniBand cluster.  ...  The designs for MPI Alltoall were implemented and integrated into the MVAPICH [10] implementation of MPI over InfiniBand.  ... 
doi:10.1109/icpp.2004.1327932 dblp:conf/icpp/SurJP04 fatcat:bwlgje7pabf6bc3wxhqb4lcnua
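
The collective being optimized in this record is MPI_Alltoall (all-to-all personalized exchange). The RDMA-based design itself lives inside MVAPICH and is not shown here; at the application level the call is unchanged, as in this minimal usage sketch (a generic illustration, not code from the paper).

/* MPI_Alltoall usage sketch (generic illustration): each rank sends a distinct
 * 32-byte block (8 ints) to every other rank, the message size cited above.
 * The RDMA-based designs in these papers replace the implementation beneath
 * this call; the application-level interface is unchanged. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int block = 8;  /* 8 ints = 32 bytes per destination */
    int *sendbuf = malloc(sizeof(int) * block * size);
    int *recvbuf = malloc(sizeof(int) * block * size);
    for (int i = 0; i < block * size; i++)
        sendbuf[i] = rank * 1000 + i;   /* elements j*block..j*block+7 go to rank j */

    /* Personalized exchange: block j of sendbuf goes to rank j,
     * block j of recvbuf arrives from rank j. */
    MPI_Alltoall(sendbuf, block, MPI_INT, recvbuf, block, MPI_INT, MPI_COMM_WORLD);

    printf("rank %d: first word received from rank %d is %d\n",
           rank, size - 1, recvbuf[(size - 1) * block]);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}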

High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters [chapter]

S. Sur, U. K. R. Bondhugula, A. Mamidala, H.-W. Jin, D. K. Panda
2005 Lecture Notes in Computer Science  
In this paper, we propose a design of All-to-All broadcast using the Remote Direct Memory Access (RDMA) feature offered by InfiniBand, an emerging high performance interconnect.  ...  Contemporary MPI software stacks implement this collective on top of MPI point-to-point calls, leading to several performance overheads.  ...  Our design reduces the software overhead, copy costs, and protocol handshakes required when this collective is implemented over MPI point-to-point.  ... 
doi:10.1007/11602569_19 fatcat:sstszogigffkhccdujssk2pqs4
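
"All-to-all broadcast" corresponds to the MPI_Allgather collective; the RDMA-based design proposed above replaces the point-to-point implementation beneath this interface. A minimal usage sketch (again a generic illustration, not the paper's design):

/* MPI_Allgather usage sketch (generic illustration): every rank contributes
 * one block and receives the blocks of all ranks, i.e. an all-to-all
 * broadcast. The RDMA-based design above replaces the point-to-point
 * implementation underneath this interface. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int contribution = rank * rank;              /* one value per rank */
    int *gathered = malloc(sizeof(int) * size);  /* holds every rank's value */

    /* Each rank's contribution is broadcast to all ranks. */
    MPI_Allgather(&contribution, 1, MPI_INT, gathered, 1, MPI_INT, MPI_COMM_WORLD);

    if (rank == 0)
        printf("rank 0 sees the last rank's value: %d\n", gathered[size - 1]);

    free(gathered);
    MPI_Finalize();
    return 0;
}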

High Performance Broadcast Support in LA-MPI over Quadrics

Weikuan Yu, Sayantan Sur, Dhabaleswar K. Panda, Rob T. Aulwes, Rich L. Graham
2005 The International Journal of High Performance Computing Applications  
Design and implementation of high-performance collectives using RDMA.  ...  Optimized shared-memory message passing over architectures such as shared-memory bus and NUMA.  ...  Open MPI: design and implementation of a point-to-point transport layer (PTL) over InfiniBand VAPI.  ... 
doi:10.1177/1094342005056145 fatcat:5aenumeoufatjcghx3vwplapca

Performance Evaluation of Soft RoCE over 1 Gigabit Ethernet

Gurkirat Kaur
2013 IOSR Journal of Computer Engineering  
InfiniBand is a well-known technology that provides high bandwidth and low latency and makes optimal use of built-in features such as RDMA.  ...  This paper presents a heterogeneous Linux cluster configuration and evaluates its performance using Intel's MPI Benchmarks.  ...  Interconnects and MPI implementations, InfiniBand and RoCE, the RDMA protocol.  ... 
doi:10.9790/0661-1548187 fatcat:pcdlvrptlnfabpwuo2ry6de2se

Zero-copy protocol for MPI using infiniband unreliable datagram

Matthew J. Koop, Sayantan Sur, Dhabaleswar K. Panda
2007 IEEE International Conference on Cluster Computing  
In order to eliminate message copies while transferring large messages, MPI libraries over InfiniBand employ "zero-copy" protocols which use Remote Direct Memory Access (RDMA).  ...  State-of-the-art implementations of message-passing libraries, such as MPI, utilize user-level networking protocols to reduce or eliminate memory copies.  ...  in a scalable, high-performance MPI design and implementation for InfiniBand.  ... 
doi:10.1109/clustr.2007.4629230 dblp:conf/cluster/KoopSP07 fatcat:ru5wk73tcncgdcsqgotw5f55ii

RDMA read based rendezvous protocol for MPI over InfiniBand

Sayantan Sur, Hyun-Wook Jin, Lei Chai, Dhabaleswar K. Panda
2006 Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '06  
Most high-performance MPI implementations use Rendezvous Protocol for efficient transfer of large messages. This protocol can be designed using either RDMA Write or RDMA Read.  ...  On the other hand, to achieve low latency, MPI implementations often provide a polling based progress engine.  ...  Conclusions and Future Work In this paper, we have presented new designs which exploit the RDMA Read and the capability of generating selective interrupts to implement a high-performance Rendezvous Protocol  ... 
doi:10.1145/1122971.1122978 dblp:conf/ppopp/SurJCP06 fatcat:mj56if6ozvei3fkaz54nzzwvta
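
The RDMA Read versus RDMA Write choice in the rendezvous protocol is internal to the MPI library and not visible to applications. The closest application-level analogue is MPI-2 one-sided communication, where MPI_Get pulls data from a remote window (read-style, as in the design above) and MPI_Put pushes it (write-style). A minimal sketch, offered only as an analogy and assuming at least two ranks:

/* MPI-2 one-sided sketch (analogy only): MPI_Get pulls data out of a remote
 * memory window much like an RDMA Read pulls the sender's buffer in the
 * rendezvous design above, while MPI_Put would correspond to an RDMA Write
 * push. Assumes at least two ranks; not the library-internal protocol code. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local[N], remote[N];
    for (int i = 0; i < N; i++)
        local[i] = rank + i * 0.001;

    /* Expose 'local' for one-sided access by other ranks. */
    MPI_Win win;
    MPI_Win_create(local, (MPI_Aint)(N * sizeof(double)), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 1)
        /* Pull rank 0's buffer directly; rank 0 posts no matching receive. */
        MPI_Get(remote, N, MPI_DOUBLE, 0, 0, N, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 pulled %f from rank 0\n", remote[0]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}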

MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?

Lei Chai, R. Noronha, D.K. Panda
2006 Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)  
It also delivers the same good performance as MPI implemented over the native APIs of the underlying interconnect.  ...  Experimental results on Solaris show that the multi-stream design can improve bandwidth over InfiniBand by 30% and improve application performance by up to 11%.  ...  MVAPICH2 [21] is an MPI-2 implementation over InfiniBand, which is also on top of VAPI. It was implemented based on the MPICH2 [8] RDMA channel.  ... 
doi:10.1109/ccgrid.2006.70 dblp:conf/ccgrid/ChaiNP06 fatcat:kkoovr2mzzgnlcmingcsycluhi

TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand

M.J. Koop, J.K. Sridhar, D.K. Panda
2009 IEEE International Symposium on Parallel &amp; Distributed Processing  
Unfortunately, even on offloaded hardware such as InfiniBand, performance is not improved since the underlying protocols within the MPI implementation require control messages that prevent overlap without  ...  We also show a 27% improvement for NAS SP using our design over the existing designs.  ...  Additionally, we do not use RDMA operations, which all other MPI libraries over InfiniBand use to implement large message transfer.  ... 
doi:10.1109/ipdps.2009.5161056 dblp:conf/ipps/KoopSP09 fatcat:o33lmxvkf5cgzej22muedkl5du

Process Arrival Pattern and Shared Memory Aware Alltoall on InfiniBand [chapter]

Ying Qian, Ahmad Afsahi
2009 Lecture Notes in Computer Science  
In this paper, we propose novel RDMA-based process arrival pattern aware MPI_Alltoall() algorithms over InfiniBand clusters.  ...  Its efficient implementation under different process arrival patterns is critical to the performance of applications that use it frequently.  ...  MPI implementations over RDMA-enabled networks such as InfiniBand are able to effectively bypass the operating system overhead and lower CPU utilization.  ... 
doi:10.1007/978-3-642-03770-2_31 fatcat:rswk2imddvgjlj2kuqvt3jtrui