494 Hits in 4.8 sec

Performance Expectations and Guidelines for MPI Derived Datatypes [chapter]

William Gropp, Torsten Hoefler, Rajeev Thakur, Jesper Larsson Träff
2011 Lecture Notes in Computer Science  
Such guidelines make performance expectations for derived datatypes explicit and suggest relevant optimizations to MPI implementers.  ...  MPI's derived datatypes provide a powerful mechanism for concisely describing arbitrary, noncontiguous layouts of user data for use in MPI communication.  ...  Department of Energy, under contract DE-AC02-06CH11357 and DE-FG02-08ER25835, and by the Blue Waters sustained-petascale  ... 
doi:10.1007/978-3-642-24449-0_18 fatcat:ei33led43fgsroc3xicsgbhbfi

MPI Derived Datatypes: Performance Expectations and Status Quo [article]

Alexandra Carpen-Amarie and Sascha Hunold and Jesper Larsson Träff
2016 arXiv   pre-print
We examine natural expectations on communication performance using MPI derived datatypes in comparison to the baseline, "raw" performance of communicating simple, non-contiguous data layouts.  ...  First, the performance of derived datatypes is sometimes worse than the semantically equivalent packing and unpacking using the corresponding MPI functionality.  ...  We also observe that the communication performance with (non-trivial) derived datatypes is quite different between the libraries.  ... 
arXiv:1607.00178v1 fatcat:zfgopgd5onhtjpc7kl32icjpai

Micro-applications for Communication Data Access Patterns and MPI Datatypes [chapter]

Timo Schneider, Robert Gerstenberger, Torsten Hoefler
2012 Lecture Notes in Computer Science  
Our micro-applications show that up to 90% of the total communication time can be spent on local serialization, and we found significant performance discrepancies between state-of-the-art MPI implementations  ...  MPI datatypes are a way to avoid such intermediate copies and optimize communications; however, it is often unclear which implementation and optimization choices are most useful in practice.  ...  Acknowledgments: This work was supported by the DOE Office of Science, Advanced Scientific Computing Research, under award number DE-FC02-10ER26011, program manager Sonia Sachs.  ...
doi:10.1007/978-3-642-33518-1_17 fatcat:g2wvcpv2wnc5xkj7fyb4xl4qmy

Performance of MPI sends of non-contiguous data [article]

Victor Eijkhout
2018 arXiv   pre-print
We present an experimental investigation of the performance of MPI derived datatypes.  ...  However, for large messages the internal buffering of MPI causes differences in efficiency. The optimal scheme is a combination of packing and derived types.  ...  The native Cray MPI also has similar performance, with the exception that one-sided performance for large sizes is on par with the derived types, unlike on Stampede2 where for all sizes it shows a relative  ... 
arXiv:1809.10778v1 fatcat:3omjlcqy65bd3fxey5qlevhaci

MPI datatype processing using runtime compilation

Timo Schneider, Fredrik Kjolstad, Torsten Hoefler
2013 Proceedings of the 20th European MPI Users' Group Meeting (EuroMPI '13)  
Data packing before and after communication can make up as much as 90% of the communication time on modern computers.  ...  We show several examples of how MPI datatype pack functions benefit from runtime compilation and analyze the performance of compiled pack functions for the data access patterns in many applications.  ...  We also observed that in many cases the manual pack loops are faster than using MPI derived datatypes.  ... 
doi:10.1145/2488551.2488552 dblp:conf/pvm/SchneiderKH13 fatcat:p6va3h367ncdnf24oustp3aw3u

Non-data-communication Overheads in MPI: Analysis on Blue Gene/P [chapter]

Pavan Balaji, Anthony Chan, William Gropp, Rajeev Thakur, Ewing Lusk
2008 Lecture Notes in Computer Science  
In this paper, we study different non-data-communication overheads within the MPI implementation on the IBM Blue Gene/P system.  ...  This means that the local pre- and post-communication processing required by the MPI stack might not be very fast, owing to the slow processing cores.  ...  Overheads in Derived Datatype Processing: MPI allows non-contiguous messages to be sent and received using derived datatypes to describe the message.  ...
doi:10.1007/978-3-540-87475-1_9 fatcat:x3qidsohcfd67alg76fou2r4ty

MT-MPI

Min Si, Antonio J. Peña, Pavan Balaji, Masamichi Takagi, Yutaka Ishikawa
2014 Proceedings of the 28th ACM International Conference on Supercomputing (ICS '14)  
We demonstrate the benefit of such internal parallelism for various aspects of MPI processing, including derived datatype communication, shared-memory communication, and network I/O operations.  ...  In this paper, we present MT-MPI, an internally multithreaded MPI implementation that transparently coordinates with the threading runtime system to share idle threads with the application.  ...  Acknowledgments This work was financially supported by (1) the CREST project of the Japan Science and Technology Agency (JST) and the National Project of MEXT called Feasibility Study on Advanced and Efficient  ... 
doi:10.1145/2597652.2597658 dblp:conf/ics/SiPBTI14 fatcat:z7wlisv4ybgytjil6irwc6hm6q

Application-oriented ping-pong benchmarking: how to assess the real communication overheads

Timo Schneider, Robert Gerstenberger, Torsten Hoefler
2013 Computing  
The message passing interface (MPI) standard defines derived datatypes to allow zero-copy formulations of non-contiguous data access patterns.  ...  Moving data between processes has often been discussed as one of the major bottlenecks in parallel computing; there is a large body of research striving to improve communication latency and bandwidth on  ...  However, there are also many cases where MPI DDTs perform worse than manual packing. (Fig. 2, an example use case for MPI derived datatypes: (a) manual packing with Fortran 90; (b) packing with MPI DDTs.)  ...
doi:10.1007/s00607-013-0330-4 fatcat:bcqelzmllbdwbn2fxywsnrppyi

Self-Consistent MPI Performance Guidelines

Jesper Larsson Träff, William D. Gropp, Rajeev Thakur
2010 IEEE Transactions on Parallel and Distributed Systems  
For performance portability reasons, users also naturally desire communication optimizations performed on one parallel platform with one MPI implementation to be preserved when switching to another MPI  ...  For reasons of (universal) implementability, the MPI standard does not state any specific performance guarantees, but users expect MPI implementations to deliver good and consistent performance in the  ...  At various stages the paper has benefitted significantly from the comments of anonymous reviewers, whose insightfulness, time and effort we also hereby gratefully acknowledge  ... 
doi:10.1109/tpds.2009.120 fatcat:iyjb4hf3jfh2tmtekc6uak3xii

The Importance of Non-Data-Communication Overheads in MPI

P. Balaji, A. Chan, W. Gropp, R. Thakur, E. Lusk
2010 The international journal of high performance computing applications  
With processor speeds no longer doubling every 18-24 months owing to the exponential increase in power consumption and heat dissipation, modern HEC systems tend to rely less on the performance of single  ...  A DMA engine on each compute node offloads most of the network packet injection and reception work, enabling better overlap of computation and communication.  ...  Overheads in Derived Datatype Processing: MPI allows non-contiguous messages to be sent and received using derived datatypes to describe the message.  ...
doi:10.1177/1094342009359258 fatcat:sqckcl5zarfybjtes2y65gsabe


Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data

John Jenkins, James Dinan, Pavan Balaji, Tom Peterka, Nagiza F. Samatova, Rajeev Thakur
2014 IEEE Transactions on Parallel and Distributed Systems  
methodology through the MPI datatypes specification.  ...  We also demonstrate the efficacy of kernel-based packing in various communication scenarios, showing multifold improvement in point-to-point communication and evaluating packing within the context of the  ...  ACKNOWLEDGMENTS This work was supported in part by the U.S. Department of Energy under contract DE-AC02-06CH11357, and additionally by the National Science Foundation under Grant No. 0958311.  ... 
doi:10.1109/tpds.2013.234 fatcat:siy4h37gmzahfnp6gdaeifzlpu

Communication-Aware Hardware-Assisted MPI Overlap Engine [chapter]

Mohammadreza Bayatpour, Jahanzeb Hashmi Maqbool, Sourav Chakraborty, Kaushik Kandadi Suresh, Seyedeh Mahdieh Ghazimirsaeed, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda
2020 Lecture Notes in Computer Science  
We evaluate the proposed designs against state-of-the-art MPI libraries and show up to 41% and 22% reduction in latency for collective operations and stencil-based application kernels on 1024 and 128 nodes  ...  The proposed design adapts to the application's communication requirements, including message size, datatype, and relative timing of processes, using heuristics and history-driven predictions.  ...  As mentioned before, the benchmark creates an MPI derived datatype and, during the window-size number of transfers, runs a few iterations with derived datatypes using the same tag as the other transfers  ...
doi:10.1007/978-3-030-50743-5_26 fatcat:tsudc4aw7jhvbd77tsh42zxu54

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

John Jenkins, James Dinan, Pavan Balaji, Nagiza F. Samatova, Rajeev Thakur
2012 2012 IEEE International Conference on Cluster Computing  
To address this gap, we present the design and implementation of an efficient MPI datatype-processing system, which is capable of efficiently processing arbitrary datatypes directly on the GPU.  ...  The lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific and engineering computations.  ...
doi:10.1109/cluster.2012.72 dblp:conf/cluster/JenkinsDBST12 fatcat:nao2aiqflfgzrp7byepxhnrwri

Combining I/O operations for multiple array variables in parallel netCDF

Kui Gao, Wei-keng Liao, Alok Choudhary, Robert Ross, Robert Latham
2009 2009 IEEE International Conference on Cluster Computing and Workshops  
Moreover, the record variables' data is stored interleaved by record, and the contiguity information is lost, so the existing MPI-IO collective I/O optimization cannot help.  ...  much higher performance with larger request sizes.  ...  MPI-IO inherits two important MPI features: MPI communicators, defining a set of processes for group operations, and MPI derived datatypes, describing complex memory layouts.  ...
doi:10.1109/clustr.2009.5289153 dblp:conf/cluster/GaoLCRL09 fatcat:nawlq6oyrbc7bh4d3nkydzyepq
Showing results 1 — 15 out of 494 results