1,720 Hits in 5.0 sec

MPI on Millions of Cores

Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Torsten Hoefler, Sameer Kumar, Ewing Lusk, Rajeev Thakur, Jesper Larsson Träff
2011 Parallel Processing Letters  
In this paper, we examine the issue of scalability of MPI to very large systems.  ...  We also briefly discuss issues in application scalability to large process counts and features of MPI that enable the use of other techniques to alleviate scalability limitations in applications.  ...  Acknowledgments We thank the members of the MPI Forum who participated in helpful discussions of the presented topics. We also thank the anonymous reviewers for comments that improved the manuscript.  ... 
doi:10.1142/s0129626411000060 fatcat:pbevuvwmvba6bce3ppogkixbhq

Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions [chapter]

Torsten Hoefler, Marc Snir
2011 Lecture Notes in Computer Science  
We derive common requirements that parallel libraries pose on the programming framework. We then show how those requirements are supported in the Message Passing Interface (MPI) standard.  ...  Finally, we conclude with a discussion of the state of the art of parallel library programming and we provide some guidelines for library designers.  ...  Acknowledgments This work was supported by the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (award number OCI 07-25070) and the state of Illinois  ... 
doi:10.1007/978-3-642-24449-0_45 fatcat:upeovg4r2vgkxhtdnbrrkwc6o4

Multi-core and Network Aware MPI Topology Functions [chapter]

Mohammad Javad Rashti, Jonathan Green, Pavan Balaji, Ahmad Afsahi, William Gropp
2011 Lecture Notes in Computer Science  
The MPI standard offers a set of topology-aware interfaces that can be used to construct graph and Cartesian topologies for MPI applications.  ...  To optimize performance, in this paper we use graph embedding and node/network architecture discovery modules to match the communication topology of the applications to the physical topology of multi-core  ...  Department of Energy and National Science Foundation. We thank Mellanox Technologies and the HPC Advisory Council for the resources.  ... 
doi:10.1007/978-3-642-24449-0_8 fatcat:tr7ebshrbfd2rjv5sfyqwbhlc4

Enabling highly-scalable remote memory access programming with MPI-3 one sided

Robert Gerstenberger, Maciej Besta, Torsten Hoefler
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly; however, its scalability and practicability have to be demonstrated in practice.  ...  In this work, we develop scalable bufferless protocols that implement the MPI-3.0 specification.  ...  , Nick Wright for the UPC version of MILC, and Paul Hargrove for the UPC version of NAS-FT.  ... 
doi:10.1145/2503210.2503286 dblp:conf/sc/GerstenbergerBH13 fatcat:xgxtd45spfdt3ogfwgoe6wo4u4

Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided

Robert Gerstenberger, Maciej Besta, Torsten Hoefler
2014 Scientific Programming  
The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly; however, its scalability and practicability have to be demonstrated in practice.  ...  In this work, we develop scalable bufferless protocols that implement the MPI-3.0 specification.  ...  , Nick Wright for the UPC version of MILC, and Paul Hargrove for the UPC version of NAS-FT.  ... 
doi:10.1155/2014/571902 fatcat:tokihkphivel7m4rheookvir4i

A Scalable, Linear-Time Dynamic Cutoff Algorithm for Molecular Dynamics [chapter]

Paul Springer, Ahmed E. Ismail, Paolo Bientinesi
2015 Lecture Notes in Computer Science  
Computationally, the challenge is shifted from the long-range solvers to the detection of the interfaces and to the computation of the particle-interface distances.  ...  The idea consists in adopting a cutoff-based method in which the cutoff is chosen on a particle-by-particle basis, according to the distance from the interface.  ...  The authors gratefully acknowledge financial support from the Deutsche Forschungsgemeinschaft (German Research Association) through grant GSC 111, computing resources on the supercomputer JUQUEEN at Jülich  ... 
doi:10.1007/978-3-319-20119-1_12 fatcat:7szmqm52ijah7fdwcmv6brgijy

Simulation-Based Performance Prediction of HPC Applications: A Case Study of HPL [article]

Gen Xu, Huda Ibeid, Xin Jiang, Vjekoslav Svilan, Zhaojuan Bian
2020 arXiv   pre-print
a functional level, where the simulator allows the use of the components' native interfaces; this results in a (3) fast and accurate simulation of full HPC applications with minimal modifications to the  ...  We demonstrate the capability and scalability of our approach with High Performance LINPACK (HPL), the benchmark used to rank supercomputers in the TOP500 list.  ...  We are working on automating this process in CoFluent Virtual Thread by enabling the simulation of Linux Pthreads and C++ threads.  ... 
arXiv:2011.02617v1 fatcat:ncrmhv4aqjcuvmnawxqaj3da7i

Parallel Sorting with Minimal Data [chapter]

Christian Siebert, Felix Wolf
2011 Lecture Notes in Computer Science  
Scalable solutions for this case are needed for the communicator constructor MPI_Comm_split.  ...  This paper presents three parallel sorting algorithms suited for the extreme case where every process contributes only a single element.  ...  We created a virtual ring of processes by using MPI_Cart_create to embed a one-dimensional and periodic Cartesian topology into the underlying network topology.  ... 
doi:10.1007/978-3-642-24449-0_20 fatcat:h4xa4iiqdbcitntx43duycqrjm

Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming [chapter]

Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian W. Barrett, Ron Brightwell, William Gropp, Vivek Kale, Rajeev Thakur
2012 Lecture Notes in Computer Science  
We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40% to the communication component of a five-point stencil  ...  We discuss the integration of this interface with the upcoming MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency  ...  We ran the benchmark on a six-core 2.2 GHz AMD Opteron CPU with two MPI processes and recorded communication and computation times separately.  ... 
doi:10.1007/978-3-642-33518-1_18 fatcat:dxx62c37pzejvmv636ofjaq5fa

Fast and Scalable Startup of MPI Programs in InfiniBand Clusters [chapter]

Weikuan Yu, Jiesheng Wu, Dhabaleswar K. Panda
2004 Lecture Notes in Computer Science  
In this paper, we characterize the startup of MPI programs in InfiniBand clusters and identify two startup scalability issues: serialized process initiation in the initiation phase and high communication  ...  This section provides an overview of MPI program startup in InfiniBand clusters and motivates the study of a scalable startup scheme.  ...  We have also developed an analytical model to project the scalability of the startup schemes.  ... 
doi:10.1007/978-3-540-30474-6_47 fatcat:lo3gch2tbbeihmh2ami2bywj6i

On the Overhead of Topology Discovery for Locality-Aware Scheduling in HPC

Brice Goglin
2017 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)  
This overhead also increases more than linearly with the number of processes that perform it simultaneously.  ...  We then study the actual needs of the HPC software ecosystem in terms of topology information.  ...  ACKNOWLEDGMENTS Some experiments presented in this paper were carried out using the PLAFRIM experimental testbed, being developed under the Inria PlaFRIM development action with support from Bordeaux INP  ... 
doi:10.1109/pdp.2017.35 dblp:conf/pdp/Goglin17 fatcat:qfexpp2g7vcjfp2qumoutl7uaq

An Overview of Topology Mapping Algorithms and Techniques in High-Performance Computing [chapter]

Torsten Hoefler, Emmanuel Jeannot, Guillaume Mercier
2014 High-Performance Computing on Complex Environments  
This chapter surveys several techniques and algorithms that efficiently address this issue: the mapping of the application's virtual topology (for instance, its communication pattern) onto the physical topology.  ...  Acknowledgments This work is supported by the COST Action IC0805 "Open European Network for High Performance Computing on Complex Environments". Bibliography  ... 
doi:10.1002/9781118711897.ch5 fatcat:7ok2xcxah5ctvjkuwvmaerbwgu

Improving MPI Applications Performance on Multicore Clusters with Rank Reordering [chapter]

Guillaume Mercier, Emmanuel Jeannot
2011 Lecture Notes in Computer Science  
and the hardware topology.  ...  The MPI standard features several functions that allow the ranks of MPI processes to be reordered according to a graph attached to a newly created communicator.  ...  Several MPI functions can reorder process ranks. This is the case for MPI_Dist_graph_create, part of the standard since MPI 2.2 [6].  ... 
doi:10.1007/978-3-642-24449-0_7 fatcat:sfbtayyzfbbkho563zqn3a3kfy

Generic topology mapping strategies for large-scale parallel architectures

Torsten Hoefler, Marc Snir
2011 Proceedings of the international conference on Supercomputing - ICS '11  
MPI-2.2 defines an interface for re-mapping: a scalable process topology graph that permutes the ranks in a communicator and returns a "better" permutation π to the user, who can then re-distribute data and use π  ...  consider heterogeneous networks [PERCS'10]. Terms and conventions: the application communication pattern is modeled as a weighted graph, whose vertex set is the set of processes and whose edge weights represent the communication volume  ... 
doi:10.1145/1995896.1995909 dblp:conf/ics/HoeflerS11 fatcat:4qr2yep6l5abbc25i73j44yvrq

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory

Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji, Brian Barrett, Ron Brightwell, William Gropp, Vivek Kale, Rajeev Thakur
2013 Computing  
We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40% to the communication component of a five-point stencil  ...  We discuss the integration of this interface with the MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency.  ...  Acknowledgments We thank the members of the MPI Forum and the MPI community for their efforts in creating the MPI 3.0 specification.  ... 
doi:10.1007/s00607-013-0324-2 fatcat:3anov6fhszavhczgohts33rabq
Showing results 1 — 15 out of 1,720 results