
Non-data-communication Overheads in MPI: Analysis on Blue Gene/P [chapter]

Pavan Balaji, Anthony Chan, William Gropp, Rajeev Thakur, Ewing Lusk
2008 Lecture Notes in Computer Science  
In this paper, we study different non-data-communication overheads within the MPI implementation on the IBM Blue Gene/P system.  ...  Modern HEC systems, such as Blue Gene/P, achieve high performance through the parallelism of a massive number of low-frequency/low-power processing cores.  ...  Experiments and Analysis: Here, we study the non-data-communication overheads in MPI on BG/P.  ... 
doi:10.1007/978-3-540-87475-1_9 fatcat:x3qidsohcfd67alg76fou2r4ty
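
A zero-byte ping-pong is the standard way to expose the kind of non-data-communication cost this chapter measures: with no payload, the round-trip time is almost entirely MPI stack overhead rather than data movement. A minimal sketch, not the paper's benchmark suite (run with two ranks; the iteration count is arbitrary):

```c
/* Hypothetical microbenchmark: zero-byte ping-pong between ranks 0 and 1.
 * With no payload, the measured latency is dominated by MPI stack
 * overhead (matching, request handling) rather than data transfer. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 100000;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("zero-byte round trip: %.3f us\n", 1e6 * (t1 - t0) / iters);
    MPI_Finalize();
    return 0;
}
```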

The Importance of Non-Data-Communication Overheads in MPI

P. Balaji, A. Chan, W. Gropp, R. Thakur, E. Lusk
2010 The International Journal of High Performance Computing Applications  
Thus, in this paper, we study the different non-data-communication overheads within the MPI implementation on the IBM Blue Gene/P system.  ...  Our experiments, which scale up to 131,072 cores of the largest Blue Gene/P system in the world (80% of the total system size), reveal several insights into overheads in the MPI stack, which were previously  ...  Concluding Remarks: In this paper, we studied the non-data-communication overheads within MPI implementations and demonstrated their impact on the IBM Blue Gene/P system.  ... 
doi:10.1177/1094342009359258 fatcat:sqckcl5zarfybjtes2y65gsabe

Massively parallel genomic sequence search on the Blue Gene/P architecture

Heshan Lin, Pavan Balaji, Ruth Poole, Carlos Sosa, Xiaosong Ma, Wu-chun Feng
2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis  
This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform.  ...  genes in genomes, in only a few hours on BG/P.  ...  We thank IBM Rochester for providing access to Blue Gene/P development systems and acknowledge the support of the Biomedical Informatics and Computational Biology (BICB) program at University of Minnesota  ... 
doi:10.1109/sc.2008.5222005 dblp:conf/sc/LinBPSMF08 fatcat:pwjowi5w6ba5vnnbupejeejm4a

Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems

Pavan Balaji, Harish Naik, Narayan Desai
2009 15th International Conference on Parallel and Distributed Systems  
Thus, in this paper, we study the network behavior of the IBM BG/P using several application communication kernels and monitor network congestion behavior based on detailed hardware counters.  ...  Systems such as the IBM Blue Gene (BG) and Cray XT have started utilizing flat networks (a.k.a. scalable networks), which differ from switched fabrics in that they use a 3D torus or similar topology.  ...  Department of Energy under contract DE-AC02-06CH11357 and in part by the Department of Energy award DE-FG02-08ER25835.  ... 
doi:10.1109/icpads.2009.117 dblp:conf/icpads/BalajiND09 fatcat:fwwndoxytzha7e5xambo4dsypq
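
Saturation studies of this kind are typically driven by simple communication kernels. A hedged sketch of one such kernel, not the paper's code: every rank sends to the rank a fixed distance away, and on a torus, larger distances cross more links and expose congestion (message size and the power-of-two distance sweep are arbitrary choices):

```c
/* Hypothetical congestion kernel: a simultaneous shifted exchange at
 * increasing rank distance. On a torus, larger distances traverse more
 * links, so measured bandwidth drops as links saturate. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int msg = 1 << 20;                        /* 1 MiB payload */
    char *sbuf = malloc(msg), *rbuf = malloc(msg);
    memset(sbuf, 0, msg);

    for (int dist = 1; dist < size; dist *= 2) {
        int dst = (rank + dist) % size;
        int src = (rank - dist + size) % size;
        MPI_Request req[2];
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        MPI_Irecv(rbuf, msg, MPI_CHAR, src, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sbuf, msg, MPI_CHAR, dst, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("dist %6d: %.2f MB/s per rank\n", dist, msg / 1e6 / (t1 - t0));
    }
    free(sbuf); free(rbuf);
    MPI_Finalize();
    return 0;
}
```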

Overview of the IBM Blue Gene/P project

2008 IBM Journal of Research and Development  
The Blue Gene/P project has been supported and partially funded by Argonne National Laboratory and the Lawrence Livermore National Laboratory on behalf of the U.S.  ...  B554331.  ...  Blue Gene/P and Linux: The CNK described in previous sections is carefully designed for minimal overhead; thus, performance on the BG/P system is very close to the limits imposed by the hardware.  ... 
doi:10.1147/rd.521.0199 fatcat:rmorpcbwrzbwbcwwifcfmmp3wm

Optimization of applications with non-blocking neighborhood collectives via multisends on the Blue Gene/P supercomputer

Sameer Kumar, Philip Heidelberger, Dong Chen, Michael Hines
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)  
We present four different case studies with the multisend API on Blue Gene/P: (i) 3D-FFT, (ii) 4D nearest-neighbor exchange as used in Quantum Chromodynamics, (iii) NAMD, and (iv) a neural network simulator  ...  One of the limitations of the current MPI 2.1 standard is that the vector collective calls require counts and displacements (zero and nonzero bytes) to be specified for all the processors in the communicator  ...  the design and development of DCMF on the Blue Gene/P machine.  ... 
doi:10.1109/ipdps.2010.5470407 pmid:21666880 pmcid:PMC3111918 dblp:conf/ipps/KumarHCH10 fatcat:ud3bdkbo6zf4pix4cegwnx6x3m
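
The MPI 2.1 limitation named in the excerpt is visible in MPI_Alltoallv, whose count and displacement vectors must cover every rank even when only two neighbors exchange data; MPI 3.0 later standardized neighborhood collectives that avoid this. A hypothetical 1-D periodic exchange contrasting the two (assumes more than two ranks; not the paper's multisend API):

```c
/* Sketch of the vector-collective limitation and the MPI 3.0 remedy:
 * a 1-D periodic ring where each rank exchanges one int with its two
 * neighbors. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left = (rank - 1 + size) % size, right = (rank + 1) % size;
    int sbuf[2] = { rank, rank }, rbuf[2];

    /* MPI 2.1 style: MPI_Alltoallv needs O(size) counts/displacements
     * per rank even though all but two counts are zero. */
    int *scnt = calloc(size, sizeof(int)), *sdsp = calloc(size, sizeof(int));
    int *rcnt = calloc(size, sizeof(int)), *rdsp = calloc(size, sizeof(int));
    scnt[left] = scnt[right] = rcnt[left] = rcnt[right] = 1;
    sdsp[right] = rdsp[right] = 1;
    MPI_Alltoallv(sbuf, scnt, sdsp, MPI_INT,
                  rbuf, rcnt, rdsp, MPI_INT, MPI_COMM_WORLD);

    /* MPI 3.0 neighborhood collective: metadata is O(degree), not
     * O(size), because only Cartesian neighbors participate. */
    MPI_Comm cart;
    int dims = size, periods = 1;
    MPI_Cart_create(MPI_COMM_WORLD, 1, &dims, &periods, 0, &cart);
    MPI_Neighbor_alltoall(sbuf, 1, MPI_INT, rbuf, 1, MPI_INT, cart);

    MPI_Comm_free(&cart);
    free(scnt); free(sdsp); free(rcnt); free(rdsp);
    MPI_Finalize();
    return 0;
}
```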

Parallel I/O Performance for Application-Level Checkpointing on the Blue Gene/P System

Jing Fu, Misun Min, Robert Latham, Christopher D. Carothers
2011 IEEE International Conference on Cluster Computing  
Our study shows that rbIO and coIO result in a 100× improvement over previous checkpointing approaches on up to 65,536 processors of the Blue Gene/P using GPFS.  ...  In this paper, we examine application-level checkpointing for a massively parallel electromagnetic solver system called NekCEM on the IBM Blue Gene/P at Argonne National Laboratory.  ...  leadership-class machines such as the IBM Blue Gene/P.  ... 
doi:10.1109/cluster.2011.81 dblp:conf/cluster/FuMLC11 fatcat:bqzs6wovobgejeo7e5ybtpm2qq
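
This is not the paper's rbIO or coIO implementation, but a minimal illustration of the collective-I/O idea such checkpointing schemes build on: each rank writes its slice of the solver state with a single MPI-IO collective call, letting the MPI library aggregate requests before they reach GPFS. Buffer size and file name are placeholders:

```c
/* Minimal collective checkpoint sketch: one collective write per rank,
 * with each rank's slice placed at a rank-dependent file offset. */
#include <mpi.h>

#define N_LOCAL 4096

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double field[N_LOCAL] = { 0 };                 /* local solver state */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset off = (MPI_Offset)rank * N_LOCAL * sizeof(double);
    MPI_File_write_at_all(fh, off, field, N_LOCAL, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```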

Tracing Data Movements within MPI Collectives

Kevin A. Brown, Jens Domke, Satoshi Matsuoka
2014 Proceedings of the 21st European MPI Users' Group Meeting (EuroMPI/ASIA '14)  
By creating additional trace points in the Peruse utility of Open MPI, we track low-level InfiniBand communication events and then visualize the communication profile in Boxfish for a more comprehensive  ...  The proposed tool-chain is non-intrusive and incurs less than 0.1% runtime overhead with the NPB FT benchmark.  ...  Related Work: Landge et al. [6] described how to capture the flow of network packets on IBM Blue Gene/P for tracing application communication over network links.  ... 
doi:10.1145/2642769.2642789 dblp:conf/pvm/BrownDM14 fatcat:7jni3k5ydva7pnapfp74f36yyi
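
The tool-chain hooks Open MPI's Peruse trace points, which are internal to that implementation. As a generic illustration of the same non-intrusive idea, the standard PMPI profiling interface lets a tracer wrap a collective without modifying the application; the logged fields below are arbitrary:

```c
/* PMPI interposition sketch: a wrapper for MPI_Bcast that logs message
 * size and duration, then forwards to the real implementation. Link
 * this object ahead of the application to activate it. */
#include <mpi.h>
#include <stdio.h>

int MPI_Bcast(void *buf, int count, MPI_Datatype type, int root,
              MPI_Comm comm) {
    double t0 = MPI_Wtime();
    int ret = PMPI_Bcast(buf, count, type, root, comm);  /* real call */
    double t1 = MPI_Wtime();
    int rank, tsize;
    PMPI_Comm_rank(comm, &rank);
    PMPI_Type_size(type, &tsize);
    fprintf(stderr, "[rank %d] MPI_Bcast %d bytes, %.1f us\n",
            rank, count * tsize, 1e6 * (t1 - t0));
    return ret;
}
```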

Application performance characterization and analysis on Blue Gene/Q

Bob Walkup
2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
Optionally save all data in one file. In most cases, the rank that spent the least time in MPI did the most work. Can use the same strategy based on hardware-counter data.  ... 

System       Date   GHz   Cores/rack  Largest system  Peak PFlops
Blue Gene/L  ~2004  0.70  2K          104 racks       ~0.6
Blue Gene/P  ~2008  0.85  4K          72 racks        ~1.0
Blue Gene/Q  ~2012  1.60  16K         96 racks        ~20.1
 ... 
doi:10.1109/sc.companion.2012.358 dblp:conf/sc/Walkup12 fatcat:kpvplgffsvdxvap447nqfpubha
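
The heuristic in the excerpt, that the rank with the least MPI time usually did the most work, is cheap to evaluate at scale with MPI_MINLOC/MPI_MAXLOC reductions. A sketch assuming a per-rank accumulator mpi_seconds maintained elsewhere by the profiler (the function name is hypothetical):

```c
/* Load-imbalance summary sketch: locate the ranks with the least and
 * most accumulated MPI time using MINLOC/MAXLOC reductions. */
#include <mpi.h>
#include <stdio.h>

void report_imbalance(double mpi_seconds, MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);
    struct { double t; int r; } in = { mpi_seconds, rank }, lo, hi;
    MPI_Reduce(&in, &lo, 1, MPI_DOUBLE_INT, MPI_MINLOC, 0, comm);
    MPI_Reduce(&in, &hi, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, comm);
    if (rank == 0)
        printf("least MPI time: rank %d (%.2f s, likely most work); "
               "most MPI time: rank %d (%.2f s)\n", lo.r, lo.t, hi.r, hi.t);
}
```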

Performance Modeling of Algebraic Multigrid on Blue Gene/Q: Lessons Learned

Hormozd Gahvari, William Gropp, Kirk E. Jordan, Martin Schulz, Ulrike Meier Yang
2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
The IBM Blue Gene/Q represents a large step in the evolution of massively parallel machines.  ...  In this paper, we develop a performance model for the solve cycle of algebraic multigrid on Blue Gene/Q to help us understand the issues this popular linear solver for large, sparse linear systems faces  ...  Blue Gene/L and Blue Gene/P featured three-dimensional torus interconnects. This has been changed to a five-dimensional torus in Blue Gene/Q.  ... 
doi:10.1109/sc.companion.2012.57 dblp:conf/sc/GahvariGJSY12 fatcat:k3ir5ijsprdtlhe2odronrkruy
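
Solve-cycle models of this kind typically start from the classic latency-bandwidth (alpha-beta) cost model and refine it with machine-specific terms. The sketch below is that generic starting point with assumed notation, not the paper's calibrated Blue Gene/Q model:

```latex
% Generic alpha-beta starting point for an AMG solve-cycle model:
% alpha = per-message latency, beta = per-byte transfer cost,
% gamma = per-flop compute cost.
T_{\text{msg}}(n) = \alpha + n\,\beta
\qquad
T_{\text{cycle}} \approx \sum_{\ell=0}^{L}
    \left( \gamma\, f_\ell + \alpha\, m_\ell + \beta\, b_\ell \right)
% where level ell of the multigrid hierarchy performs f_ell flops and
% sends m_ell messages totaling b_ell bytes.
```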

Self-Consistent MPI Performance Guidelines

Jesper Larsson Träff, William D. Gropp, Rajeev Thakur
2010 IEEE Transactions on Parallel and Distributed Systems  
For performance portability reasons, users also naturally desire communication optimizations performed on one parallel platform with one MPI implementation to be preserved when switching to another MPI  ...  We introduce and semi-formalize the concept of self-consistent performance guidelines for MPI, and provide a (non-exhaustive) set of such guidelines in a form that could be automatically verified by benchmarks  ...  Acknowledgments: The ideas expounded in this paper were first introduced in [35], although in a less finished form. We thank a number of colleagues for sometimes intense discussions on these.  ... 
doi:10.1109/tpds.2009.120 fatcat:iyjb4hf3jfh2tmtekc6uak3xii
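
One guideline in this spirit: a specialized collective should not be slower than its emulation by more general ones, for example MPI_Allreduce(n) should not take longer than MPI_Reduce(n) followed by MPI_Bcast(n). A benchmark sketch of the kind of automatic check the excerpt envisions (vector length and iteration count are arbitrary):

```c
/* Guideline-check sketch: compare MPI_Allreduce against its
 * Reduce-then-Bcast emulation and flag a violation. */
#include <mpi.h>
#include <stdio.h>

#define N 65536
#define ITERS 100

static double in[N], out[N];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++)
        MPI_Allreduce(in, out, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t_all = MPI_Wtime() - t0;

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {
        MPI_Reduce(in, out, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Bcast(out, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }
    double t_emul = MPI_Wtime() - t0;

    if (rank == 0 && t_all > t_emul)
        printf("guideline violated: Allreduce %.3f s > Reduce+Bcast %.3f s\n",
               t_all, t_emul);
    MPI_Finalize();
    return 0;
}
```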

Unifying UPC and MPI runtimes

Jithin Jose, Miao Luo, Sayantan Sur, Dhabaleswar K. Panda
2010 Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model - PGAS '10  
In this paper, we propose "Integrated Native Communication Runtime" (INCR) for MPI and UPC communications on InfiniBand clusters.  ...  Our library is capable of supporting both UPC and MPI communications simultaneously.  ...  Mark Arnold of The Ohio State University for helping set up the experimental systems on short notice.  ... 
doi:10.1145/2020373.2020378 dblp:conf/pgas/JoseLSP10 fatcat:dqwmduwpgjhlti7dse5mo6dg2u

Formal analysis of MPI-based parallel programs

Ganesh Gopalakrishnan, Robert M. Kirby, Stephen Siegel, Rajeev Thakur, William Gropp, Ewing Lusk, Bronis R. De Supinski, Martin Schulz, Greg Bronevetsky
2011 Communications of the ACM  
is going mainstream in the commodity world, two communities that must look to learn and benefit from one another.  ...  Key insights: Addressing the challenges of distributed-systems debugging necessitates collaboration between HPC and formal verification; along with HPC, distributed computing based on communication libraries  ...  MPI is designed to support highly scalable computing applications using more than 100,000 cores on, say, the IBM Blue Gene/P (see Figure 1) and Cray XT5.  ... 
doi:10.1145/2043174.2043194 fatcat:ouhzjx5zsrcvzcmk7kqhaomujy
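
A canonical bug class these verification tools target is the head-to-head blocking send: it passes small-message tests, where the eager protocol buffers the data, but deadlocks once the library switches to rendezvous for large messages. A minimal two-rank illustration (the safe fix is MPI_Sendrecv or nonblocking calls); the buffer size is an assumption about the eager/rendezvous threshold:

```c
/* Deadlock-prone pattern: both ranks send before receiving. If both
 * MPI_Send calls block (rendezvous protocol for large messages), the
 * matching receives are never posted and the program hangs. */
#include <mpi.h>

#define N (1 << 22)   /* large enough to force rendezvous on many MPIs */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    static double sbuf[N], rbuf[N];
    int peer = rank ^ 1;                         /* pair ranks 0<->1 */

    MPI_Send(sbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD); /* may block */
    MPI_Recv(rbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);                 /* unreached if both block */
    MPI_Finalize();
    return 0;
}
```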

Peta-scale Lattice Quantum Chromodynamics on a Blue Gene/Q supercomputer

Jun Doi
2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
We have optimized lattice QCD programs on Blue Gene family supercomputers and shown their strength in lattice QCD simulation.  ...  Here we optimize for the third-generation Blue Gene/Q supercomputer: (i) by changing the data layout, (ii) by exploiting new SIMD instruction sets, and (iii) by pipelining boundary data exchange to overlap  ...  Acknowledgments: We developed and tested our code on Blue Gene/Q systems at IBM Rochester and the IBM Research - T. J. Watson Research Center.  ... 
doi:10.1109/sc.2012.96 dblp:conf/sc/Doi12 fatcat:oa5nqisl45hyxeofzvbikutcka
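
Technique (iii), overlapping boundary exchange with computation, follows a standard pattern: post nonblocking halo sends and receives, update the interior while the messages are in flight, then finish the boundary. A hedged 1-D toy sketch; the actual code pipelines 4-D QCD boundaries, not this scalar stencil:

```c
/* Communication/computation overlap sketch on a periodic 1-D domain:
 * exchange one-element halos while updating the interior. */
#include <mpi.h>

#define NLOC 4096

static double u[NLOC], unew[NLOC];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left = (rank - 1 + size) % size, right = (rank + 1) % size;
    double lhalo = 0.0, rhalo = 0.0;
    MPI_Request req[4];

    /* Post the boundary exchange first... */
    MPI_Irecv(&lhalo, 1, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&rhalo, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&u[0], 1, MPI_DOUBLE, left, 1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&u[NLOC - 1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    /* ...update the interior while the messages are in flight... */
    for (int i = 1; i < NLOC - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);   /* toy stencil */

    /* ...then complete the boundary once the halos have arrived. */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    unew[0] = 0.5 * (lhalo + u[1]);
    unew[NLOC - 1] = 0.5 * (u[NLOC - 2] + rhalo);

    MPI_Finalize();
    return 0;
}
```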
Showing results 1 to 15 of 319