
MPI+Threads: runtime contention and remedies

Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka
2015 ACM SIGPLAN Notices
Hybrid MPI+Threads programming has emerged as an alternative model to the "MPI everywhere" model to better handle the increasing core density in cluster nodes.  ...  We first analyze the MPI runtime when multithreaded concurrent communication takes place on hierarchical memory systems.  ...  For small messages, thread synchronization and runtime contention hide this benefit.  ... 
doi:10.1145/2858788.2688522 fatcat:q2gw434oi5bv3i4e3lwowittsu

MPI+Threads: runtime contention and remedies

Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka
2015 Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP 2015  
Hybrid MPI+Threads programming has emerged as an alternative model to the "MPI everywhere" model to better handle the increasing core density in cluster nodes.  ...  We first analyze the MPI runtime when multithreaded concurrent communication takes place on hierarchical memory systems.  ...  For small messages, thread synchronization and runtime contention hide this benefit.  ... 
doi:10.1145/2688500.2688522 dblp:conf/ppopp/AmerLWBM15 fatcat:rnvdxwcx5vawdg7wdzrhv7zo4m

PPL

Alex Brooks, Hoang-Vu Dang, Nikoli Dryden, Marc Snir
2015 Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware - ESPM '15  
We propose a new runtime system design, PPL, which abstracts important high-level concepts of a typical parallel system for distributed-memory machines.  ...  However, they do not scale well as per-node thread counts rise and there is limited interoperability between threading and communication, leading to unnecessary software overheads and an increased amount  ...  Acknowledgments The research presented in this paper was funded through the NSF CISE CCF grant 1337217 and the Laboratory Directed Research and Development (LDRD) program at Sandia National Laboratories  ... 
doi:10.1145/2832241.2832246 dblp:conf/sc/BrooksDDS15 fatcat:7pxuse7tgzgylll4vpfryxyl5m

Towards millions of communicating threads

Hoang-Vu Dang, Marc Snir, William Gropp
2016 Proceedings of the 23rd European MPI Users' Group Meeting on - EuroMPI 2016  
to higher contention or more complex protocols.  ...  The main ingredients of our efficient message-passing runtime are • A lightweight thread scheduler using a bit-vector that requires a single write for marking a thread as runnable.  ...  A more in-depth analysis of locking contention in MPI+Thread can be found in [4].  ...
doi:10.1145/2966884.2966914 dblp:conf/pvm/DangSG16 fatcat:dygqi4rssnfw5bv4sdtwgtggxu
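
The bit-vector scheduler mentioned in this snippet can be illustrated with a small sequential model. This is a sketch under our own assumptions, not the authors' runtime: the class name `RunnableBitVector`, the lowest-index-first pick policy, and the use of a Python integer as the bit vector are all hypothetical. In a real multithreaded runtime the marking write would be an atomic fetch-or on a machine word.

```python
from typing import Optional


class RunnableBitVector:
    """Sequential model of a bit-vector run queue: bit i set means thread i is runnable."""

    def __init__(self, nthreads: int) -> None:
        self.nthreads = nthreads
        self.bits = 0  # every thread starts out blocked (no bits set)

    def mark_runnable(self, tid: int) -> None:
        # Marking is a single OR into the vector; setting an already-set bit is harmless.
        # Assumption: a real runtime would use an atomic fetch-or on a machine word here.
        if not 0 <= tid < self.nthreads:
            raise ValueError(f"thread id {tid} out of range")
        self.bits |= 1 << tid

    def pick_next(self) -> Optional[int]:
        # Scheduler side: take the lowest-numbered runnable thread and clear its bit.
        if self.bits == 0:
            return None
        lowest = self.bits & -self.bits  # isolates the lowest set bit
        self.bits ^= lowest
        return lowest.bit_length() - 1
```

Because marking is idempotent, a thread that is woken twice before the scheduler runs is still scheduled only once.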

Benefits of SMT and of Parallel Transpose Algorithm for the Large-Scale GYSELA Application

Guillaume Latu, Julien Bigot, Nicolas Bouzat, Judit Gimenez, Virginie Grandgirard
2016 Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC '16  
This article describes how we manage to increase performance and to extend features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel  ...  Adapting the code for load balancing when using both SMT and a good deployment strategy led to a significant reduction in execution time, of up to 38%.  ...  Let us mention that the thread level for MPI is set to MPI_THREAD_FUNNELED in GYSELA.  ...
doi:10.1145/2929908.2929912 fatcat:4ykib3ax6zb2jkzdog476p3itq

The International Exascale Software Project roadmap

Jack Dongarra, Pete Beckman, Terry Moore, Patrick Aerts, Giovanni Aloisio, Jean-Claude Andre, David Barkai, Jean-Yves Berthou, Taisuke Boku, Bertrand Braunschweig, Franck Cappello, Barbara Chapman (+53 others)
2011 The International Journal of High Performance Computing Applications
and on-line.  ...  Encourage and facilitate collaboration in education and training: The magnitude of the changes in programming models and software infrastructure and tools brought about by the transition to peta/exascale  ...  Likewise, the areas that have less software maturity (e.g., health and energy) have more Xs in the programming, languages, and debugging columns. Science and  ... 
doi:10.1177/1094342010391989 fatcat:twdszcjfxraijpsdcdacvpp6vm

SWIFT: Maintaining weak-scalability with a dynamic range of 10^4 in time-step size to harness extreme adaptivity [article]

Josh Borrow, Richard G. Bower, Peter W. Draper, Pedro Gonnet, Matthieu Schaller
2018 arXiv pre-print
Second, task-based parallelism is used to ensure efficient load-balancing within a single node, using pthreads and SIMD vectorisation.  ...  The cosmological simulation code SWIFT: SWIFT [17] is a hybrid MPI & threads C99 code that implements several SPH and particle-based hydrodynamical schemes, a Fast-Multipole-Method (FMM) N-body gravity  ...  However, given the depth of the time-step hierarchy shown here, this would increase the runtime of a given problem by many orders of magnitude.  ...
arXiv:1807.01341v1 fatcat:wlvsxygoknhghjw4t4kqqtsoqy

Parallel Breadth-First Search on Distributed Memory Systems [article]

Aydin Buluc, Kamesh Madduri
2011 arXiv pre-print
Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems.  ...  Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems.  ...  John Shalf and Nick Wright provided generous technical and moral support during the project.  ...
arXiv:1104.4518v2 fatcat:a7nvtwil35dbtohgpnsfldeeki
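
A common baseline for distributed-memory BFS is the level-synchronous algorithm over a 1D vertex partition: each process expands its part of the current frontier, sends discovered vertices to their owners, and the owners keep only unvisited ones. The sketch below simulates that pattern in a single process. It is a generic illustration, not the authors' implementation; the cyclic `owner` function, the `outbox` buckets, and the in-memory exchange standing in for an all-to-all message step are our assumptions.

```python
from typing import Dict, List


def owner(v: int, nparts: int) -> int:
    # Hypothetical cyclic 1D partition: vertex v belongs to part v mod nparts.
    return v % nparts


def bfs_1d(adj: Dict[int, List[int]], nverts: int, source: int, nparts: int) -> List[int]:
    """Level-synchronous BFS over a simulated 1D partition; returns levels (-1 = unreached)."""
    level = [-1] * nverts
    level[source] = 0
    frontier: List[List[int]] = [[] for _ in range(nparts)]
    frontier[owner(source, nparts)].append(source)
    depth = 0
    while any(frontier):
        # Expansion: each part scans its local frontier and buckets neighbours by owner.
        outbox = [[[] for _ in range(nparts)] for _ in range(nparts)]
        for p in range(nparts):
            for u in frontier[p]:
                for v in adj.get(u, []):
                    outbox[p][owner(v, nparts)].append(v)
        # Exchange step (stands in for an all-to-all message), then owner-side filtering.
        # Part q only reads and writes level[v] for vertices it owns.
        depth += 1
        next_frontier: List[List[int]] = [[] for _ in range(nparts)]
        for q in range(nparts):
            for p in range(nparts):
                for v in outbox[p][q]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier[q].append(v)
        frontier = next_frontier
    return level
```

The one synchronization point per level (the exchange) is what makes the algorithm "level-synchronous"; the number of rounds equals the BFS depth from the source.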

Towards exascale computing for cosmological simulations

James Douglas Potter
2017
advice, and for resurrecting Tödi.  ...  The work we have done together has been exciting, challenging and very satisfying, and hopefully others have found it useful.  ...  To avoid this, all communication is channelled through a single MPI thread. Cache requests and flushes are bundled on each local thread before being sent to the MPI thread.  ...
doi:10.5167/uzh-204887 fatcat:xg5dng5pw5enpirysejlqexq2i
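
The funneling scheme in this snippet, where local threads bundle cache requests and flushes and a single thread performs all communication, can be sketched with Python's `threading` and `queue` modules. This is a hypothetical illustration, not the thesis code: the function names, the `bundle_size` parameter, the `None` end-of-work sentinel, and appending to a list in place of a real MPI send are all our assumptions.

```python
import queue
import threading
from typing import List, Tuple

Request = Tuple[int, str]  # hypothetical request record: (worker id, payload)


def comm_thread(inbox: queue.Queue, sent: List[List[Request]], nworkers: int) -> None:
    # The only thread that "communicates": drain bundles until every worker has finished.
    finished = 0
    while finished < nworkers:
        bundle = inbox.get()
        if bundle is None:  # end-of-work sentinel from one worker
            finished += 1
            continue
        sent.append(bundle)  # stand-in for one MPI send of the whole bundle


def worker(wid: int, nrequests: int, bundle_size: int, inbox: queue.Queue) -> None:
    # Bundle requests locally; hand them to the communication thread only when full.
    bundle: List[Request] = []
    for i in range(nrequests):
        bundle.append((wid, f"req{i}"))
        if len(bundle) == bundle_size:
            inbox.put(bundle)
            bundle = []
    if bundle:
        inbox.put(bundle)  # flush the final partial bundle
    inbox.put(None)


def run_funneled(nworkers: int, nrequests: int, bundle_size: int) -> List[List[Request]]:
    inbox: queue.Queue = queue.Queue()
    sent: List[List[Request]] = []
    comm = threading.Thread(target=comm_thread, args=(inbox, sent, nworkers))
    comm.start()
    workers = [
        threading.Thread(target=worker, args=(w, nrequests, bundle_size, inbox))
        for w in range(nworkers)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    comm.join()
    return sent
```

Bundling trades a little latency for fewer, larger messages, which matters because per-message overhead dominates for small messages. In real MPI terms, note that `MPI_THREAD_FUNNELED` only permits the main thread to make MPI calls; a dedicated communication thread that is not the main thread needs at least `MPI_THREAD_SERIALIZED`.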

Meshless techniques for anisotropic diffusion

Annamaria Mazzia, Giorgio Pini, Flavio Sartoretto
2014 Applied Mathematics and Computation  
Transport processes are common in geoscience applications, and find their way into models of, e.g., the atmosphere, oceans, shallow water, subsurface, seismic inversion, and deep earth.  ...  A good numerical method would be locally mass conservative, produce no or minimal over/under-shoots, produce minimal numerical diffusion, and require no CFL time-step limit for stability.  ...  We show how to solve the equations using a global implicit approach in an efficient way, and we present the derived computational results.  ... 
doi:10.1016/j.amc.2014.03.032 fatcat:c527226gyfgbffnq4p67qxd7wi

Simulation of shear-driven flows: transition with a free surface and confined turbulence

Roland Bouffanais
2007
A comprehensive analysis of the continuous and discretized formulations of the general problem in the ALE frame, with nonlinear, non-homogeneous and unsteady boundary conditions is presented.  ...  Thus corner eddies, secondary flows, longitudinal vortices, complex three-dimensional patterns, chaotic particle motions, nonuniqueness, transition, and turbulence all occur naturally and can be studied  ...  Acknowledgements I would first and foremost like to thank Prof. Michel Deville who has supervised this work. His guidance and support have been invaluable and immensely enriching.  ...
doi:10.5075/epfl-thesis-3837 fatcat:lwcbzcj3wvbydct33t5jsifbam