32 Hits in 7.6 sec

VELOC: VEry Low Overhead Checkpointing in the Age of Exascale [article]

Bogdan Nicolae and Adam Moody and Gregory Kosinovsky and Kathryn Mohror and Franck Cappello
2021 arXiv   pre-print
This extended abstract presents an overview of VeloC (Very Low Overhead Checkpointing System), a checkpointing runtime specifically designed to address these challenges for the next generation of Exascale HPC  ...  In this context, the state of the art is insufficient to deal with the diversity of vendor APIs, performance and persistency characteristics.  ...  To this end, we developed an experimental module that leverages an optimized low-level put/get API for key-value pairs. Figure 1: Architecture of VeloC (Very Low Overhead Checkpointing System)  ... 
arXiv:2103.02131v1 fatcat:53tvxe2iszde5gwkr4dy6gxeeq
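The snippet above mentions an experimental module built on a low-level put/get API for key-value pairs. A minimal sketch of what such a versioned key-value checkpoint interface could look like is given below; all names (`KVCheckpointStore`, `put`, `get`) are illustrative assumptions, not VeloC's actual API.

```python
# Hypothetical sketch of a put/get key-value checkpointing interface,
# in the spirit of the module described in the abstract. The class and
# method names are illustrative, not taken from VeloC.
class KVCheckpointStore:
    def __init__(self):
        # Maps (region_name, version) -> serialized bytes.
        self._store = {}

    def put(self, name: str, version: int, payload: bytes) -> None:
        # Persist one checkpoint region under a versioned key.
        self._store[(name, version)] = payload

    def get(self, name: str, version: int) -> bytes:
        # Retrieve a region for restart; raises KeyError if absent.
        return self._store[(name, version)]


store = KVCheckpointStore()
store.put("temperature_field", 1, b"\x00\x01\x02")
restored = store.get("temperature_field", 1)
```

A real backend would replace the dictionary with node-local storage or a persistent key-value service, but the versioned-key pattern stays the same.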

DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models

Bogdan Nicolae, Jiali Li, Justin M. Wozniak, George Bosilca, Matthieu Dorier, Franck Cappello
2020 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)  
In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and scientific applications.  ...  One common pattern emerging in such applications is frequent checkpointing of the state of the learning model during training, needed in a variety of scenarios: analysis of intermediate states to explain  ...  Fig. 4: Architecture of our proposal based on the VELOC (Very Low Overhead Checkpoint-Restart) runtime. phase (lower is better). Runtime overhead (lower is better).  ... 
doi:10.1109/ccgrid49817.2020.00-76 dblp:conf/ccgrid/NicolaeLWBDC20 fatcat:s4565nfzczhfzmk4gir3tgkt64

Towards Aggregated Asynchronous Checkpointing [article]

Mikaila J. Gossman, Bogdan Nicolae, Jon C. Calhoun, Franck Cappello, Melissa C. Smith
2021 arXiv   pre-print
Multi-level asynchronous checkpoint runtimes like VELOC (Very Low Overhead Checkpoint Strategy) are gaining popularity among application scientists for their ability to leverage fast node-local storage  ...  This paper discusses the viability and challenges of designing aggregation techniques for asynchronous multi-level checkpointing.  ...  VELOC: VEry Low Overhead Checkpointing in the Age of Exascale. Checkpointing with Parallel File System Delegation.  ... 
arXiv:2112.02289v1 fatcat:z6cdz46x6rf3vhg3ru5xruwtsi

Towards Exascale Lattice Boltzmann computing

S. Succi, G. Amati, M. Bernaschi, G. Falcucci, M. Lauricella, A. Montessori
2019 Computers & Fluids  
We discuss the state of the art of Lattice Boltzmann (LB) computing, with special focus on prospective LB schemes capable of meeting the forthcoming Exascale challenge.  ...  After reviewing the basic notions of LB computing, we discuss current techniques to improve the performance of LB codes on parallel machines and illustrate selected leading-edge applications in the Petascale  ...  The CPR strategy is error-free, but very costly, since it requires a periodic dump of the full system configuration, easily on the order of tens of thousands of trillions of variables for Exascale applications  ... 
doi:10.1016/j.compfluid.2019.01.005 fatcat:7pmhodov7rb3zaf4ehcekuq4ke

Memory leads the way to better computing

H.-S. Philip Wong, Sayeef Salahuddin
2015 Nature Nanotechnology  
The report itself was drawn from the results of a series of meetings over the second half of 2007, and as such reflects a snapshot in time.  ...  Further, the report itself was assembled in just a few months at the beginning of 2008 from input by the participants.  ...  With a checkpoint time of 3 seconds (8.3×10−4 hours), a checkpointing interval of about 3 minutes is near optimal, which gives a checkpoint overhead of about 1.7% (see Section 6.7.4).  ... 
doi:10.1038/nnano.2015.29 pmid:25740127 fatcat:d6iiuuwcozbxlgn4kxxzdzwd4m
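The checkpoint-interval figures quoted in the snippet above can be checked with a few lines of arithmetic. The sketch below reproduces the ~1.7% overhead (checkpoint time divided by interval) and, as an illustrative assumption, uses Young's first-order approximation for the optimal interval, tau = sqrt(2 · delta · MTBF), to back out the implied mean time between failures; the report itself may use a different model.

```python
import math

# Snippet figures: 3 s checkpoint time, ~3 min checkpoint interval.
delta = 3.0              # checkpoint write time in seconds (8.3e-4 h)
tau = 180.0              # checkpoint interval, ~3 minutes

# Fraction of runtime spent writing checkpoints.
overhead = delta / tau
print(f"overhead = {overhead:.1%}")            # ~1.7%, matching the snippet

# Assumption: Young's approximation tau = sqrt(2 * delta * MTBF).
# If 180 s is the optimal interval, the implied MTBF is:
mtbf = tau ** 2 / (2 * delta)
print(f"implied MTBF = {mtbf / 3600:.2f} h")   # 1.50 h under this model
```

The 1.7% figure counts only checkpoint write time; a full overhead model would also add the expected rework lost between the last checkpoint and a failure.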

A Scalable and Extensible Checkpointing Scheme for Massively Parallel Simulations [article]

Nils Kohl, Johannes Hötzer, Florian Schornbaum, Martin Bauer, Christian Godenschwager, Harald Köstler, Britta Nestler, Ulrich Rüde
2018 arXiv   pre-print
The checkpointing mechanism is fully integrated in a state-of-the-art high-performance multi-physics simulation framework.  ...  Realistic simulations in engineering or in the materials sciences can consume enormous computing resources and thus require the use of massively parallel supercomputers.  ...  Acknowledgements The authors gratefully acknowledge funding by the joint BMBF project Skampy. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V.  ... 
arXiv:1708.08286v2 fatcat:bkir7z4p5nfcjkkjbz3onfw6fe

Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training

Jiali Li, Bogdan Nicolae, Justin Wozniak, George Bosilca
2019 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)  
In the age of big data, deep learning has emerged as a powerful tool to extract insight and exploit its value, both in industry and scientific applications.  ...  To address this issue, in this paper we study the behavior of Horovod, a popular data parallel approach that relies on MPI, on Theta, a pre-Exascale machine at Argonne National Laboratory.  ...  It used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.  ... 
doi:10.1109/mlhpc49564.2019.00006 dblp:conf/sc/LiNWB19 fatcat:pcxwhll7xncrdp2m652gpx323u

Big data and extreme-scale computing

M Asch, T Moore, R Badia, M Beck, P Beckman, T Bidot, F Bodin, F Cappello, A Choudhary, B de Supinski, E Deelman, J Dongarra (+27 others)
2018 The international journal of high performance computing applications  
Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric discovery  ...  Based on those meetings, we argue that the rapid proliferation of digital data generators, the unprecedented growth in the volume and diversity of the data they generate, and the intense evolution of the  ...  They would also gratefully acknowledge all the following sponsors who supported the big data and exascale computing workshop series: Government Sponsors: US Department of Energy, the National Science Foundation  ... 
doi:10.1177/1094342018778123 fatcat:vwrrxmad4rhtppq6ioaz4h5q7a

waLBerla: A block-structured high-performance framework for multiphysics simulations [article]

Martin Bauer, Sebastian Eibl, Christian Godenschwager, Nils Kohl, Michael Kuron, Christoph Rettinger, Florian Schornbaum, Christoph Schwarzmeier, Dominik Thönnes, Harald Köstler, Ulrich Rüde
2019 arXiv   pre-print
Multiple levels of parallelism on the core, on the compute node, and between nodes need to be exploited to make full use of the system.  ...  The block-structured domain partitioning is flexible enough to handle complex geometries, while the structured grid within each block allows for highly efficient implementations of stencil-based algorithms  ...  Centre, the Leibniz-Rechenzentrum in Garching, the Regionales Rechenzentrum Erlangen and the Swiss National Supercomputing Centre in Lugano.  ... 
arXiv:1909.13772v1 fatcat:b2iwdbbugjeebk3diazleysedi

The ParaView Coprocessing Library: A scalable, general purpose in situ visualization library

Nathan Fabian, Kenneth Moreland, David Thompson, Andrew C. Bauer, Pat Marion, Berk Gevecik, Michel Rasquin, Kenneth E. Jansen
2011 2011 IEEE Symposium on Large Data Analysis and Visualization  
Finally, we will demonstrate the library's scalability in a number of real-world scenarios.  ...  As high performance computing approaches exascale, CPU capability far outpaces disk write speed, and in situ visualization becomes an essential part of an analyst's workflow.  ...  ACKNOWLEDGEMENTS Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S.  ... 
doi:10.1109/ldav.2011.6092322 dblp:conf/ldav/FabianMTBMGRJ11 fatcat:7r2p6hdpn5cmvjttcobxrgwk7m

2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

Michael S. Warren
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13  
These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite.  ...  Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.  ...  This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Sci  ... 
doi:10.1145/2503210.2503220 dblp:conf/sc/Warren13 fatcat:gnqv7liqxfdxbh6bowoqyoqciq

The Plasma Simulation Code: A modern particle-in-cell code with patch-based load-balancing

Kai Germaschewski, William Fox, Stephen Abbott, Narges Ahmadi, Kristofor Maynard, Liang Wang, Hartmut Ruhl, Amitava Bhattacharjee
2016 Journal of Computational Physics  
We review the basic components of the particle-in-cell method as well as the computational architecture of the psc code that allows support for modular algorithms and data structure in the code.  ...  We then describe and analyze in detail a distinguishing feature of psc: patch-based load balancing using space-filling curves which is shown to lead to major efficiency gains over unbalanced methods and  ...  Second, a particle undergoing Coulomb collisions executes a type of random-walk in velocity space through a very large number of small-angle collisions.  ... 
doi:10.1016/ fatcat:kkoik3qiefgdva5hstcmnq7dgu
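The snippet above highlights patch-based load balancing via space-filling curves. The sketch below illustrates the general idea only, not the psc implementation: patches are ordered along a Morton (Z-order) curve, then the ordered list is split into contiguous, roughly equal-cost chunks, one per rank. All names (`morton2d`, `balance`) are illustrative.

```python
def morton2d(x: int, y: int) -> int:
    """Interleave the bits of x and y into a Z-order (Morton) key."""
    key = 0
    for i in range(16):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def balance(patches, costs, nranks):
    """Assign Morton-ordered patches to ranks in contiguous chunks.

    patches: list of (x, y) patch coordinates
    costs:   dict mapping patch -> estimated workload
    """
    order = sorted(patches, key=lambda p: morton2d(*p))
    target = sum(costs[p] for p in patches) / nranks
    assignment, rank, acc = {}, 0, 0.0
    for p in order:
        # Start a new chunk once the current rank has reached its share.
        if acc >= target and rank < nranks - 1:
            rank, acc = rank + 1, 0.0
        assignment[p] = rank
        acc += costs[p]
    return assignment
```

Because the Z-order curve keeps spatially adjacent patches close together in the ordering, contiguous chunks stay mostly compact in space, which limits communication between ranks.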

Intrinsic fault tolerance of multilevel Monte Carlo methods

Stefan Pauli, Peter Arbenz, Christoph Schwab
2015 Journal of Parallel and Distributed Computing  
Our mathematical model assumes node failures that occur uncorrelated with MC sampling, with general sample failure statistics on the different levels, and the absence of checkpointing, i.e  ...  Modifications of the MLMC with enhanced resilience are proposed. The theoretical results are obtained under general statistical models of CPU failure at runtime.  ...  It would be very useful to know the failure distribution of the computer one is using.  ... 
doi:10.1016/j.jpdc.2015.07.005 fatcat:oizertdv3vecbec6rhlyasbjoy

2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

Michael S. Warren
2014 Scientific Programming  
These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite.  ...  Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.  ...  Acknowledgements We gratefully acknowledge John Salmon for his many contributions to the initial version of HOT, and helpful comments on a draft version of this manuscript.  ... 
doi:10.1155/2014/802125 fatcat:aa3qtuifc5etvnfgr66g5o7qmu

PIConGPU: Predictive Simulations of Laser-Particle Accelerators with Manycore Hardware

Axel Huebl, Michael Bussmann, Thomas Kluge, Ulrich Schramm, Thomas E. Cowan, Paul Gibbon, Burkhardt Kämpfer, Stefan Grafström
2019 Zenodo  
The open source particle-in-cell code PIConGPU, which is developed in the framework of this thesis, answers these demands, providing speed and scalability to run on the world's largest supercomputers.  ...  In this study, 3D modeling with the GPU supercomputer Titan enabled the identification of pre-expansion to near-critical target conditions, which uncovers a regime of volumetric laser-electron interaction  ...  Efficiency is the overhead relative to ideal scaling. Exascale-Era Simulations with PIConGPU: performance in that regard is presented in section  ... 
doi:10.5281/zenodo.3266819 fatcat:b73a3zvxvjecjplcsks7gf52cu