Containment Domains: A Scalable, Efficient and Flexible Resilience Scheme for Exascale Systems

Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan, Mattan Erez
2013 Scientific Programming  
This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains.  ...  Containment domains are a programming construct that enable applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration  ...  We present containment domains (CDs), a new approach for achieving low-overhead resilient and scalable execution.  ... 
doi:10.1155/2013/473915 fatcat:ewo4ytc2yfeptpoosfy7xzy7p4

Containment domains: A scalable, efficient, and flexible resilience scheme for exascale systems

Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan, Mattan Erez
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains.  ...  Containment domains are a programming construct that enable applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration  ...  We present containment domains (CDs), a new approach for achieving low-overhead resilient and scalable execution.  ... 
doi:10.1109/sc.2012.36 dblp:conf/sc/ChungLSRKYKE12 fatcat:es6ssjh4znhbjhscd4on6ahzmq

Special Issue: Selected Papers from Super Computing 2012

Jeffrey S. Vetter, Padma Raghavan
2013 Scientific Programming  
"Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems" by Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan  ...  In this paper, the authors describe and evaluate the concept of containment domains, which is a programming construct that allows applications to express resiliency requirements to the underlying programming  ... 
doi:10.1155/2013/370826 fatcat:vh2goojj6fcwrcius7w4t6q6za

Resilience in Exascale Computing (Dagstuhl Seminar 14402)

Hermann Härtig, Satoshi Matsuoka, Frank Mueller, Alexander Reinefeld, Marc Herbstritt
2015 Dagstuhl Reports  
From September 28 to October 1, 2014, the Dagstuhl Seminar 14402 "Resilience in Exascale Computing" was held in Schloss Dagstuhl - Leibniz Center for Informatics.  ...  During the seminar, several participants presented their current research, and ongoing work and open problems were discussed.  ...  resilience are promising approaches to efficient resilience in exascale systems.  ... 
doi:10.4230/dagrep.4.9.124 dblp:journals/dagstuhl-reports/HartigMMR14 fatcat:wwcmeeizubdxvahq4sw54vgtwy

The Landscape of Exascale Research

Stijn Heldens, Pieter Hijma, Ben Van Werkhoven, Jason Maassen, Adam S. Z. Belloum, Rob V. Van Nieuwpoort
2020 ACM Computing Surveys  
Overall, we observe that great advancements have been made in tackling the two primary exascale challenges: energy efficiency and fault tolerance.  ...  We use a three-stage approach in which we (1) discuss various exascale landmark studies, (2) use data-driven techniques to analyze the large collection of related literature, and (3) discuss eight research  ...  in the time domain (in addition to the spatial domain). • Data-Centric Approaches: Flexible and efficient software couplers, in-situ data processing, and declarative processing frameworks for data analytics  ... 
doi:10.1145/3372390 fatcat:jhtwt7pxd5c5darhz75hiqgsnq

Paving the Way Towards a Highly Energy-Efficient and Highly Integrated Compute Node for the Exascale Revolution: The ExaNoDe Approach

Alvise Rigo, Christian Pinto, Kevin Pouget, Daniel Raho, Denis Dutoit, Pierre-Yves Martinez, Chris Doran, Luca Benini, Iakovos Mavroidis, Manolis Marazakis, Valeria Bartsch, Guy Lonsdale (+8 others)
2017 2017 Euromicro Conference on Digital System Design (DSD)  
Power consumption and high compute density are the key factors to be considered when building a compute node for the upcoming Exascale revolution.  ...  This paper presents the ExaNoDe H2020 research project aiming to design a highly energy efficient and highly integrated heterogeneous compute node targeting Exascale level computing, mixing low-power processors  ...  The work presented in this paper reflects only authors' view and the European Commission is not responsible for any use that may be made of the information it contains.  ... 
doi:10.1109/dsd.2017.37 dblp:conf/dsd/RigoPPRDMDBMMBL17 fatcat:jumekx7n6vcmvnu32epz3tp6py

Exascale Machines Require New Programming Paradigms and Runtimes

2015 Supercomputing Frontiers and Innovations  
We propose and discuss important features of programming paradigms and runtimes to deal with exascale computing systems with a special focus on data-intensive applications and resilience.  ...  for petascale/exascale systems and (b) catalyzing, coordinating and sustaining the effort of the international open source software community to create that environment as quickly as possible".  ...  They conclude that a multi-version model outperforms a single checkpointing scheme in all cases, while for exascale scenarios, the multi-version model increases efficiency significantly.  ... 
doi:10.14529/jsfi150201 fatcat:ozj4czefxrd37j7djcxuukyuee

Toward Exascale Resilience: 2014 update

2014 Supercomputing Frontiers and Innovations  
Resilience is a major roadblock for HPC executions on future exascale systems. These systems will typically gather millions of CPU cores running up to a billion threads.  ...  The past five years have seen extraordinary technical progress in many domains related to exascale resilience.  ...  Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public  ... 
doi:10.14529/jsfi140101 fatcat:c5vm7qckzbcetbgpwaodtrdzw4

The International Exascale Software Project roadmap

Jack Dongarra, Pete Beckman, Terry Moore, Patrick Aerts, Giovanni Aloisio, Jean-Claude Andre, David Barkai, Jean-Yves Berthou, Taisuke Boku, Bertrand Braunschweig, Franck Cappello, Barbara Chapman (+53 others)
2011 The international journal of high performance computing applications  
Make a thorough assessment of needs, issues and strategies: A successful plan in this arena requires a thorough assessment of the technology drivers for future peta/exascale systems and of the short-term  ...  integration of technologies necessary to make them work together smoothly and efficiently, both within individual PetaScale systems and between different systems.  ...  application domain, i.e., where we expect the major challenges will be for that domain.  ... 
doi:10.1177/1094342010391989 fatcat:twdszcjfxraijpsdcdacvpp6vm

A dimension-oblivious domain decomposition method based on space-filling curves [article]

Michael Griebel, Marc Alexander Schweitzer, Lukas Troska
2022 arXiv   pre-print
This is the core property required to attain a sparse grid based combination method with extreme scalability which can utilize exascale parallel systems efficiently.  ...  Moreover, this approach provides a basis for the development of a fault-tolerant solver for the numerical treatment of high-dimensional problems.  ...  Our future goal is the development of fault-tolerant parallel solvers on exascale systems which require complete data redundancy to allow for data recovery when faults occur.  ... 
arXiv:2110.11211v2 fatcat:272s4h2huvdeja4brggtdazdwi

A Scalable and Extensible Checkpointing Scheme for Massively Parallel Simulations [article]

Nils Kohl, Johannes Hötzer, Florian Schornbaum, Martin Bauer, Christian Godenschwager, Harald Köstler, Britta Nestler, Ulrich Rüde
2018 arXiv   pre-print
In this article, we present a scalable, distributed, diskless, and resilient checkpointing scheme that can create and recover snapshots of a partitioned simulation domain.  ...  For future exascale systems it is therefore considered critical that strategies are developed to make software resilient against failures.  ...  ( for funding this project by providing computing time on the GCS Supercomputer JUQUEEN at Jülich Supercomputing Centre (JSC) and SuperMUC at Leibniz Supercomputing Centre (  ... 
arXiv:1708.08286v2 fatcat:bkir7z4p5nfcjkkjbz3onfw6fe


Muhammad Usman Ashraf
2019 International Journal of Advanced Research in Computer Science  
Leading to objectives, the current study presents a comprehensive analysis of existing strategies that can be considered to enhance performance and reducing power for emerging Exascale computing system  ...  Today's High-Performance Computing (HPC) systems require significant usage of "supercomputers" and extensive parallel processing approaches for solving complicated computational tasks at the Petascale level  ...  Resiliency There is a need for considerably new computing strategies for having the roadmap towards Exascale computing environment.  ... 
doi:10.26483/ijarcs.v10i2.6397 fatcat:k3l3lk5kuzhnldn5b2qzkh4eia

Ecoscale: Reconfigurable Computing And Runtime System For Future Exascale Systems

Iakovos Mavroidis, Ioannis Papaefstathiou, Luciano Lavagno, Dimitrios Nikolopoulos, Dirk Koch, John Goodacre, Ioannis Sourdis, Vassilis Papaefstathiou, Marcello Coppola, Manuel Palomino
2016 Zenodo  
ECOSCALE introduces a novel heterogeneous energy-efficient hierarchical architecture, as well as a hybrid many-core+OpenCL programming environment and runtime system.  ...  To further increase energy efficiency, as well as to provide resilience, the Workers employ reconfigurable accelerators mapped into the virtual address space utilizing a dual stage System Memory Management  ...  This research project is supported by the European Commission under the H2020 Programme and the ECOSCALE project (grant agreement 671632).  ... 
doi:10.5281/zenodo.34893 fatcat:ocwfndo4vjei3hqucmndj22xu4

EXA2PRO programming environment

Dimitrios Soudris, Raymond Namyst, Dirk Pleiter, Georgi Gaydadjiev, Tobias Becker, Matthieu Haefele, Lazaros Papadopoulos, Christoph W. Kessler, Dionysios D. Kehagias, Athanasios Papadopoulos, Panos Seferlis, Alexander Chatzigeorgiou (+2 others)
2018 Proceedings of the 18th International Conference on Embedded Computer Systems Architectures, Modeling, and Simulation - SAMOS '18  
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not.  ...  The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.  ...  ACKNOWLEDGEMENT This work has received funding from the European Union's Horizon 2020 research and innovation programme EXA2PRO under grant agreement No 801015 (  ... 
doi:10.1145/3229631.3239369 dblp:conf/samos/SoudrisPKKPSCAT18 fatcat:sbqs5qyhgvgtdghb5sf2no6y3q

Exploring versioned distributed arrays for resilience in scientific applications

A Chien, P Balaji, N Dun, A Fang, H Fujita, K Iskra, Z Rubenstein, Z Zheng, J Hammond, I Laguna, D Richards, A Dubey (+5 others)
2016 The international journal of high performance computing applications  
Exascale studies project reliability challenges for future HPC systems. We present the Global View Resilience (GVR) system, a library for portable resilience.  ...  GVR interfaces are flexible, supporting a variety of recovery schemes, and altogether GVR embodies a gentle-slope path to tolerate growing error rates in future extreme-scale systems.  ...  accumulate operations end for Table 3. System parameters for flexible recovery efficiency study.  ... 
doi:10.1177/1094342016664796 fatcat:aaipn5vawrg4dhzka4rigj325y