Containment Domains: A Scalable, Efficient and Flexible Resilience Scheme for Exascale Systems
2013
Scientific Programming
This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains. ...
Containment domains are a programming construct that enables applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration ...
We present containment domains (CDs), a new approach for achieving low-overhead resilient and scalable execution. ...
doi:10.1155/2013/473915
fatcat:ewo4ytc2yfeptpoosfy7xzy7p4
Containment domains: A scalable, efficient, and flexible resilience scheme for exascale systems
2012
2012 International Conference for High Performance Computing, Networking, Storage and Analysis
This paper describes and evaluates a scalable and efficient resilience scheme based on the concept of containment domains. ...
Containment domains are a programming construct that enables applications to express resilience needs and to interact with the system to tune and specialize error detection, state preservation and restoration ...
We present containment domains (CDs), a new approach for achieving low-overhead resilient and scalable execution. ...
doi:10.1109/sc.2012.36
dblp:conf/sc/ChungLSRKYKE12
fatcat:es6ssjh4znhbjhscd4on6ahzmq
Special Issue: Selected Papers from Super Computing 2012
2013
Scientific Programming
"Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems" by Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan ...
In this paper, the authors describe and evaluate the concept of containment domains, which is a programming construct that allows applications to express resiliency requirements to the underlying programming ...
doi:10.1155/2013/370826
fatcat:vh2goojj6fcwrcius7w4t6q6za
Resilience in Exascale Computing (Dagstuhl Seminar 14402)
2015
Dagstuhl Reports
From September 28 to October 1, 2014, the Dagstuhl Seminar 14402 "Resilience in Exascale Computing" was held in Schloss Dagstuhl - Leibniz Center for Informatics. ...
During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. ...
resilience are promising approaches to efficient resilience in exascale systems. ...
doi:10.4230/dagrep.4.9.124
dblp:journals/dagstuhl-reports/HartigMMR14
fatcat:wwcmeeizubdxvahq4sw54vgtwy
The Landscape of Exascale Research
2020
ACM Computing Surveys
Overall, we observe that great advancements have been made in tackling the two primary exascale challenges: energy efficiency and fault tolerance. ...
We use a three-stage approach in which we (1) discuss various exascale landmark studies, (2) use data-driven techniques to analyze the large collection of related literature, and (3) discuss eight research ...
in the time domain (in addition to the spatial domain). • Data-Centric Approaches: Flexible and efficient software couplers, in-situ data processing, and declarative processing frameworks for data analytics ...
doi:10.1145/3372390
fatcat:jhtwt7pxd5c5darhz75hiqgsnq
Paving the Way Towards a Highly Energy-Efficient and Highly Integrated Compute Node for the Exascale Revolution: The ExaNoDe Approach
2017
2017 Euromicro Conference on Digital System Design (DSD)
Power consumption and high compute density are the key factors to be considered when building a compute node for the upcoming Exascale revolution. ...
This paper presents the ExaNoDe H2020 research project aiming to design a highly energy efficient and highly integrated heterogeneous compute node targeting Exascale level computing, mixing low-power processors ...
The work presented in this paper reflects only the authors' view and the European Commission is not responsible for any use that may be made of the information it contains. ...
doi:10.1109/dsd.2017.37
dblp:conf/dsd/RigoPPRDMDBMMBL17
fatcat:jumekx7n6vcmvnu32epz3tp6py
Exascale Machines Require New Programming Paradigms and Runtimes
2015
Supercomputing Frontiers and Innovations
We propose and discuss important features of programming paradigms and runtimes to deal with exascale computing systems with a special focus on data-intensive applications and resilience. ...
for petascale/exascale systems and (b) catalyzing, coordinating and sustaining the effort of the international open source software community to create that environment as quickly as possible". ...
They conclude that a multi-version model outperforms a single checkpointing scheme in all cases, while for exascale scenarios, the multi-version model increases efficiency significantly. ...
doi:10.14529/jsfi150201
fatcat:ozj4czefxrd37j7djcxuukyuee
Toward Exascale Resilience: 2014 update
2014
Supercomputing Frontiers and Innovations
Resilience is a major roadblock for HPC executions on future exascale systems. These systems will typically gather millions of CPU cores running up to a billion threads. ...
The past five years have seen extraordinary technical progress in many domains related to exascale resilience. ...
Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public ...
doi:10.14529/jsfi140101
fatcat:c5vm7qckzbcetbgpwaodtrdzw4
The International Exascale Software Project roadmap
2011
The International Journal of High Performance Computing Applications
Make a thorough assessment of needs, issues and strategies: A successful plan in this arena requires a thorough assessment of the technology drivers for future peta/exascale systems and of the short-term ...
integration of technologies necessary to make them work together smoothly and efficiently, both within individual PetaScale systems and between different systems. ...
application domain, i.e., where we expect the major challenges will be for that domain. ...
doi:10.1177/1094342010391989
fatcat:twdszcjfxraijpsdcdacvpp6vm
A dimension-oblivious domain decomposition method based on space-filling curves
[article]
2022
arXiv
pre-print
This is the core property required to attain a sparse grid based combination method with extreme scalability which can utilize exascale parallel systems efficiently. ...
Moreover, this approach provides a basis for the development of a fault-tolerant solver for the numerical treatment of high-dimensional problems. ...
Our future goal is the development of fault-tolerant parallel solvers on exascale systems which require complete data redundancy to allow for data recovery when faults occur. ...
arXiv:2110.11211v2
fatcat:272s4h2huvdeja4brggtdazdwi
A Scalable and Extensible Checkpointing Scheme for Massively Parallel Simulations
[article]
2018
arXiv
pre-print
In this article, we present a scalable, distributed, diskless, and resilient checkpointing scheme that can create and recover snapshots of a partitioned simulation domain. ...
For future exascale systems it is therefore considered critical that strategies are developed to make software resilient against failures. ...
(www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer JUQUEEN at Jülich Supercomputing Centre (JSC) and SuperMUC at Leibniz Supercomputing Centre (www.lrz.de ...
arXiv:1708.08286v2
fatcat:bkir7z4p5nfcjkkjbz3onfw6fe
Improving Performance in HPC System under Power Consumptions Limitations
2019
International Journal of Advanced Research in Computer Science
Leading to objectives, the current study presents a comprehensive analysis of existing strategies that can be considered to enhance performance and reduce power for emerging Exascale computing systems ...
Today's High-Performance Computing (HPC) systems require significant usage of "supercomputers" and extensive parallel processing approaches for solving complicated computational tasks at the Petascale level ...
Resiliency: There is a need for considerably new computing strategies for having the roadmap towards Exascale computing environment. ...
doi:10.26483/ijarcs.v10i2.6397
fatcat:k3l3lk5kuzhnldn5b2qzkh4eia
Ecoscale: Reconfigurable Computing And Runtime System For Future Exascale Systems
2016
Zenodo
ECOSCALE introduces a novel heterogeneous energy-efficient hierarchical architecture, as well as a hybrid many-core+OpenCL programming environment and runtime system. ...
To further increase energy efficiency, as well as to provide resilience, the Workers employ reconfigurable accelerators mapped into the virtual address space utilizing a dual stage System Memory Management ...
This research project is supported by the European Commission under the H2020 Programme and the ECOSCALE project (grant agreement 671632). ...
doi:10.5281/zenodo.34893
fatcat:ocwfndo4vjei3hqucmndj22xu4
EXA2PRO programming environment
2018
Proceedings of the 18th International Conference on Embedded Computer Systems Architectures, Modeling, and Simulation - SAMOS '18
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. ...
The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. ...
ACKNOWLEDGEMENT This work has received funding from the European Union's Horizon 2020 research and innovation programme EXA2PRO under grant agreement No 801015 (www.exa2pro.eu). ...
doi:10.1145/3229631.3239369
dblp:conf/samos/SoudrisPKKPSCAT18
fatcat:sbqs5qyhgvgtdghb5sf2no6y3q
Exploring versioned distributed arrays for resilience in scientific applications
2016
The International Journal of High Performance Computing Applications
Exascale studies project reliability challenges for future HPC systems. We present the Global View Resilience (GVR) system, a library for portable resilience. ...
GVR interfaces are flexible, supporting a variety of recovery schemes, and altogether GVR embodies a gentle-slope path to tolerate growing error rates in future extreme-scale systems. ...
Table 3. System parameters for flexible recovery efficiency study. ...
doi:10.1177/1094342016664796
fatcat:aaipn5vawrg4dhzka4rigj325y
Showing results 1–15 out of 239 results