Resilient X10
2014
SIGPLAN notices
• Existing exception semantics give strong synchronization guarantees • Performance is within 90% of non-resilient X10 • Kernel found in a number of algorithms, e.g. ...
Failure awareness ... Resilient X10 Overview: Provide helpful semantics: • Failure reporting • Continuing execution on unaffected nodes • Preservation of synchronization: HBI principle ...
[Diagram labels: MPI, PAMI, Sockets, C++, X10]
Implementing Resilient X10 (X10RT)
External Paxos group of processes - Lightweight resilient store - Still too much overhead (details in paper) ...
doi:10.1145/2692916.2555248
fatcat:up5khhdg2rahdnne3zwcw752qe
Resilient X10
2014
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14
• Existing exception semantics give strong synchronization guarantees • Performance is within 90% of non-resilient X10 • Kernel found in a number of algorithms, e.g. ...
Failure awareness ... Resilient X10 Overview: Provide helpful semantics: • Failure reporting • Continuing execution on unaffected nodes • Preservation of synchronization: HBI principle ...
[Diagram labels: MPI, PAMI, Sockets, C++, X10]
Implementing Resilient X10 (X10RT)
External Paxos group of processes - Lightweight resilient store - Still too much overhead (details in paper) ...
doi:10.1145/2555243.2555248
dblp:conf/ppopp/CunninghamGHIKMSTT14
fatcat:zumxyvkjhneztervvth7hjokou
Resilient Optimistic Termination Detection for the Async-Finish Model
[chapter]
2019
Lecture Notes in Computer Science
Driven by increasing core count and decreasing mean-time-to-failure in supercomputers, HPC runtime systems must improve support for dynamic task-parallel execution and resilience to failures. ...
In this paper, we propose optimistic finish, the first message-optimal resilient termination detection protocol for the async-finish model. ...
LULESH X10 provides a resilient implementation of the LULESH shock hydrodynamics proxy application [8] based on rollback-recovery. ...
doi:10.1007/978-3-030-20656-7_15
fatcat:zmlbpfbcufd2nplc7plr7u6bbm
Data-Driven Maintenance Priority and Resilience Evaluation of Performance Loss in a Main Coolant System
2022
Mathematics
Based on the LIM, RIMs for single component failure and multiple component failures were developed to measure the recovery efficiency of the system performance. ...
In this paper, a resilience importance measure (RIM) for performance loss is proposed to evaluate the performance of the MCS. ...
[9] established a resilience assessment model by quantifying the relationship between resilience and resilience components in the recovery from emergency accidents in NPPs. ...
doi:10.3390/math10040563
fatcat:r6toq62xujedxfbkyeg6rponam
A Java Task Pool Framework providing Fault-Tolerant Global Load Balancing
2018
International Journal of Networking and Computing
Our algorithm is shown to be correct in the sense that failures are either tolerated and the computed result is the same as in non-failure case, or the program aborts with an error message. ...
It implements a comparatively simple algorithm that relies on a resilient data structure for storing backups of local pools and other information. ...
Recovery is explained in detail in Section 3.3 for the single-failure case, and in Section 3.4 for the multiple-failure case. ...
doi:10.15803/ijnc.8.1_2
fatcat:u23fwvr2iffkriulb7rzpi42su
Semantics of (Resilient) X10
[chapter]
2014
Lecture Notes in Computer Science
These principles permit an X10 programmer to write clean code that continues to work in the presence of place failure. The given semantics have additionally been mechanized in Coq. ...
This model accurately captures the behavior of a large class of concurrent, multi-place X10 programs. Further, we introduce a formal model of resilience in X10. ...
The failure of a location can be detected, allowing failure recovery. ...
doi:10.1007/978-3-662-44202-9_27
fatcat:xgnhfeklmnauhldjkzmfzvf6hu
Semantics of (Resilient) X10
[article]
2013
arXiv
pre-print
This model accurately captures the behavior of a large class of concurrent, multi-place X10 programs. Further, we introduce a formal model of resilience in X10. ...
These principles permit an X10 programmer to write clean code that continues to work in the presence of place failure. The given semantics have additionally been mechanized in Coq. ...
The failure of a location can be detected, allowing failure recovery. ...
arXiv:1312.3739v1
fatcat:tqyjnwb7gjfihcnknzindg226u
Fault Tolerance for Lifeline-Based Global Load Balancing
2017
Journal of Software Engineering and Applications
Our algorithm is able to recover from multiple fail-stop failures. If recovery is not possible, it halts with an error message. ...
After failures, the backup partner takes over saved copies and collects others. In case of multiple failures, invocations of the restore protocol are nested. ...
X10 supports a mode called Resilient X10, in which the user program is notified in the event of a permanent place failure. ...
doi:10.4236/jsea.2017.1013053
fatcat:s5m4ebb3afafphtkooebsm7xxi
We also provide a preliminary evaluation showing the cost of providing fault tolerance in X10-FT. ...
based on the characteristics of the APGAS model to make checkpoints and consensus, which allows transparently handling machine failures in different granularities. ...
Science and Technology Development Funds (No. 12QA1401700), a Foundation for the Author of National Excellent Doctoral Dissertation of PR China and Fundamental Research Funds for the Central Universities in ...
doi:10.1145/2442992.2442994
dblp:conf/ppopp/XieHC13
fatcat:wsntxv2rgjdupp6p3f7oceewk4
A Survey on Resiliency Techniques in Cloud Computing Infrastructures and Applications
2016
IEEE Communications Surveys and Tutorials
One of the critical challenges is resiliency: disruptions due to failures (either accidental or because of disasters or attacks) may entail significant revenue losses (e.g., US$ 25.5 billion in 2010 for ...
., also including resilience of the middleware infrastructure). The third part focuses on resilience in application design and development. ...
object) written in X10 or in a multi-purpose language. ...
doi:10.1109/comst.2016.2531104
fatcat:vzvkai7nkrbbda63fesn7zw4di
Fault Tolerance Schemes for Global Load Balancing in X10
2015
Scalable Computing : Practice and Experience
X10 and Resilient X10. X10 is a novel parallel language from IBM [3], which supports object orientation and exception handling in a similar way as Java. Following the Asynchronous PGAS (APGAS) ...
One approach handles permanent node failures at user level. It is supported by Resilient X10, a Partitioned Global Address Space language that throws an exception when a place fails. ...
Resilient X10 provides two mechanisms for failure notification. First, a DeadPlaceException (DPE) is raised in the event of a failure. ...
doi:10.12694/scpe.v16i2.1088
fatcat:kmpuxusdr5bznkanl7i6u2ubni
MODC: Resilience for disaggregated memory architectures using task-based programming
[article]
2021
arXiv
pre-print
We present highlights of our MODC prototype and experimental results demonstrating that MODC-style resilience outperforms a checkpoint-based approach in the face of failures. ...
They also provide an independent failure model, where computations or the compute nodes they run on may fail independently of the disaggregated memory; thus, data that's resident in the disaggregated memory ...
Resilient X10 [19] proposes extensions to the X10 task-parallel language [17] to expose failures to programmers, who can then handle individual task failures by exploiting domain-specific knowledge ...
arXiv:2109.05329v1
fatcat:xq6o5reg3nhndmvgooht2m3d74
A taxonomy of task-based parallel programming technologies for high-performance computing
2018
Journal of Supercomputing
In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We ...
However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and ...
doi:10.1007/s11227-018-2238-4
fatcat:fctzmtp3n5fithxfchl5rub7j4
A Taxonomy Of Task-Based Technologies For High-Performance Computing
2017
Zenodo
In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. ...
We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today. The final publication is available at Springer LNCS. ...
In such a scenario, a process cannot detect its failure; however, in a distributed run, another process may detect the failure, and trigger a recovery strategy across all processes. ...
doi:10.5281/zenodo.1162306
fatcat:7d7lu2l6kfc3necv3pdien6xc4
A Taxonomy Of Task-Based Parallel Programming Technologies For High-Performance Computing
2017
Zenodo
We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today. ...
In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. ...
In such a scenario, a process cannot detect its failure; however, in a distributed run, another process may detect the failure, and trigger a recovery strategy across all processes. ...
doi:10.5281/zenodo.1119094
fatcat:kbuhio5hu5bs7kqkuj5s4jijdi