
Application-Level Correctness and its Impact on Fault Tolerance

Xuanhua Li, Donald Yeung
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Under application-level correctness, a program's execution is deemed correct as long as the result it produces is acceptable to the user.  ...  Our results show for 6 multimedia and AI benchmarks that 45.8% of architecturally incorrect faults are correct at the application level.  ...  Acknowledgements The authors would like to thank Hameed Badawy, Steve Crago, Vida Kianzad, Wanli Liu, Janice McMahon, Priyanka Rajkhowa, and Meng-Ju Wu for insightful discussions on soft computing.  ... 
doi:10.1109/hpca.2007.346196 dblp:conf/hpca/LiY07 fatcat:wg4bcimbyze3hboifwlosms5x4

Soft-error classification and impact analysis on real-time operating systems

N. Ignat, B. Nicolescu, Y. Savaria, G. Nicolescu
2006 Proceedings of the Design Automation & Test in Europe Conference  
We report results of a detailed analysis regarding the impact of soft-errors on real-time operating systems cores, taking into account the application timing constraints.  ...  Our results show the extent to which soft-errors occurring in a real-time operating system's kernel impact its reliability.  ...  As a general characteristic of existing works, they propose fault tolerant solutions only for the application level.  ... 
doi:10.1109/date.2006.244063 dblp:conf/date/IgnatNSN06 fatcat:zrveqmyspjez3ldlq3yew2snse

Resilience for Collaborative Applications on Clouds [chapter]

Toàn Nguyên, Jean-Antoine Désidéri
2012 Lecture Notes in Computer Science  
Because e-Science applications are data intensive and require long execution runs, it is important that they feature fault-tolerance mechanisms.  ...  Cloud and grid computing infrastructures often support system and network fault-tolerance. They repair and prevent communication and software errors.  ...  Acknowledgments This work is supported by the European Commission FP7 Cooperation Program "Transport (incl. aeronautics)", for the GRAIN Coordination and Support Action ("Greener Aeronautics International  ... 
doi:10.1007/978-3-642-31128-4_31 fatcat:i2j7lu7itjbuxobzg3ynzdhgza

Application-Level Resilience Modeling for HPC Fault Tolerance [article]

Luanzheng Guo, Hanlin He, Dong Li
2017 arXiv   pre-print
However, RFI provides little information on how fault tolerance happens, and RFI results are often not deterministic due to its random nature.  ...  Our methodology is based on the observation that at the application level, the application resilience to faults is due to the application-level fault masking.  ...  In terms of the impact of faults on applications, we focus on the execution correctness.  ... 
arXiv:1705.00267v1 fatcat:4yuglaybrnfglh5vuljtsl5wz4

Fault and timing analysis in critical multi-core systems: A survey with an avionics perspective

Andreas Löfwenmark, Simin Nadjm-Tehrani
2018 Journal of systems architecture  
This paper reviews major contributions that assess the impact of fault tolerance on worst-case execution time of processes running on a multi-core platform.  ...  We consider the classic approach for analyzing the impact of faults in such systems, namely fault injection.  ...  NFFP6-2013-01203 and NFFP7-2017-04890.  ... 
doi:10.1016/j.sysarc.2018.04.001 fatcat:74tk5j6kyjfmxpufn3x7dph6ve

Fault Tolerance and Resilience in Cloud Computing Environments [chapter]

Ravi Jhawar, Vincenzo Piuri
2014 Cyber Security and IT Infrastructure Protection  
In this chapter, we focus on characterizing the recurrent failures in a typical Cloud computing environment, analyzing the effects of failures on user's applications, and surveying fault tolerance solutions  ...  We also discuss the perspective of offering fault tolerance as a service to user's applications as one of the effective means to address user's reliability and availability concerns.  ...  party (the fault tolerance service provider ftSP), specify its requirements based on the business needs, and transparently possess desired fault tolerance properties without studying the low level fault  ... 
doi:10.1016/b978-0-12-416681-3.00001-x fatcat:u3kfhz6rabgg5flhpi3cr2inxq

Fault Tolerance and Resilience in Cloud Computing Environments [chapter]

Ravi Jhawar, Vincenzo Piuri
2017 Computer and Information Security Handbook  
In this chapter, we focus on characterizing the recurrent failures in a typical Cloud computing environment, analyzing the effects of failures on user's applications, and surveying fault tolerance solutions  ...  We also discuss the perspective of offering fault tolerance as a service to user's applications as one of the effective means to address user's reliability and availability concerns.  ...  party (the fault tolerance service provider ftSP), specify its requirements based on the business needs, and transparently possess desired fault tolerance properties without studying the low level fault  ... 
doi:10.1016/b978-0-12-803843-7.00009-0 fatcat:42ea433b6jc3bixlesngbksnea

Fault Tolerance and Resilience in Cloud Computing Environments [chapter]

Ravi Jhawar, Vincenzo Piuri
2013 Computer and Information Security Handbook  
In this chapter, we focus on characterizing the recurrent failures in a typical Cloud computing environment, analyzing the effects of failures on user's applications, and surveying fault tolerance solutions  ...  We also discuss the perspective of offering fault tolerance as a service to user's applications as one of the effective means to address user's reliability and availability concerns.  ...  party (the fault tolerance service provider ftSP), specify its requirements based on the business needs, and transparently possess desired fault tolerance properties without studying the low level fault  ... 
doi:10.1016/b978-0-12-394397-2.00007-6 fatcat:vz2fa7vizff3pieob53hmi6n4e

Fault Analysis and Non-Redundant Fault Tolerance in 3-Level Double Conversion UPS Systems Using Finite-Control-Set Model Predictive Control

Luís Caseiro, André Mendes
2021 Energies  
This paper proposes a non-redundant fault-tolerant double conversion uninterruptible power supply based on 3-level converters.  ...  However, highly differentiated corrective actions are taken depending on the fault type and location, maximizing post-fault performance in each case.  ...  This minimizes the negative impact of fault correction on all UPS system converters and maximizes its overall performance.  ... 
doi:10.3390/en14082210 fatcat:qxhkftlgvjdezfipuk22psy7oq

Simplified programming of faulty sensor networks via code transformation and run-time interval computation

L S Bai, R P Dick, P A Dinda, P H Chou
2011 2011 Design, Automation & Test in Europe  
We describe a system that makes it unnecessary for sensor network application developers and users to understand the intricate implementation details of fault detection and tolerance techniques, while  ...  FACTS is an extension of an existing sensor network programming language; its compiler and runtime libraries have been modified to support automatic generation of code for on-line fault detection and tolerance  ...  FACTS hides faults from programmers but indicates the impact of low-level faults on application outputs, i.e., the end results of data processing expressions in the application specification.  ... 
doi:10.1109/date.2011.5763023 dblp:conf/date/BaiDDC11 fatcat:y7lceiu6wzh5tg2nr54fiypcra

Essential Fault-Tolerance Metrics for NoC Infrastructures

Cristian Grecu, Lorena Anghel, Partha P. Pande, Andre Ivanov, Resve Saleh
2007 13th IEEE International On-Line Testing Symposium (IOLTS 2007)  
Fault-tolerant design of Network-on-chip communication architectures requires the addressing of issues pertaining to different elements described at different levels of design abstraction -these may be  ...  specific to architecture, interconnection, communication and application issues.  ...  ., through flit-level recovery [13] [14] ), the impact of failures may be minimal and, therefore, it may not affect the application at all.  ... 
doi:10.1109/iolts.2007.31 dblp:conf/iolts/GrecuAPIS07 fatcat:yciftxongrf7da4hlrwvvqso5e

Toward Exascale Resilience

Franck Cappello, Al Geist, Bill Gropp, Laxmikant Kale, Bill Kramer, Marc Snir
2009 The international journal of high performance computing applications  
It is also anticipated that the current approach for resilience, which relies on automatic or application level checkpoint-restart, will not work because the time for checkpointing and restarting will  ...  This set of projections leaves the community of fault tolerance for HPC system with a difficult challenge: finding new approaches, possibly radically disruptive, to run applications until their normal  ...  fault tolerance at the system level.  ... 
doi:10.1177/1094342009347767 fatcat:s7i4a7aocnckzka4bxsyzbg6qi

Software fault tolerance methodology and testing for the embedded PowerPC

Mark Bucciero, John Paul Walters, Matthew French
2011 2011 Aerospace Conference  
Our work targets scientific applications operating on space-based FPGA architectures consisting of an FPGA and a radiation-hardened controller.  ...  We use heartbeat monitoring, control flow assertions, and checkpoint/rollback to achieve high performance and low overhead fault tolerance.  ...  SOFTWARE FAULT TOLERANCE Now that we have at least a rudimentary understanding of the PowerPC 405 architecture, we can create fault detection and correction methods to mitigate the impact of SEUs on the  ... 
doi:10.1109/aero.2011.5747460 fatcat:oh3owy5yrjaivmp2xp3zdfqeaa

A survey on simulation-based fault injection tools for complex systems

Maha Kooli, Giorgio Di Natale
2014 2014 9th IEEE International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS)  
The principle of this approach is to insert faults into the system and to monitor its responses in order to observe its behavior in the presence of faults.  ...  This paper presents a survey on the simulation-based fault injection techniques, with a focus on complex microprocessor based systems.  ...  RIFLE is the only pin-level fault injector that is able to perform analysis in order to observe the impact of faults on the processor [22]. F.  ... 
doi:10.1109/dtis.2014.6850649 dblp:conf/dtis/KooliN14 fatcat:5ch2mf57lng4ze3gx5ln7m7c6i

Applications resilience on clouds

Toan Nguyen, Jean-Antoine Desideri, Laurentiu Trifan
2012 2012 International Conference on High Performance Computing & Simulation (HPCS)  
Cloud computing infrastructures support system and network fault-tolerance. They transparently repair and prevent communication and software errors.  ...  It also overviews a testbed used to design, deploy, execute, monitor, restart and resume distributed applications on cloud infrastructures in cases of failures.  ...  It is also supported by the French National Research Agency ANR (Agence Nationale de la Recherche) for the OMD2 project (Optimisation Multi-Discipline Distribuée), grant ANR-08-COSI-007, program COSINUS  ... 
doi:10.1109/hpcsim.2012.6266891 dblp:conf/ieeehpcs/NguyenDT12 fatcat:3mrdz7x3cjdyvi4i5zbr6x7dia