A Survey of Fault Tolerance in Cloud Computing

R Archana
2018 Zenodo  
Cloud computing gives services as a type of Internet-based computing using data centers that contain servers, networks, and storage.  ...  At such a large scale, hardware component failure is the norm rather than the exception. Hardware failure can lead to performance degradation for users and can result in losses to the industry.  ...  Authors present a failure model for cloud infrastructures such as server components (including VM and VMM), network and power distribution, to analyze the impact of each failure on users' applications.  ... 
doi:10.5281/zenodo.1410996 fatcat:tk5xak6l75ch7n4nzr2z3rmr3e

Toward Predictive Failure Management for Distributed Stream Processing Systems

Xiaohui Gu, Spiros Papadimitriou, Philip S. Yu, Shu-Ping Chang
2008 2008 The 28th International Conference on Distributed Computing Systems  
Failure management is essential for DSPSs that often require highly available system operations.  ...  We have implemented an initial prototype of the predictive failure management framework within the IBM System S distributed stream processing system.  ...  Conclusion In this paper, we have presented a new predictive failure management framework for distributed stream processing systems.  ... 
doi:10.1109/icdcs.2008.34 dblp:conf/icdcs/GuPYC08 fatcat:mwysgwanb5acfil7hujyxlme3q

Reliability and energy efficiency in cloud computing systems: Survey and taxonomy

Yogesh Sharma, Bahman Javadi, Weisheng Si, Daniel Sun
2016 Journal of Network and Computer Applications  
We also discuss the classifications on resource failures, fault tolerance mechanisms and energy management mechanisms in cloud systems.  ...  Reliability and energy efficiency are two key challenges in cloud computing systems (CCS) that need careful attention and investigation.  ...  Dixit for sharing their constructive comments and suggestions on improving the survey. Authors are also thankful to two anonymous reviewers for their comments that greatly improved the manuscript.  ... 
doi:10.1016/j.jnca.2016.08.010 fatcat:kfsxrhje5jfatg74mawkqo4dru

Decentralized Resilient Autonomous Control Architecture for Dynamic Microgrids

Adel Nasiri, Salam Bani-Ahmed, Mohammad Rashidi
2019 IET Generation, Transmission & Distribution  
Device level and system level controller and interaction models are defined for a self-coordination. Also, microgrid energy management system (EMS) and control case scenarios are demonstrated.  ...  The controller can become a performance and reliability bottleneck for the entire system, where its failure can bring the entire system down.  ...  Acknowledgment This material is based upon work supported by the US National Science Foundation under Grant No. 1650470.  ... 
doi:10.1049/iet-gtd.2018.5816 fatcat:qliv7nrt75f6ll2fjvigtszisq

Design of distribution automation networks using survivability modeling and power flow equations

Anne Koziolek, Alberto Avritzer, Sindhu Suresh, Daniel Sadoc Menasche, Kishor Trivedi, Lucia Happe
2013 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE)  
Smart grids are fostering a paradigm shift in the realm of power distribution systems.  ...  Our empirical results indicate that the combination of survivability analysis and power flow can provide meaningful investment decision support for power systems engineers.  ...  Martins [29] presents a model for active distribution systems expansion planning that considers distributed generation together with traditional alternatives for distribution expansion such as rewiring  ... 
doi:10.1109/issre.2013.6698903 dblp:conf/issre/KoziolekASMTH13 fatcat:penj23wwerhojeamatgptrdgka

Reactive Liquid: Optimized Liquid Architecture for Elastic and Resilient Distributed Data Processing [article]

Seyed Esmaeil Mirvakili, MohammadAmin Fazli, Jafar Habibi
2019 arXiv   pre-print
In this paper, we presented a distributed architecture for elastic and resilient data processing based on the Liquid which is a nearline and offline big data architecture.  ...  We used the Reactive Manifesto to design the architecture highly reactive to workload changes and failures.  ...  enterprise messaging, Akka toolkit [9] which is an open-source JVM-based actor model toolkit for facilitating the construction of concurrent and distributed applications.  ... 
arXiv:1902.05968v1 fatcat:mihlnczafbgu7agzwjjnw3i2jy

Methodical Review on Various Fault Tolerant and Monitoring Mechanisms to improve Reliability on Cloud Environment

P. Padmakumari, A. Umamakeswari
2015 Indian Journal of Science and Technology  
Fault tolerance systems are important for both cloud provider and cloud customer.  ...  Proactive and reactive measures can take place to run the cloud environment with tolerance in failure occurrence.  ...  Petri nets used to find the effectiveness and correctness of the model Malik S. et al. 23 model for reliability assessment was proposed for general application and real time application based on time  ... 
doi:10.17485/ijst/2015/v8i35/80130 fatcat:m4esh6jsg5fonjzfecd3qaug2e

Component-based development and verification of safety critical software for a brake-by-wire system with synchronous software components

Gunzert, Nagele
1999 Proceedings International Symposium on Software Engineering for Parallel and Distributed Systems PDSE-99  
The central control computer in this distributed system, called Brake-by-Wire Manager, is a redundant design in order to tolerate any single failure.  ...  The system is based on a time-triggered communication architecture.  ...  transmission • clock synchronization service (global time-base) • integrated network management • error detection with short latency • distributed redundancy management A node in a TTP real-time system  ... 
doi:10.1109/pdse.1999.779745 dblp:conf/pdse/GunzertN99 fatcat:unt2altoazfdblwyhdpa4i4m2i

Maximizing an Organization's Information Security Posture by Distributedly Assessing and Remedying System Vulnerabilities

Yonesy F. Nunez
2008 2008 IEEE International Conference on Networking, Sensing and Control  
The current state of affairs for the vulnerability and threat management functions is in dire need of a solution that can rapidly assess systems for vulnerabilities and fix them expeditiously.  ...  By utilizing a similar approach to vulnerability assessment and patch management we can ensure a higher coverage and redundancy for all systems within an organization.  ...  vectors for seamless data distribution within large-scale environments; extending a peer-to-peer networking model for pervasive systems monitoring and management.  ... 
doi:10.1109/icnsc.2008.4525389 dblp:conf/icnsc/Nunez08 fatcat:i5esp2joj5hq5cswq6ht5hxcdq

How to Manage Failures in Air Traffic Control Software Systems [chapter]

Luca Montanari, Roberto Baldoni, Fabrizio Morciano, Marco Rizzuto, Francesca Matarese
2012 Advances in Air Navigation Services  
Failure management techniques: reactive and proactive. Reactive approach: The reactive approach in fault management is based on the detection paradigm.  ...  Due to the complexity and the strong requirements, current ATC systems adopt both of them. The Reactive Fault Management is based on the detection paradigm.  ... 
doi:10.5772/48685 fatcat:dluwx62eirbtdodyl6i72nckba

Risk-Driven Proactive Fault-Tolerant Operation of IaaS Providers

Jordi Guitart, Mario Macias, Karim Djemame, Tom Kirkham, Ming Jiang, Django Armstrong
2013 2013 IEEE 5th International Conference on Cloud Computing Technology and Science  
Initial results show improved eco-efficiency, virtual machine availability and reductions in SLA failure across the whole Cloud infrastructure by applying our combined risk-based fault tolerance approach  ...  In this paper a risk model methodology and holistic management approach is developed specific to the operation of the Cloud Infrastructure Provider and is applied through improvements to SLA fault tolerance  ...  and subsequent proactive management based on real data.  ... 
doi:10.1109/cloudcom.2013.62 dblp:conf/cloudcom/GuitartMDKJA13 fatcat:g3uayqfi6nhpfeos3kb3aazcja

Designing Autonomic Management Systems by Using Reactive Control Techniques

Nicolas Berthier, Eric Rutten, Noel De Palma, Soguy Mak-Kare Gueye
2016 IEEE Transactions on Software Engineering  
They provide us with high-level languages for modeling the system to manage, as well as means for statically guaranteeing the absence of logical coordination problems.  ...  The ever growing complexity of software systems has led to the emergence of automated solutions for their management.  ...  controller was built based on a reactive model of existing manager components.  ... 
doi:10.1109/tse.2015.2510004 fatcat:yqlrkr5ytvaelhjsicoanohltq

Architecture-driven self-adaptation and self-management in robotics systems

George Edwards, Joshua Garcia, Hossein Tajalli, Daniel Popescu, Nenad Medvidovic, Gaurav Sukhatme, Brad Petrus
2009 2009 ICSE Workshop on Software Engineering for Adaptive and Self-Managing Systems  
We describe an architecture-centric design and implementation approach for building self-adapting and self-managing robotics systems.  ...  system: sensing, computation, and control, and (2) we allow meta-level components to themselves be monitored, managed and adapted by other (higher layer) meta-level components.  ...  The authors wish to express their gratitude to John Lewis for his contributions to the project.  ... 
doi:10.1109/seams.2009.5069083 dblp:conf/icse/EdwardsGTPMSP09 fatcat:67ztflnknfdexfflaibbvvulzy

Index [chapter]

2021 Microgrids and Methods of Analysis  
(HDRC system), 191–192 HDRC-based inverter-interfaced distributed energy resource units, 195 Hierarchical power management, 316–319 MG power management hierarchical structure, 316f power management  ...  probabilistic model, 101–102 Distributed flexible alternating current transmission system devices (D-FACTS devices), 154–155 Distributed generation (DG), 1, 7, 130, 153–154.  ... 
doi:10.1016/b978-0-12-816172-2.20001-1 fatcat:viuyv6ibdvcotcj4hwtlv6tfqy

Self-Adaptive Resilient Service Composition

Mario Henrique Cruz Torres, Tom Holvoet
2014 2014 International Conference on Cloud and Autonomic Computing  
Our approach is based on an agent coordination mechanism known as 'delegateMAS', which is particularly suited for large-scale coordination of systems.  ...  In this paper, we investigate a decentralized self-adaptive approach to a resilient system for service composition.  ...  We model node failures using an exponential probability distribution.  ... 
doi:10.1109/iccac.2014.33 dblp:conf/iccac/TorresH14 fatcat:4wvx66lyjnginpryexlionfpnq