Applications resilience on clouds

Toan Nguyen, Jean-Antoine Desideri, Laurentiu Trifan
2012 2012 International Conference on High Performance Computing & Simulation (HPCS)  
Cloud computing infrastructures support system and network fault tolerance. They transparently repair and prevent communication and software errors. They also allow duplication and migration of jobs and data to guard against hardware failures. However, only limited work has been done so far on application resilience, i.e., the ability to resume normal execution after errors and abnormal executions in distributed environments and clouds. This paper addresses open issues and solutions for application ... errors detection and management. It also gives an overview of a testbed used to design, deploy, execute, monitor, restart and resume distributed applications on cloud infrastructures in case of failures.
doi:10.1109/hpcsim.2012.6266891 dblp:conf/ieeehpcs/NguyenDT12 fatcat:3mrdz7x3cjdyvi4i5zbr6x7dia