A Fault Tolerant Scheduling Model for Directed Acyclic Graphs in Cloud

Pedro Henrique Di Francia Rosso, Emilio Francesquini
2020 Anais da XI Escola Regional de Alto Desempenho de São Paulo (ERAD-SP 2020)   unpublished
Many High Performance Computing (HPC) and resource-intensive applications have been tested and migrated to the Cloud. These applications may have large input data sizes, which often correlate strongly with execution performance and time. Migration to the Cloud demands adaptation of the fault tolerance (FT) and scheduling approaches. Although these topics are closely connected, they are often treated separately. This work proposes a novel integrated scheduling and FT model which takes into account the characteristics of the tasks and the target execution nodes. Preliminary results indicate good potential to improve system reliability and execution makespan of scientific workflows.
doi:10.5753/eradsp.2020.16883 fatcat:nwappj72f5csbml4es52wuqu2m