File type: application/pdf
Better Safe than Sorry
2017
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '17
Providing fault tolerance is of major importance for data analytics frameworks such as Hadoop and Spark, which are typically deployed in large clusters that are known to experience high failure rates. Unexpected events such as compute node failures are a particularly important challenge for in-memory data analytics frameworks, as the widely adopted approach to dealing with them is to recompute work already done. Recomputing lost work, however, requires allocating extra resources to re-execute …
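The recomputation approach the abstract refers to can be made concrete with Spark's lineage mechanism. The sketch below is illustrative only and not taken from the paper; it assumes a local Spark deployment (a hypothetical app name and partition count) and shows how a cached RDD's lineage, visible via toDebugString, is what Spark replays when an executor failure loses cached partitions.

```scala
// Minimal sketch, assuming a local Spark 2.x+ setup (not from the paper):
// transformations build a lineage graph, and lost partitions are rebuilt by
// re-running that lineage rather than restoring saved state.
import org.apache.spark.sql.SparkSession

object LineageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lineage-sketch")   // hypothetical app name
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Each transformation extends the lineage graph; nothing executes yet.
    val base    = sc.parallelize(1 to 1000000, numSlices = 8)
    val derived = base.map(x => x * 2).filter(_ % 3 == 0)
    derived.cache()                 // keep partitions in executor memory

    println(derived.count())       // first action: computes and caches
    println(derived.toDebugString) // the lineage Spark would replay on failure

    // If an executor dies and its cached partitions vanish, the next action
    // recomputes the lost partitions from `base` via the lineage above --
    // the "recompute work already done" cost the abstract describes.
    println(derived.count())

    spark.stop()
  }
}
```

This also shows why recomputation needs extra resources: the replayed tasks compete with new work for the same executors.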
doi:10.1145/3078597.3078600
dblp:conf/hpdc/GhitE17
fatcat:n7xyk65pprg6xd7omtw2fvkttq