Computer clusters are now the reference architecture for high-performance computing. The large number of nodes in these systems induces a high failure rate, which makes fault tolerance mechanisms such as process checkpoint/restart a required technology for exploiting clusters effectively. Most process checkpoint/restart implementations handle only volatile state and do not take the persistent state of applications into account, which can lead to incoherent application restarts. In this paper,
doi:10.1109/ccgrid.2009.29 dblp:conf/ccgrid/RiteauLM09 fatcat:6lvjro6fizaghgsatk3n62evpe