Independent global snapshots in large distributed systems

M.V. Sreenivas, S. Bhalla
Proceedings Fourth International Conference on High-Performance Computing  
Distributed systems depend on consistent global snapshots for process recovery and garbage collection. We provide exact conditions for an arbitrary checkpoint, based on independent dependency tracking within clusters of nodes. The method allows nodes within a cluster to compute dependency information independently, from locally available information. Existing models of global snapshot computation provide the necessary and sufficient conditions, but they require ... e global computations. In the proposed approach, a single node can perform the computations needed to identify existing global checkpoints. Nodes can also compute the conditions under which a checkpoint, or a collection of checkpoints, can belong to a global snapshot.
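The abstract does not spell out the tracking scheme, so the following is only a minimal Python sketch of the standard transitive dependency-vector technique, not the authors' cluster-based method. Each process piggybacks a vector of checkpoint-interval indices on every message and saves that vector with each checkpoint. A set of one checkpoint per process is then tested pairwise: C_{i,x} and C_{j,y} are mutually consistent iff neither depends on an interval that the other process entered at or after its chosen checkpoint. The names `Process`, `pair_consistent`, `is_consistent_global` and `NO_DEP` are hypothetical. The sketch only tests a complete cut. Deciding whether a partial set of checkpoints can be extended to a consistent snapshot needs the stronger zigzag-path condition, which is not checked here.

```python
from itertools import combinations

NO_DEP = -1  # hypothetical marker: no dependency on any interval of that process


class Process:
    """A process that tracks transitive dependencies on checkpoint intervals.

    After taking checkpoint C_{pid,x}, the process is in interval x.
    dv[k] is the highest interval of process k that the current state depends on.
    """

    def __init__(self, pid, n):
        self.pid = pid
        self.dv = [NO_DEP] * n
        self.checkpoints = []  # checkpoints[x] = dependency vector saved with C_{pid,x}
        self.checkpoint()      # initial checkpoint C_{pid,0}

    def checkpoint(self):
        """Save the current dependency vector as C_{pid,x}, then enter interval x."""
        x = len(self.checkpoints)
        self.checkpoints.append(list(self.dv))
        self.dv[self.pid] = x
        return x

    def send(self):
        """Return a copy of the dependency vector to piggyback on a message."""
        return list(self.dv)

    def receive(self, piggybacked):
        """Merge the sender's dependencies with a component-wise maximum."""
        self.dv = [max(a, b) for a, b in zip(self.dv, piggybacked)]


def pair_consistent(procs, i, x, j, y):
    """C_{i,x} and C_{j,y} (i != j) are mutually consistent iff neither depends
    on an interval the other process entered at or after its chosen checkpoint."""
    return procs[i].checkpoints[x][j] < y and procs[j].checkpoints[y][i] < x


def is_consistent_global(procs, cut):
    """cut[i] is the index of the checkpoint chosen for process i."""
    return all(
        pair_consistent(procs, i, cut[i], j, cut[j])
        for i, j in combinations(range(len(procs)), 2)
    )


# Usage: P0 sends m after C_{0,1}; P1 receives m and then takes C_{1,1}.
p0, p1 = Process(0, 2), Process(1, 2)
p0.checkpoint()                 # C_{0,1}; P0 enters interval 1
m = p0.send()                   # m carries [1, -1]
p1.receive(m)                   # P1 now depends on interval 1 of P0
p1.checkpoint()                 # C_{1,1} saves [1, 0]
procs = [p0, p1]

print(is_consistent_global(procs, [1, 1]))  # → False (m would be an orphan)
print(is_consistent_global(procs, [1, 0]))  # → True (m is merely in transit)
```

Each check reads only vectors already saved with the checkpoints. Any node holding those vectors can therefore evaluate it locally, without a coordinated global computation. That locality is the property the abstract emphasizes, although the paper's own per-cluster scheme may organize the information differently.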
doi:10.1109/hipc.1997.634530 dblp:conf/hipc/SreenivasB97 fatcat:jrjteouxirexli5hmcxficnqru