A Model for Predicting the Optimum Checkpoint Interval for Restart Dumps [chapter]

John Daly
2003 Lecture Notes in Computer Science  
As the run time of an application approaches the mean time to interrupt (MTTI) for the system on which it is running, it becomes necessary to generate intermediate snapshots of the application's run state, known as checkpoint files or restart dumps. In the event of a system failure that halts program execution, these snapshots allow an application to resume computing from the most recently saved intermediate state instead of starting over at the beginning of the calculation. In this paper
three models for predicting the optimum compute intervals between restart dumps are discussed. These models are evaluated by comparing their results to a simulation that emulates an application running on an actual system with interrupts. The results will be used to derive a simple method for calculating the optimum restart interval.
doi:10.1007/3-540-44864-0_1 fatcat:iua5g2e62rgzhhw4vriybkgtaq