Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing
2009
IEEE Transactions on Computers
As the number of processors in today's high-performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the execution time of many current high-performance computing applications. Although today's architectures are usually robust enough to survive node failures without suffering complete system failure, most of today's high-performance computing applications cannot survive node failures. Therefore, whenever a node fails, all …
doi:10.1109/tc.2009.42
fatcat:5et7fpfxvrah3jyngwe4zhoj2m