A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Optimal real number codes for fault tolerant matrix operations
2009
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
It has been demonstrated recently that a single fail-stop process failure in ScaLAPACK matrix multiplication can be tolerated without checkpointing. Multiple simultaneous processor failures can be tolerated without checkpointing by encoding matrices using a real-number erasure-correcting code. However, the floating-point representation of a real number in today's high-performance computer architectures introduces round-off errors, which can be enlarged and cause the loss of precision of possibly
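The encoding idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible real-number code, a single checksum row with all weights equal to 1, which can only recover one lost row; it is not the optimal code the paper's title refers to. All function names are hypothetical, and the inputs are small integer-valued floats so that round-off does not appear.

```python
# Sketch only: one checksum row (weights all 1) tolerating one lost row.
# Function names are hypothetical and not taken from the paper or ScaLAPACK.

def matmul(a, b):
    """Plain product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def add_checksum_row(a):
    """Append one row holding the column sums of a."""
    return a + [[sum(col) for col in zip(*a)]]

def recover_row(c_encoded, lost):
    """Rebuild data row `lost` from the checksum row and the surviving rows."""
    checksum = c_encoded[-1]
    survivors = [row for i, row in enumerate(c_encoded[:-1]) if i != lost]
    return [checksum[j] - sum(row[j] for row in survivors)
            for j in range(len(checksum))]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]

# Multiplying the encoded A by B yields C = A*B with its own checksum row,
# so the redundancy survives the computation without checkpointing.
C_encoded = matmul(add_checksum_row(A), B)
expected = matmul(A, B)

lost = 1  # pretend the process holding row 1 of C failed
recovered = recover_row(C_encoded, lost)
```

With general floating-point data the subtraction in `recover_row` is subject to round-off, which is the precision concern the abstract raises.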
doi:10.1145/1654059.1654089
dblp:conf/sc/Chen09
fatcat:ud4ruwqgkvbkxnvkncahmcphem