Measuring the Impact of Memory Errors on Application Performance
2017
IEEE Computer Architecture Letters
Memory reliability is a key factor in the design of warehouse-scale computers. Prior work has focused on the performance overheads of memory fault-tolerance schemes when errors do not occur at all, and when detected but uncorrectable errors occur, which result in machine downtime and loss of availability. We focus on a common third scenario, namely, situations when hard but correctable faults exist in memory; these may cause an "avalanche" of errors to occur on affected hardware. We expose how
doi:10.1109/lca.2016.2599513
fatcat:lmnmtq2zdjdm5fak2zaieyzjsi