Reducing Soft Errors through Operand Width Aware Policies
IEEE Transactions on Dependable and Secure Computing
Soft errors are an important challenge in contemporary microprocessors. Particle hits on the components of a processor are expected to create an increasing number of transient errors with each new microprocessor generation. In this paper, we propose simple mechanisms that effectively reduce the vulnerability to soft errors in a processor. Our designs are generally motivated by the fact that many of the produced and consumed values in the processors are narrow and their upper order bits are
... ngless. Soft errors caused by any particle strike to these higher order bits can be avoided by simply identifying these narrow values. Alternatively, soft errors can be detected or corrected on the narrow values by replicating the vulnerable portion of the value inside the storage space provided for the upper order bits of these operands. As a faster but less fault tolerant alternative to ECC and parity, we offer a variety of schemes that make use of narrow values and analyze their efficiency in reducing soft error vulnerability of different data-holding components of a processor. On average, techniques that make use of the narrowness of the values can provide 49 percent error detection, 45 percent error correction, or 27 percent error avoidance coverage for single bit upsets in the first level data cache across all Spec2K. In other structures such as the immediate field of the issue queue, an average error detection rate of 64 percent is achieved.