StageWeb: Interweaving pipeline stages into a wearout and variation tolerant CMP fabric

Shantanu Gupta, Amin Ansari, Shuguang Feng, Scott Mahlke
2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN)
Manufacture-time process variation and lifetime failure projections have become a major industry concern. Consequently, fault tolerance, historically of interest only for mission-critical systems, is now gaining attention in the mainstream computing space. Traditionally, reliability issues have been addressed at a coarse granularity, e.g., by disabling faulty cores in chip multiprocessors. However, this approach does not scale to higher failure rates. In this paper, we propose StageWeb, a fine-grained wearout and variation tolerance solution that employs a reconfigurable web of replicated processor pipeline stages to construct dependable many-core chips. The interconnection flexibility of StageWeb simultaneously tackles wearout failures (by isolating broken stages) and process variation (by selectively disabling slower stages). Our experiments show that, through its wearout tolerance, a StageWeb chip performs up to 70% more cumulative work than a comparable chip multiprocessor. Further, variation mitigation in StageWeb enables it to scale supply voltage more aggressively, resulting in up to 16% energy savings.
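The contrast between disabling whole cores and disabling individual stages can be illustrated with a toy counting model. This is a hypothetical simplification, not the paper's evaluation methodology. It assumes a fully connected interconnect in which any working stage can be paired with any other stage of the next type, which idealizes StageWeb's interconnect. Each stage is represented only by a made-up critical-path delay, or `None` if it has worn out. The names `cmp_cores`, `web_pipelines`, and `example_grid` are illustrative. A conventional core is lost if any one of its stages is broken or too slow, whereas the stage web disables only the offending stages. Tightening the delay budget loosely stands in for the timing slack consumed when supply voltage is lowered.

```python
from typing import Optional, Sequence

# Hypothetical chip model (illustrative only): grid[c][s] is the
# critical-path delay in ns of stage s in physical core c, or None if
# that stage has worn out.
Grid = Sequence[Sequence[Optional[float]]]


def cmp_cores(grid: Grid, max_delay: float) -> int:
    """Coarse-grained baseline: a core is usable only if every one of its
    stages works and meets the delay budget."""
    return sum(
        all(d is not None and d <= max_delay for d in core) for core in grid
    )


def web_pipelines(grid: Grid, max_delay: float) -> int:
    """Idealized stage web (assumes any stage can connect to any other):
    broken or too-slow stages are disabled individually, so the number of
    complete pipelines is limited by the scarcest usable stage type."""
    num_stage_types = len(grid[0])
    return min(
        sum(1 for core in grid if core[s] is not None and core[s] <= max_delay)
        for s in range(num_stage_types)
    )


# Made-up example: 4 cores x 4 stages with scattered wearout failures and
# one slow (variation-affected) stage in core 2.
example_grid = [
    [1.0, None, 1.0, 1.0],
    [1.0, 1.0, 1.0, None],
    [1.0, 1.0, 1.2, 1.0],
    [None, 1.0, 1.0, 1.0],
]

for budget in (1.0, 1.2):
    print(budget, cmp_cores(example_grid, budget), web_pipelines(example_grid, budget))
# prints:
# 1.0 0 3
# 1.2 1 3
```

With the 1.0 ns budget, every physical core in the example contains at least one broken or slow stage, so the coarse-grained baseline has no usable core, while the stage web can still assemble three pipelines from the surviving stages. A real interconnect has limited reach and adds latency, so for a given set of surviving stages this idealized count is an upper bound.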
doi:10.1109/dsn.2010.5544915 dblp:conf/dsn/GuptaAFM10