Extending SRT for parallel applications in tiled-CMP architectures

D. Sanchez, J.L. Aragon, J.M. Garcia
2009 IEEE International Symposium on Parallel & Distributed Processing
Reliability has become a first-class design consideration for architects, along with performance and energy efficiency. Continued technology scaling and the accompanying supply voltage reductions are increasing the susceptibility of architectures to soft errors. However, mechanisms that achieve full error coverage usually degrade performance to a degree that is unacceptable for the majority of common users. Simultaneous and Redundantly Threaded (SRT) [13] is a fault-tolerant architecture in which pairs of threads in an SMT core redundantly execute the same program instructions. In this paper, we study the under-explored architectural support for SRT to reliably execute shared-memory applications. We show how atomic operations induce a serialization point between master and slave threads. This bottleneck has an impact of 34% in execution speed for several parallel scientific benchmarks. We propose an alternative mechanism in which the L1 cache is updated by the master's stores before verification, reducing the overhead up to 21%. Our approach also outperforms other recent proposals such as DCC, with a decrease of 8% in execution speed.
doi:10.1109/ipdps.2009.5160902 dblp:conf/ipps/SanchezAG09 fatcat:p5rknuu5ardtjb5h4lcsdmk2ne