Methods for fault tolerance in networks-on-chip

Martin Radetzki, Chaochao Feng, Xueqian Zhao, Axel Jantsch
2013 ACM Computing Surveys  
Networks-on-Chip constitute the interconnection architecture of future, massively parallel multiprocessors that assemble hundreds to thousands of processing cores on a single chip. Their integration is enabled by ongoing miniaturization of chip manufacturing technologies following Moore's Law. It comes with the downside of the circuit elements' increased susceptibility to failure. Research on fault-tolerant Networks-on-Chip tries to mitigate partial failure and its effect on network performance
more » ... and reliability by exploiting various forms of redundancy at the suitable network layers. The article at hand reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years. It is structured along three communication layers: the data link, the network, and the transport layers. The most important results are summarized and open research problems and challenges are highlighted to guide future research on this topic.
doi:10.1145/2522968.2522976 fatcat:3t4b3rhbgbc2bphjevpkzlpm6u