Communication delay analysis of fault-tolerant pipelined circuit switching in torus

F. Safaei, A. Khonsari, M. Fathy, M. Ould-Khaoua
2007 Journal of Computer and System Sciences (Print)
Large-scale parallel systems, Multiprocessor Systems-on-Chip (MP-SoCs), multicomputers, and cluster computers are often composed of hundreds or thousands of components (such as routers, channels, and connectors) that collectively exhibit higher failure rates than those found in ordinary systems. One of the most important issues in the design of such systems is the development of efficient fault-tolerant mechanisms that provide high throughput and low latency in communications to ensure that these systems will keep running in a degraded mode until the faulty components are repaired. Pipelined Circuit Switching (PCS) has been suggested as an efficient switching method for supporting inter-processor communication in such networks because it can meet both communication performance and fault-tolerance demands. This paper presents a new mathematical model to investigate the effects of failures and capture the mean message latency in torus networks using PCS in the presence of faulty components. Simulation experiments confirm that the analytical model exhibits a good degree of accuracy under different working conditions.
doi:10.1016/j.jcss.2007.02.003 fatcat:6twe6eydyzfzvk6erfq2itvqsq