Reliability-aware scalability models for high performance computing

Ziming Zheng, Zhiling Lan
2009 IEEE International Conference on Cluster Computing and Workshops
Scalability models are powerful analytical tools for evaluating and predicting the performance of parallel applications. Unfortunately, existing scalability models do not quantify failure impact and therefore cannot accurately account for application performance in the presence of failures. In this study, we extend two well-known models, namely Amdahl's law and Gustafson's law, by considering the impact of failures and the effect of fault tolerance techniques on applications. The derived reliability-aware models can be used to predict application scalability in failure-present environments and evaluate fault tolerance techniques. Trace-based simulations via real failure logs demonstrate that the newly developed models provide a better understanding of application performance and scalability in the presence of failures.
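For orientation, the two classical laws that the abstract names as its starting point can be sketched as follows. This is only the failure-free textbook form; the reliability-aware extensions are not given in this record and are not reproduced here. The function and parameter names (`amdahl_speedup`, `gustafson_speedup`, `serial_fraction`, `n`) are illustrative assumptions, not taken from the paper.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Classical Amdahl's law (fixed problem size), failure-free.

    speedup = 1 / (s + (1 - s) / n), where s is the serial fraction
    of the work and n is the number of processors.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(serial_fraction: float, n: int) -> float:
    """Classical Gustafson's law (problem size scaled with n), failure-free.

    speedup = s + (1 - s) * n, where s is the serial fraction of the
    run time observed on the n-processor system.
    """
    return serial_fraction + (1.0 - serial_fraction) * n


if __name__ == "__main__":
    # Hypothetical example: 5% serial work on 100 processors.
    for law in (amdahl_speedup, gustafson_speedup):
        print(law.__name__, law(0.05, 100))
```

Neither formula has a term for failures, checkpointing, or recovery time, which is the gap the abstract says the derived models are meant to close.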
doi:10.1109/clustr.2009.5289177 dblp:conf/cluster/ZhengL09 fatcat:7cyv7iybknappksyjb5vu3cyuy