Evaluating and exploiting impacts of dynamic power management schemes on system reliability

Liangzhen Lai, Vikas Chandra, Puneet Gupta
2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)  
Hardware reliability has been a major concern for nano-scale computing systems. Different hardware design choices, application workloads and software management schemes can jointly affect the system's resilience. In this paper, we first develop a hardware evaluation platform based on an embedded/mobile development board and the standard Linux kernel. We demonstrate the use of our platform to evaluate the system's power and radiation-induced soft error rate in the presence of system power management schemes, with different application workloads and various hardware design configurations. We also propose system/cloud-based virtual sensing to capture varying ambient conditions for reliability evaluation. New reliability management policies are proposed and implemented in the Linux kernel to exploit the flexibility in different existing power management schemes. We demonstrate that our policies can achieve the system reliability target under varying application workloads and ambient conditions. Experiments show that our policies are efficient, with less than 3% additional power overhead compared to the optimal schemes characterized offline.
doi:10.1109/cases.2015.7324544 dblp:conf/cases/LaiCG15 fatcat:cgheyc5h4ndsbatjru6u54eqr4