Mining Historical Issue Repositories to Heal Large-Scale Online Service Systems

Rui Ding, Qiang Fu, Jian Guang Lou, Qingwei Lin, Dongmei Zhang, Tao Xie
2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Online service systems have become increasingly popular and important. Reducing the MTTR (Mean Time to Restore) of a service remains one of the most important steps in assuring the user-perceived availability of the service. To reduce the MTTR, a common practice is to restore the service by identifying and applying an appropriate healing action. In this paper, we present an automated mining-based approach for suggesting an appropriate healing action for a given new issue. Our approach … suggests an appropriate healing action by adapting healing actions from the retrieved similar historical issues. We have applied our approach to a real-world, large-scale online service product. Our studies on 243 real issues of the service show that our approach can effectively suggest appropriate healing actions (with 87% accuracy) to reduce the MTTR of the service. In addition, we further study and categorize, according to their characteristics, the issues for which automatic healing suggestion faces difficulties.
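To make the retrieve-then-suggest idea concrete, here is a minimal sketch. It is not the authors' method. It assumes that each issue is already summarized as a set of symptom terms (the hypothetical `signature` field). It uses Jaccard similarity to retrieve the `k` most similar historical issues and replaces the paper's adaptation step with a simple similarity-weighted vote over their healing actions. All names here (`HistoricalIssue`, `suggest_healing_action`, `min_similarity`, the example terms and actions) are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class HistoricalIssue:
    issue_id: str
    signature: FrozenSet[str]  # hypothetical: symptom terms summarizing the issue
    healing_action: str        # action that restored the service last time


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_healing_action(
    new_signature: FrozenSet[str],
    repository: Sequence[HistoricalIssue],
    k: int = 3,
    min_similarity: float = 0.1,
) -> Optional[str]:
    """Retrieve the k most similar past issues and return the healing action
    with the highest total similarity, or None if nothing is similar enough."""
    scored = [(jaccard(new_signature, issue.signature), issue) for issue in repository]
    # Drop weak matches so unrelated history cannot drive the suggestion.
    scored = [(score, issue) for score, issue in scored if score >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    if not top:
        return None  # no usable history: escalate to a human instead
    votes = defaultdict(float)
    for score, issue in top:
        votes[issue.healing_action] += score  # similarity-weighted vote
    return max(votes.items(), key=lambda kv: kv[1])[0]


# Hypothetical repository of resolved issues.
repo = [
    HistoricalIssue("I1", frozenset({"db_timeout", "sql_conn_fail"}), "restart database connection pool"),
    HistoricalIssue("I2", frozenset({"db_timeout", "high_latency"}), "restart database connection pool"),
    HistoricalIssue("I3", frozenset({"disk_full", "write_error"}), "clean up log volume"),
]

if __name__ == "__main__":
    new_issue = frozenset({"db_timeout", "sql_conn_fail", "high_latency"})
    print(suggest_healing_action(new_issue, repo))  # restart database connection pool
```

Returning `None` when no historical issue passes the similarity threshold reflects the abstract's observation that some issues are hard to heal automatically. Such cases are better routed to an engineer than given a low-confidence suggestion.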
doi:10.1109/dsn.2014.39 dblp:conf/dsn/DingFLLZX14