ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures

Kamal Kc, Xiaohui Gu
2011 IEEE 30th International Symposium on Reliable Distributed Systems
We present an Efficient Log-based Troubleshooting (ELT) system for cloud computing infrastructures. ELT adopts a novel hybrid log mining approach that combines coarse-grained and fine-grained log features to achieve both high accuracy and low overhead. Moreover, ELT can automatically extract key log messages and perform invariant checking to greatly simplify the troubleshooting task for the system administrator. We have implemented a prototype of the ELT system and conducted an extensive experimental study using real management console logs of a production cloud system and a Hadoop cluster. Our experimental results show that ELT can achieve more efficient and powerful troubleshooting support than existing schemes. More importantly, ELT can find software bugs that cannot be detected by current cloud system management practice.
doi:10.1109/srds.2011.11 dblp:conf/srds/KcG11 fatcat:hdkmo5vnvjgntm4y6cmbpf76lq