Summarizing Industrial Log Data with Latent Dirichlet Allocation

Shunmuga Prabhu Siddharthan, Marcel Dix, Barbara Sprick, Benjamin Klöpper
Industrial systems and equipment produce large log files recording their activities and possible problems. This data is often used for troubleshooting and root cause analysis, but raw log data is poorly suited to direct human analysis. Existing approaches based on data mining and machine learning focus on troubleshooting and root cause analysis. However, if a good summary of industrial log files were available, the files could be used to monitor equipment and industrial processes and to act more proactively on problems. This contribution shows how a topic modeling approach based on Latent Dirichlet Allocation (LDA) helps to understand, organize and summarize industrial log files. The approach was tested on a real-world industrial dataset and evaluated quantitatively by direct annotation.
doi:10.5445/ksp/1000098011/14 fatcat:lgmif5ozmvbsni5yvmn5g7urbi