Parallel trace analysis: project deliverable D4.3

Zafeirios Papazachos, Sakil Bharbuiya, Olumuyiwa Ibidunmoye, Amardeep Mehta, Ali Rezaei, Athanasios Tsitsipas, Gabriel González Castañé, Ahmed Ali-Eldin, Dimitrios S. Nikolopoulos, Universität Ulm
2017
CactoScale provides monitoring and data analysis functionality for CACTOS. This deliverable presents the framework and algorithms that CactoScale uses for parallel trace analysis. We describe several CactoScale framework extensions that enable parallel correlation analysis of system utilisation metric traces and cloud data logs. We also present the integration of the Lambda Architecture into CactoScale, which parallelises several aspects of monitoring and exchanging
... on in CACTOS. CactoScale trace analysis tackles parallelism along several dimensions. We describe a hierarchical log analysis and anomaly detection framework. The anomaly detection uses parallel data analysis frameworks such as Spark and MapReduce to analyse workload traces and system logs in parallel, with Spark providing in-memory processing of data stored in HDFS. The trace analysis also involves pre-processing raw data logs for storage in HDFS. The framework allows anomaly detection algorithms to be executed hierarchically, using both the compute nodes in situ and the parallel HDFS monitoring facility. This is made feasible by pairing the CactoScale agents with in situ analytics modules, which cover cases such as workload spike detection and also filter the data that flows to the database for post-processing. An in situ analytics module is a process designed to run locally on a node. This approach provides the advantage of data locality: the data are pre-processed by the local node before being collected by a remote distributed service for further processing. In this way, the hierarchical design of the data analysis adds a level of real-time processing much closer to the data source. CactoScale's features and capabilities for parallel trace analysis are demonstrated in this deliverable using different anomaly detection algorithms. Anomaly detection involves trace analysis algorithms that detect outliers (numerical, textual, or correlation-based) in data traces. Dete [...]
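The in situ idea described in the abstract (detect workload spikes locally and filter what is forwarded for post-processing) can be illustrated with a minimal Python sketch. This is not the CactoScale implementation: the class name `InSituSpikeFilter`, the helper `filter_trace`, the rolling z-score test, and the parameters `window`, `threshold`, and `keep_every` are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class InSituSpikeFilter:
    """Hypothetical node-local module: flags spikes and thins data sent upstream."""

    def __init__(self, window=5, threshold=3.0, keep_every=4):
        self.history = deque(maxlen=window)  # most recent samples seen on this node
        self.threshold = threshold           # z-score above which a sample is a spike
        self.keep_every = keep_every         # forward every k-th sample regardless
        self.seen = 0                        # number of samples processed so far

    def process(self, value):
        """Return (forward, is_spike) for one utilisation sample."""
        is_spike = False
        # Only test once the window is full, so the baseline is meaningful.
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                is_spike = abs(value - mu) / sigma > self.threshold
            else:
                # Flat baseline: any deviation counts as a spike.
                is_spike = value != mu
        self.history.append(value)
        self.seen += 1
        # Forward all spikes, plus a down-sampled stream of ordinary samples.
        forward = is_spike or self.seen % self.keep_every == 0
        return forward, is_spike


def filter_trace(samples, **kwargs):
    """Run the filter over a trace; return the (index, value, is_spike) tuples kept."""
    spike_filter = InSituSpikeFilter(**kwargs)
    kept = []
    for index, value in enumerate(samples):
        forward, is_spike = spike_filter.process(value)
        if forward:
            kept.append((index, value, is_spike))
    return kept
```

For example, `filter_trace([10, 11, 10, 12, 11, 10, 95, 11])` keeps three of the eight samples, including the one at index 6 (value 95), which is flagged as a spike. Only that reduced stream would travel to the remote service, which is the data-locality benefit the abstract describes.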
doi:10.18725/oparu-4309 fatcat:27fxep3ahred7hon7ifas2y3ee