Tracking System Behaviour from Resource Usage Data [article]

Niyazi Sorkunlu, Varun Chandola, Abani Patra
2017-05-30, arXiv (pre-print)
Resource usage data, collected using tools such as TACC Stats, capture the resource utilization of nodes within a high performance computing system. We present methods to analyze the resource usage data to understand the system performance and identify performance anomalies. The core idea is to model the data as a three-way tensor corresponding to the compute nodes, usage metrics, and time. Using the reconstruction error between the original tensor and the tensor reconstructed from a low rank tensor decomposition as a scalar performance metric enables us to monitor the performance of the system in an online fashion. This error statistic is then used for anomaly detection, which relies on the assumption that the normal/routine behavior of the system can be captured using a low rank approximation of the original tensor. We evaluate the performance of the algorithm using information gathered from system logs and show that the performance anomalies identified by the proposed method correlate with critical errors reported in the system logs. Results are shown for data collected for 2013 from the Lonestar4 system at the Texas Advanced Computing Center (TACC).
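The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which low rank decomposition is used, so this example assumes a truncated higher-order SVD (a Tucker-style projection) as a stand-in. The function names, the ranks, and the synthetic node × metric × time data are all hypothetical and chosen only for illustration; they are not taken from the paper or from TACC Stats.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize the tensor: bring `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along the given mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def low_rank_reconstruct(tensor, ranks):
    """Truncated HOSVD (assumed stand-in for the paper's decomposition):
    project each mode onto its leading left singular vectors."""
    approx = tensor
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        basis = u[:, :rank]
        approx = mode_product(approx, basis @ basis.T, mode)
    return approx


def reconstruction_error(tensor, ranks):
    """Relative Frobenius-norm error between the tensor and its low rank version."""
    approx = low_rank_reconstruct(tensor, ranks)
    return np.linalg.norm(tensor - approx) / np.linalg.norm(tensor)


# Synthetic window: 20 nodes x 6 usage metrics x 12 time steps (made-up sizes).
rng = np.random.default_rng(0)
nodes = rng.uniform(1.0, 2.0, size=20)
metrics = rng.uniform(1.0, 2.0, size=6)
times = rng.uniform(1.0, 2.0, size=12)

# "Normal" behaviour: a rank-one pattern plus small noise.
normal = np.einsum("i,j,k->ijk", nodes, metrics, times)
normal += 0.01 * rng.normal(size=normal.shape)

# "Anomalous" behaviour: a few nodes deviate strongly for a few time steps.
anomalous = normal.copy()
anomalous[3:6, :, 5:9] += rng.normal(scale=2.0, size=(3, 6, 4))

ranks = (2, 2, 2)  # assumed multilinear rank for the approximation
err_normal = reconstruction_error(normal, ranks)
err_anomalous = reconstruction_error(anomalous, ranks)
print(f"normal window error:    {err_normal:.4f}")
print(f"anomalous window error: {err_anomalous:.4f}")
```

In an online setting, one such error value could be computed per time window and compared against a threshold learned from windows known to be routine. How the threshold is chosen is left open here, since the abstract does not specify it.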
arXiv:1705.10756v1 (https://arxiv.org/abs/1705.10756v1); fatcat:q724ertflbdkfbhvy23jmsuvdi (https://fatcat.wiki/release/q724ertflbdkfbhvy23jmsuvdi)