Spatio-temporal compressive sensing and internet traffic matrices

Yin Zhang, Matthew Roughan, Walter Willinger, Lili Qiu
2009 Computer communication review  
Many basic network engineering tasks (e.g., traffic engineering, capacity planning, anomaly detection) rely heavily on the availability and accuracy of traffic matrices. However, in practice it is challenging to reliably measure traffic matrices. Missing values are common. This observation brings us into the realm of compressive sensing, a generic technique for dealing with missing values that exploits the presence of structure and redundancy in many real-world systems. Despite much recent progress made in compressive sensing, existing compressive-sensing solutions often perform poorly for traffic matrix interpolation, because real traffic matrices rarely satisfy the technical conditions required for these solutions. To address this problem, we develop a novel spatio-temporal compressive sensing framework with two key components: (i) a new technique called SPARSITY REGULARIZED MATRIX FACTORIZATION (SRMF) that leverages the sparse or low-rank nature of real-world traffic matrices and their spatio-temporal properties, and (ii) a mechanism for combining low-rank approximations with local interpolation procedures. We illustrate our new framework and demonstrate its superior performance in problems involving interpolation with real traffic matrices where we can successfully replace up to 98% of the values. Evaluation in applications such as network tomography, traffic prediction, and anomaly detection confirms the flexibility and effectiveness of our approach.
[...] traffic engineering [11, 24], capacity planning, and anomaly detection. Due to their importance, there is now a substantial body of work on TMs; for instance, see [2] and the references therein. The thrust of much of this research has been on measurement [10, 28] and inference [9, 17, 25, 27, 31-34] of TMs, and more recently on topics such as anomaly detection [13, 14, 21, 29, 30]. A key challenge that lies at the heart of many of these problems is how to cope with missing values that frequently arise in real-world TMs. In this paper, we propose novel interpolation techniques to accurately reconstruct missing values in TMs based on partial and/or indirect measurements. In the process, we provide a unified approach to several common tasks involving measurement and analysis of traffic matrices, e.g., TM estimation, prediction, and anomaly detection.
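As a rough illustration of how low-rank structure lets missing TM entries be interpolated, the sketch below fills gaps in a small matrix by alternating least squares on a rank-1 model. It is a generic textbook-style procedure, not the SRMF algorithm of the paper; the function name `rank_one_complete`, the `ridge` parameter, and the toy data (rows as flows, columns as time intervals) are all assumptions made for illustration.

```python
def rank_one_complete(observed, iterations=100, ridge=1e-9):
    """Fill None entries of a matrix using a rank-1 fit u[i] * v[j].

    Alternating least squares over the observed entries only; `ridge`
    is a tiny regularizer that keeps the denominators away from zero.
    """
    n_rows, n_cols = len(observed), len(observed[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            cols = [j for j in range(n_cols) if observed[i][j] is not None]
            num = sum(observed[i][j] * v[j] for j in cols)
            den = sum(v[j] ** 2 for j in cols) + ridge
            u[i] = num / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            rows = [i for i in range(n_rows) if observed[i][j] is not None]
            num = sum(observed[i][j] * u[i] for i in rows)
            den = sum(u[i] ** 2 for i in rows) + ridge
            v[j] = num / den
    # Keep measured values; use the model only where data is missing.
    return [
        [observed[i][j] if observed[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy data: an exactly rank-1 matrix (outer product of [1, 2, 3] and
# [2, 1, 4, 3]) with three entries hidden. Hypothetical values.
measured = [
    [2.0, 1.0, None, 3.0],
    [None, 2.0, 8.0, 6.0],
    [6.0, 3.0, 12.0, None],
]
filled = rank_one_complete(measured)
print(filled)
```

Real traffic matrices are only approximately low-rank, so a pure low-rank fit of this kind is a starting point at best; the abstract's point is that additional spatio-temporal structure and local interpolation are needed on top of it.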
Our approach uses the first truly spatio-temporal model of TMs, borrows ideas from the active area of compressive sensing, and exploits domain knowledge regarding TMs that has accumulated over the years.
Motivation: In practice it is challenging to reliably measure TMs for large networks. First, in many networks the TM is not directly observable, and can only be estimated through link load measurements. Such measurements, while linearly related to the TM itself, are not sufficient to unambiguously identify the true TM. Typically, the problem was posed as an underconstrained linear-inverse problem, where the solution relied on a prior model of the TM (e.g., the Poisson model of Vardi [27], the gravity model [31, 33], or the independent flow model [9]). Second, although many networks now collect (sampled) flow-level measurements for at least part of their network, there are still serious impediments to reliable large-scale collection of TMs: data collection systems can fail, flow collectors often use an unreliable transport protocol, and legacy network components may not support flow collection or may be resource-challenged. Third, scalability requirements may mean that flow-level collection doesn't occur at the edge of a network (where we would wish it for true TM recovery [10]), but often only on some subset of the routers. Recovery of the actual ingress-egress TM from such data is non-trivial. Finally, when we find an anomaly in a set of TMs, we often need to know the non-anomaly-related traffic, either for other network tasks or just so that we can infer the cause of the anomaly. The result is that any large set of TM measurements has some, and quite often a significant number of, missing values. Since many network engineering tasks that require TMs are either intolerant or highly sensitive to missing data, it is important to accurately reconstruct missing values based on partial and/or indirect TM measurements.
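To make the underconstrained linear-inverse problem concrete, here is a minimal sketch showing two different TMs that produce identical link loads. The three-node chain topology, the routing matrix `ROUTING`, and the flow values are hypothetical and chosen only for illustration.

```python
# Hypothetical 3-node chain A -> B -> C with two directed links.
# Rows: links (A->B, B->C); columns: OD flows (A->B, A->C, B->C).
ROUTING = [
    [1, 1, 0],  # link A->B carries flows A->B and A->C
    [0, 1, 1],  # link B->C carries flows A->C and B->C
]


def link_loads(routing, flows):
    """Return the per-link loads y = A x for a vector of OD flows x."""
    return [sum(a * x for a, x in zip(row, flows)) for row in routing]


tm_one = [5, 3, 4]
# Move one unit of traffic from the two-hop flow onto each one-hop flow.
tm_two = [6, 2, 5]

loads_one = link_loads(ROUTING, tm_one)
loads_two = link_loads(ROUTING, tm_two)
print(loads_one, loads_two)
```

With two link measurements and three unknown flows, the link loads cannot distinguish `tm_one` from `tm_two`, which is why a prior model of the TM is needed to select one solution.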
Interpolation is the mathematical term for filling in these missing values. Compressive sensing is a generic methodology for dealing with missing values that leverages the presence of certain types of structure and redundancy in data from many real-world systems. Compressive sensing has recently attracted considerable attention in statistics, approximation theory, information theory, and signal processing. Several effective heuristics have been proposed to exploit the sparse or low-rank nature of
doi:10.1145/1594977.1592600 fatcat:2shnwwwcqrhe7lkw3g4jkkx4wq