Long-range dependence in a changing Internet traffic mix

Cheolwoo Park, Félix Hernández-Campos, J.S. Marron, F. Donelson Smith
2005 Computer Networks  
This paper provides a deep analysis of long-range dependence in a continually evolving Internet traffic mix by employing a number of recently developed statistical methods. Our study considers time-of-day, day-of-week, and cross-year variations in the traffic on an Internet link. Surprisingly large and consistent differences in the packet-count time series were observed between data from 2002 and 2003. A careful examination, based on stratifying the data according to protocol, revealed that the large difference was driven by a single UDP application that was not present in 2002. Another result was that the large differences observed between the two years showed up only in the packet-count time series and not in the byte counts (whereas conventional wisdom suggests that these should be similar). We also found and analyzed several time series that exhibited more "bursty" characteristics than could be modeled as Fractional Gaussian Noise. The paper also shows how modern statistical tools can be used to study long-range dependence and non-stationarity in Internet traffic data.

[…] arrival counts as we would expect, we always observe a process that is almost as variable as the one observed at the finer scales. This property of the variance in packet or byte arrivals in Internet traffic, known as self-similarity or scale-invariance, holds for scales from a few hundred milliseconds up to hundreds of seconds. Quantitatively, for such self-similar traffic the variance of packet or byte arrival counts in fixed intervals of time decays in proportion to $m^{2H-2}$, where $m \ge 1$ is the scale of time aggregation of the counts and $H$ is known as the Hurst parameter. For a time series of counts generated by a Poisson process (which is not self-similar), $H = 0.5$, while $H \in (0.5, 1)$ for a stationary, self-similar process. Values of $H > 1$ indicate non-stationarity. The closer the value of the Hurst parameter is to 1, the more slowly the variance decays as the scale $m$ increases, and the more bursty the traffic is said to be. This slow decay of variance in arrival counts as the scale increases is in sharp contrast to the mathematical framework provided by Poisson modeling, in which the variance of the aggregated arrival process decays in proportion to $1/m$ (the case $H = 0.5$ of $m^{2H-2}$), so that its standard deviation decays as the square root of the scale (see [20], [26]). Self-similarity also manifests itself as long-range dependence (or long memory) in the time series of arrivals.
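The variance-time relation above suggests a simple way to estimate $H$, often called the aggregated-variance (or variance-time) method: average the counts over non-overlapping blocks of $m$ intervals, compute the variance of the block means at several scales, and fit a straight line in log-log space, whose slope estimates $2H - 2$. The Python sketch below is only an illustration under stated assumptions. It is not presented as the estimator used in this paper; the helper names (`poisson_counts`, `aggregate`, `variance`, `hurst_variance_time`) are hypothetical; it works on block means (the normalized aggregated series); and its input is synthetic independent Poisson counts rather than trace data. For that input the fitted slope should come out close to $-1$, i.e. $H$ close to 0.5.

```python
import math
import random


def poisson_counts(n, lam, seed=0):
    """Hypothetical test input: n independent Poisson(lam) counts (Knuth's method)."""
    rng = random.Random(seed)
    limit = math.exp(-lam)
    counts = []
    for _ in range(n):
        k, p = 0, rng.random()
        # Keep multiplying uniforms until the product drops to e^(-lam);
        # the number of extra factors needed is Poisson(lam)-distributed.
        while p > limit:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


def aggregate(series, m):
    """Average the series over non-overlapping blocks of length m (drops a partial tail)."""
    nblocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(nblocks)]


def variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def hurst_variance_time(series, scales):
    """Estimate H from Var(X^(m)) ~ m^(2H-2) with a least-squares fit of log-variance on log-m."""
    xs = [math.log(m) for m in scales]
    ys = [math.log(variance(aggregate(series, m))) for m in scales]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # slope = 2H - 2, so H = 1 + slope / 2
    return 1 + slope / 2, slope


if __name__ == "__main__":
    counts = poisson_counts(2**16, lam=10.0, seed=1)
    scales = [2**j for j in range(9)]  # m = 1, 2, 4, ..., 256
    h, slope = hurst_variance_time(counts, scales)
    # For independent (Poisson) counts the slope should be close to -1, i.e. H close to 0.5.
    print(f"log-log slope = {slope:.3f}, estimated H = {h:.3f}")
```

On real packet-count series, a fitted slope much flatter than $-1$ (an estimate of $H$ well above 0.5) is the signature of the slow variance decay described above. This simple fit is, however, sensitive to trends and other non-stationarity, so it is best treated as a first look rather than a definitive estimate.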
Long-range dependence means that there are non-negligible correlations between the arrival counts in time intervals that are far apart. More formally, the autocorrelation function $\rho(k)$ of a long-range dependent time series decays in proportion to $k^{-\beta}$ as the lag $k$ (the distance between elements in the series) tends to infinity, where $0 < \beta < 1$. The Hurst parameter is related to $\beta$ via $H = 1 - \beta/2$, so the closer the value of the Hurst parameter is to 1, the more slowly the autocorrelation function decays. In contrast, Poisson models are short-range dependent, i.e., their autocorrelation decays exponentially as the lag increases.

The implied "failure of Poisson modeling" [26] for Internet traffic spawned an active field of research in the analysis of network traffic. Some of the research closely related to this paper is reviewed in section 2.

One of the major strengths of the early studies was that they were based on a significant number of high-quality network traces (high quality in the sense that they captured hours or days of operation on production networks and recorded reasonably accurate and precise timestamps for each packet). In recent years, however, only a few studies have examined network traffic in the modern Internet using empirical data comparable in quantity and quality to the earlier studies (a few exceptions, notably from Sprint Labs, are described in section 2). There are several reasons for this decline in studies based on empirical data. One is that network links have dramatically increased in speed from the 10 Mbps Ethernets monitored for the early studies to the 1000 Mbps Ethernets and 2500 Mbps (OC-48) or faster technologies commonly
doi:10.1016/j.comnet.2004.11.018