Sketch-based change detection
Proceedings of the 2003 ACM SIGCOMM Conference on Internet Measurement (IMC '03)
Traffic anomalies such as failures and attacks are commonplace in today's networks, and identifying them rapidly and accurately is critical for large network operators. Detection typically treats the traffic as a collection of flows that must be examined for significant changes in traffic pattern (e.g., volume, number of connections). However, as link speeds and the number of flows increase, keeping per-flow state becomes either too expensive or too slow. We propose building compact summaries of the traffic data using the notion of sketches. We have designed a variant of the sketch data structure, the k-ary sketch, which uses a small, constant amount of memory and has constant per-record update and reconstruction cost. Its linearity property enables us to summarize traffic at various levels. We then implement a variety of time series forecast models (ARIMA, Holt-Winters, etc.) on top of such summaries and detect significant changes by looking for flows with large forecast errors. We also present heuristics for automatically configuring the model parameters. Using a large amount of real Internet traffic data from an operational tier-1 ISP, we demonstrate that our sketch-based change detection method is highly accurate and can be implemented at low computation and memory costs. Our preliminary results are promising and hint at the possibility of using our method as a building block for network anomaly detection and traffic measurement.
Change detection has been extensively studied in the context of time series forecasting and outlier analysis [35, 36, 12, 13]. The standard techniques include various smoothing techniques (such as exponential smoothing or sliding-window averaging), Box-Jenkins ARIMA modeling [6, 7, 2], and, more recently, wavelet-based techniques [4, 3]. Prior work has applied these techniques to network fault detection and intrusion detection. Examples in fault detection include [22, 26, 38]. Feather et al. identify faults based on statistical deviations from normal traffic behavior; a method of identifying aberrant behavior by applying thresholds in time series models of network traffic has also been described. Methods for intrusion detection include neural networks, Markov models, and clustering. Barford et al. recently provided a characterization of different types of anomalies and proposed wavelet-based methods for change detection.
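The abstract lists the k-ary sketch's properties (constant memory, constant per-record update and reconstruction cost, linearity) but not its update and estimation rules. The Python code below is a minimal sketch assuming the commonly described k-ary construction: H rows of K counters, one hash function per row, a per-row estimate that subtracts the expected collision mass (total/K) from the hit bucket and rescales by 1/(1 - 1/K), and the median across rows. The class name `KArySketch`, its methods, and the 2-universal hash family (a simplification; stronger hash families may be needed for the accuracy guarantees) are illustrative assumptions. Keys are assumed to be non-negative integers such as IPv4 addresses.

```python
import random
import statistics


class KArySketch:
    """Hypothetical minimal k-ary sketch: H rows of K counters plus a running total."""

    PRIME = (1 << 61) - 1  # Mersenne prime used by the (assumed) hash family

    def __init__(self, rows: int, buckets: int, seed: int = 0):
        self.rows, self.buckets = rows, buckets
        rng = random.Random(seed)
        # Assumption: one 2-universal hash per row, h(x) = ((a*x + b) mod p) mod K.
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(rows)]
        self.table = [[0.0] * buckets for _ in range(rows)]
        self.total = 0.0  # sum of all update values; the estimator needs it

    def _bucket(self, row: int, key: int) -> int:
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.buckets

    def update(self, key: int, value: float) -> None:
        """Constant cost per record: touch one counter in each of the H rows."""
        for r in range(self.rows):
            self.table[r][self._bucket(r, key)] += value
        self.total += value

    def estimate(self, key: int) -> float:
        """Per-row collision-corrected estimate, then the median across rows."""
        k = self.buckets
        per_row = [(self.table[r][self._bucket(r, key)] - self.total / k) / (1 - 1 / k)
                   for r in range(self.rows)]
        return statistics.median(per_row)

    def combine(self, other: "KArySketch", c_self: float, c_other: float) -> "KArySketch":
        """Linearity: c_self*self + c_other*other, computed counter by counter."""
        if (self.rows, self.buckets, self.hashes) != (other.rows, other.buckets, other.hashes):
            raise ValueError("sketches must share dimensions and hash functions")
        out = KArySketch(self.rows, self.buckets)
        out.hashes = self.hashes
        out.table = [[c_self * x + c_other * y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        out.total = c_self * self.total + c_other * other.total
        return out


# Usage: two measurement intervals summarized with the same hash functions.
prev, curr = KArySketch(5, 1024, seed=42), KArySketch(5, 1024, seed=42)
for src in range(1000):
    prev.update(src, 1.0)
    curr.update(src, 1.0)
curr.update(123456789, 5000.0)  # one key with a sudden surge
delta = curr.combine(prev, 1.0, -1.0)  # sketch of per-key changes
print(round(delta.estimate(123456789)))  # prints 5000
```

Because `update` and `combine` are linear, the difference of two interval sketches is itself a sketch of per-key changes. This is what allows forecast models to operate on summaries instead of per-flow state. In this toy run the unchanged keys cancel exactly, so the estimate is exact. With overlapping but non-identical traffic, the estimates would carry collision noise.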
Unfortunately, existing change detection techniques can typically handle only a relatively small number of time series. While this may suffice for detecting changes in highly aggregated network traffic data (e.g., SNMP link counts with a 5-minute sample interval), these techniques cannot scale up to the needs of the network infrastructure (e.g., ISP) level. At the ISP level, traffic anomalies may be buried inside the aggregated traffic, which mandates examining the traffic at a much lower level of aggregation (e.g., the IP address level) in order to expose them. Given today's traffic volumes and link speeds, the detection method must be able to handle potentially several million or more concurrent network time series. Directly applying existing techniques on a per-flow basis cannot scale to such massive data streams. Recent research efforts have been directed towards developing scalable heavy-hitter detection techniques for accounting and anomaly detection purposes. Note that heavy hitters do not necessarily correspond to flows experiencing significant changes, and thus it is not clear how these techniques can be adapted to support change detection.
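The toy Python example below makes the difference between heavy hitters and changed flows concrete. It deliberately uses exact per-flow counts, which is precisely the per-flow state that the text argues does not scale. The example assumes a simple exponentially weighted moving average (EWMA) forecast, one of the smoothing techniques mentioned above, with a hypothetical smoothing constant alpha = 0.5. The function names and the three flows are invented for illustration.

```python
def ewma_last_error(series, alpha):
    """Forecast error at the final interval under EWMA smoothing.

    forecast[1] = series[0]; forecast[t+1] = alpha*series[t] + (1-alpha)*forecast[t].
    """
    forecast = series[0]
    for observed in series[1:-1]:
        forecast = alpha * observed + (1 - alpha) * forecast
    return series[-1] - forecast


def top_keys(scores, n):
    """Keys with the n largest absolute scores."""
    return sorted(scores, key=lambda k: abs(scores[k]), reverse=True)[:n]


# Hypothetical per-interval byte counts for three flows.
flows = {
    "A": [1000, 1000, 1000, 1000],  # large and steady
    "B": [900, 905, 895, 900],      # large, small fluctuations
    "C": [10, 10, 10, 300],         # small, then a sudden jump
}
volume_now = {f: s[-1] for f, s in flows.items()}
errors = {f: ewma_last_error(s, alpha=0.5) for f, s in flows.items()}

print(top_keys(volume_now, 2))  # heavy hitters: ['A', 'B']
print(top_keys(errors, 1))      # largest change: ['C']
```

Flow C is not among the top-2 heavy hitters, but it has by far the largest forecast error. Flow A dominates the volume yet shows no change at all. This is the gap the text identifies between heavy-hitter detection and change detection.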