Sketch-Guided Sampling - Using On-Line Estimates of Flow Size for Adaptive Data Collection

A. Kumar, J. Xu
2006 Proceedings IEEE INFOCOM 2006, 25th IEEE International Conference on Computer Communications
Monitoring the traffic in high-speed networks is a data-intensive problem. Uniform packet sampling is the most popular technique for reducing the amount of data that network monitoring hardware/software has to process. However, uniform sampling captures far less information than can potentially be obtained with the same overall sampling rate. This is because uniform sampling (unnecessarily) draws the vast majority of samples from large flows, and very few from small and medium flows. This information loss on small and medium flows significantly affects the accuracy of estimating various network statistics. In this work, we develop a new packet sampling methodology called "sketch-guided sampling" (SGS), which offers better statistics than obtainable from uniform sampling, given the same number of raw samples gathered. Its main idea is to make the probability with which an incoming packet is sampled a decreasing function f of the size of the flow the packet belongs to. This way our scheme is able to significantly increase the packet sampling rate of the small and medium flows at slight expense of the large flows, resulting in much more accurate estimations of various network statistics. However, the exact sizes of all flows are available only if we keep per-flow information for every flow, which is prohibitively expensive on high-speed links. Our SGS scheme solves this problem by using a small (lossy) synopsis data structure called a counting sketch to encode the approximate sizes of all flows. Our evaluation on real-world Internet traffic traces shows that our sampling theory, based on the approximate flow size estimates from the counting sketch, works almost as well as if we knew the exact sizes of the flows.

[Fragment from the paper's first page:] … packet), for ease of implementation. An advantage of uniform or periodic packet sampling is that it guarantees to reduce the traffic by a fixed factor at both long and short time scales, which is very important when the router CPU that processes sampled traffic (e.g., the hash table of flow records in NetFlow) operates under hard resource constraints [2].

[1] We assume a conservative packet size of 1000 bits here.
[2] Flow sampling, to be discussed in Sec. II, is a more attractive alternative to packet sampling for certain applications, but it does not guarantee the traffic reduction ratio in the same way that packet sampling assures it.
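The idea in the abstract can be illustrated with a minimal sketch in Python. This is not the authors' implementation: it assumes a count-min-style counting sketch for the approximate flow sizes and a hypothetical decreasing sampling function f(s) = min(1, c/s); the paper's actual sketch structure, sampling function, and parameters differ.

```python
import hashlib
import random


class CountMinSketch:
    """Small lossy synopsis holding approximate per-flow packet counts
    (stand-in for the paper's counting sketch)."""

    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]

    def _indices(self, flow_id):
        # One hashed column per row; blake2b keyed by row number.
        for row in range(self.depth):
            h = hashlib.blake2b(f"{row}:{flow_id}".encode(), digest_size=8)
            yield row, int.from_bytes(h.digest(), "big") % self.width

    def update(self, flow_id):
        for row, col in self._indices(flow_id):
            self.tables[row][col] += 1

    def estimate(self, flow_id):
        # Minimum over rows: never underestimates, may overestimate on collisions.
        return min(self.tables[row][col] for row, col in self._indices(flow_id))


def sampling_probability(est_size, c=8.0):
    # Hypothetical decreasing function f of estimated flow size:
    # small flows are sampled with probability ~1, large flows with ~c/size.
    return min(1.0, c / max(est_size, 1))


def sketch_guided_sample(packets, sketch, rng):
    """Sample a packet stream, where each packet is just its flow ID."""
    samples = []
    for flow_id in packets:
        sketch.update(flow_id)  # keep the approximate flow size current
        if rng.random() < sampling_probability(sketch.estimate(flow_id)):
            samples.append(flow_id)
    return samples
```

On a stream with one large flow and many one-packet flows, this keeps essentially every packet of the small flows while sampling the large flow at a sharply decreasing rate, which is the skew-correcting effect the abstract describes.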
doi:10.1109/infocom.2006.326 dblp:conf/infocom/KumarX06 fatcat:6dz6syn5rzdbzioqdvyf5zjnp4