Detection and classification of peer-to-peer traffic

João V. Gomes, Pedro R. M. Inácio, Manuela Pereira, Mário M. Freire, Paulo P. Monteiro
2013 ACM Computing Surveys  
The emergence of new Internet paradigms has changed the common properties of network data, increasing the bandwidth consumption and balancing traffic in both directions. These facts raise important challenges, making it necessary to devise effective solutions for managing network traffic. Since traditional methods are rather ineffective and easily bypassed, particular attention has been paid to the development of new approaches for traffic classification. This article surveys the studies on peer-to-peer traffic detection and classification, making an extended review of the literature. Furthermore, it provides a comprehensive analysis of the concepts and strategies for network monitoring.

ACM Reference Format: Gomes, J. V., Inácio, P. R. M., Pereira, M., Freire, M. M., and Monteiro, P. P. 2013. Detection and classification of peer-to-peer traffic: A survey.

... between Internet users. The once passive user has gained a new and very active role in the Internet, acting simultaneously as client and server. These important changes in the services running over the Internet and in the behavior of the end hosts have modified the traditional properties of network traffic, which is evolving towards a more balanced bandwidth usage in both directions. Additionally, most of these applications present a greedy profile, consuming as much bandwidth as they can, which may end up interfering with priority policies. Azzouna and Guillemin [2003], for example, found that 49% of the traffic in an asymmetric digital subscriber line (ADSL) link was caused by P2P applications, while Gerber et al. [2003] observed the growth and prevalence of this kind of traffic. In 2007, ipoque conducted a worldwide study of Internet traffic [Schulze and Mochalski 2007], and the results showed that P2P file-sharing applications were producing more traffic than all the other applications together, being responsible for 49% to 83%, on average, of all Internet traffic and reaching peaks of over 95%. Another study by ipoque [Schulze and Mochalski 2009], conducted in 2008 and 2009, concluded that although the total amount of traffic generated by P2P file-sharing had increased, its share had decreased to an average value of between 42.51% and 69.95%. This decrease may be due to an increase in the traffic generated by video streaming and file-hosting Web services, like YouTube, Tudou, or RapidShare.
Yet, there have been several discussions regarding the adoption of P2P solutions by some of these services, namely YouTube and Tudou, in order to accelerate their download rates and reduce the transmission cost. In fact, the Web-based CNN live channel service now relies on the P2P paradigm through a plugin that each user has to install. Regardless of the share of global traffic held by each Internet application, P2P systems deserve particular attention from a network management perspective because of the dual role their peers play. For a certain amount of data downloaded by a peer, a portion of data is also uploaded by the same peer. Instead of being concentrated in a dedicated server, the distribution cost of the service is thus shared by the users. While this fact is advantageous for content providers, it implies that a host receiving a service will produce additional traffic in its Internet service provider (ISP) network or local area network (LAN), as it is also providing the service to a different peer. Moreover, hosts in P2P networks usually receive and provide contents from and to several peers at the same time. Hence, P2P applications are likely to produce a much larger number of connections than typical client-server applications. In addition, mechanisms for searching contents in remote peers also cause an increase in the communication between hosts. These facts make P2P traffic more challenging to manage than traffic from client-server applications, which is usually formed by a single connection or a few connections. Besides the increase in bandwidth consumption, the amount of traffic generated by P2P applications in both directions is more balanced, as opposed to traditional client-server traffic, which is weighted towards the downstream direction. This difference poses an important issue in terms of traffic management, as most networks (or Internet connections) were devised to offer lower bandwidth in upstream.
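The two properties described above (many simultaneous peers per host and a more balanced upstream/downstream volume) can be illustrated with a small sketch. It is not a method taken from the survey: the `Flow` record format, the function names, and the thresholds `min_peers` and `min_ratio` are all hypothetical and chosen only for illustration, and a heuristic this simple would misclassify many real hosts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed flow, seen from the local host (hypothetical record format)."""
    local_host: str
    remote_host: str
    bytes_up: int    # bytes sent by the local host
    bytes_down: int  # bytes received by the local host


def up_down_ratio(bytes_up, bytes_down):
    """Upstream/downstream byte ratio; 0.0 when nothing was sent at all."""
    if bytes_down:
        return bytes_up / bytes_down
    return float("inf") if bytes_up else 0.0


def host_profiles(flows):
    """Per local host: number of distinct remote peers and up/down byte ratio."""
    peers = defaultdict(set)
    up = defaultdict(int)
    down = defaultdict(int)
    for f in flows:
        peers[f.local_host].add(f.remote_host)
        up[f.local_host] += f.bytes_up
        down[f.local_host] += f.bytes_down
    return {
        host: {
            "peers": len(peers[host]),
            "up_down_ratio": up_down_ratio(up[host], down[host]),
        }
        for host in peers
    }


def looks_p2p_like(profile, min_peers=20, min_ratio=0.5):
    """Flag hosts with many peers and balanced traffic.

    The default thresholds are illustrative assumptions, not values
    reported in the survey.
    """
    return profile["peers"] >= min_peers and profile["up_down_ratio"] >= min_ratio


if __name__ == "__main__":
    # Synthetic example: one host talking to 30 peers with balanced traffic,
    # and one host downloading from a single server.
    flows = [Flow("10.0.0.1", f"198.51.100.{i}", 1000, 1200) for i in range(30)]
    flows.append(Flow("10.0.0.2", "203.0.113.5", 100, 10000))
    for host, profile in host_profiles(flows).items():
        print(host, profile, looks_p2p_like(profile))
```

In this synthetic input, the first host has many distinct peers and an upstream/downstream ratio close to one, while the second has a single remote endpoint and download-dominated traffic, mirroring the contrast drawn in the text between P2P and client-server behavior.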
Managing the network and implementing specific policies for P2P traffic does not necessarily mean that it should be blocked or heavily throttled. Indeed, there are techniques, such as content caching [Karagiannis et al. 2005b; Xu et al. 2008], that can help to manage this traffic efficiently if one is able to classify it. Although the traffic management issues are of particular concern mainly for ISPs and network administrators [Karagiannis et al. 2005b; Freire et al. 2009], there are other problems, mostly related to security risks and vulnerabilities [Zhou et al. 2005; Seedorf 2006; Johnson et al. 2008, 2009; Chopra et al. 2009], that are magnified by the distributed nature of P2P systems and by the role of their peers, and that may affect companies and home users. While reducing the overlay distances between end hosts
doi:10.1145/2480741.2480747 fatcat:xqwmadnauncplgibq7lbwxapbu