Toward the Accurate Identification of Network Applications [chapter]

Andrew W. Moore, Konstantina Papagiannaki
2005 Lecture Notes in Computer Science  
Well-known port numbers can no longer be used to reliably identify network applications. There is a variety of new Internet applications that either do not use well-known port numbers or use other protocols, such as HTTP, as wrappers in order to go through firewalls without being blocked. One consequence of this is that a simple inspection of the port numbers used by flows may lead to the inaccurate classification of network traffic. In this work, we look at these inaccuracies in detail. Using a full payload packet trace collected from an Internet site, we attempt to identify the types of errors that may result from port-based classification and quantify them for the specific trace under study. To address this question we devise a classification methodology that relies on the full packet payload. We describe the building blocks of this methodology and elaborate on the complications that arise in that context. A classification technique approaching 100% accuracy proves to be a labor-intensive process that needs to test flow characteristics against multiple classification criteria in order to gain sufficient confidence in the nature of the causal application. Nevertheless, the benefits gained from a content-based classification approach are evident. We are capable of accurately classifying what would otherwise be classified as unknown, as well as identifying traffic flows that could otherwise be classified incorrectly. Our work opens up multiple research issues that we intend to address in future work.
Andrew Moore thanks the Intel Corporation for its generous support of his research fellowship.
[...] the exact question of what is the necessary amount of payload one needs to capture in order to identify different types of applications.
doi:10.1007/978-3-540-31966-5_4 fatcat:n56j6a3rjnfpfjjfknw3hzkxpa