Towards automated protocol reverse engineering using semantic information
Proceedings of the 9th ACM symposium on Information, computer and communications security - ASIA CCS '14
Network security products, such as NIDS or application firewalls, tend to focus on application level communication flows. However, adding support for new proprietary and often undocumented protocols, implies the reverse engineering of these protocols. Currently, this task is performed manually. Considering the difficulty and time needed for manual reverse engineering of protocols, one can easily understand the importance of automating this task. This is even given more significance in today's
... bersecurity context where reaction time and automated adaptation become a priority. Several studies were carried out to infer protocol's specifications from traces. As shown in this article, they do not provide accurate results on complex protocols and are often not applicable in an operational context to provide parsers or traffic generators, some key indicators of the quality of obtained specifications. In addition, too few previous works have resulted in the publication of tools that would allow the scientific community to experimentally validate and compare the different approaches. In this paper, we infer the specifications out of complex protocols by means of an automated approach and novel techniques. Based on communication traces, we reverse the vocabulary of a protocol by considering embedded contextual information. We also use this information to improve message clustering and to enhance the identification of fields boundaries. We then show the viability of our approach through a comparative study including our re-implementation of three other state-of-the-art approaches (ASAP, Discoverer and ScriptGen).