Traffic microstructures and network anomaly detection [article]

Henry Clausen, University Of Edinburgh, David Aspinall, Gordon Ross
Much hope has been put in the modelling of network traffic with machine learning methods to detect previously unseen attacks. Many methods rely on features on a microscopic level such as packet sizes or interarrival times to identify reoccurring patterns and detect deviations from them. However, the success of these methods depends both on the quality of corresponding training and evaluation data as well as the understanding of the structures that methods learn. Currently, the academic
more » ... is lacking both, with widely used synthetic datasets facing serious problems and the disconnect between methods and data being named the "semantic gap". This thesis provides extensive examinations of the necessary requirements on traffic generation and microscopic traffic structures to enable the effective training and improvement of anomaly detection models. We first present and examine DetGen, a container-based traffic generation paradigm that enables precise control and ground truth information over factors that shape traffic microstructures. The goal of DetGen is to provide researchers with extensive ground truth information and enable the generation of customisable datasets that provide realistic structural diversity. DetGen was designed according to four specific traffic requirements that dataset generation needs to fulfil to enable machine-learning models to learn accurate and generalisable traffic representations. Current network intrusion datasets fail to meet these requirements, which we believe is one of the reasons for the lacking success of anomaly-based detection methods. We demonstrate the significance of these requirements experimentally by examining how model performance decreases when these requirements are not met. We then focus on the control and information over traffic microstructures that DetGen provides, and the corresponding benefits when examining and improving model failures for overall model development. We use three metrics to demonstrate that DetGen is able to provide more control and isolati [...]
doi:10.7488/era/2060 fatcat:lukha7hndnfbfe3gmq2stldkva