Detecting, validating and characterizing computer infections in the wild
Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference - IMC '11
Although network intrusion detection systems (IDSs) have been studied for several years, their operators are still overwhelmed by a large number of false-positive alerts. In this work we study the following problem: from a large archive of intrusion alerts collected in a production network, we want to detect with a small number of false positives hosts within the network that have been infected by malware. Solving this problem is essential not only for reducing the falsepositive rate of IDSs,
... t also for labeling traces collected in the wild with information about validated security incidents. We use a 9-month long dataset of IDS alerts and we first build a novel heuristic to detect infected hosts from the on average 3 million alerts we observe per day. Our heuristic uses a statistical measure to find hosts that exhibit a repeated multi-stage malicious footprint involving specific classes of alerts. A significant part of our work is devoted to the validation of our heuristic. We conduct a complex experiment to assess the security of suspected infected systems in a production environment using data from several independent sources, including intrusion alerts, blacklists, host scanning logs, vulnerability reports, and search engine queries. We find that the false positive rate of our heuristic is 15% and analyze in-depth the root causes of the false positives. Having validated our heuristic, we apply it to our entire trace, and characterize various important properties of 9 thousand infected hosts in total. For example, we find that among the infected hosts, a small number of heavy hitters originate most outbound attacks and that future infections are more likely to occur close to already infected hosts.