Improving Zero-Day Malware Testing Methodology Using Statistically Significant Time-Lagged Test Samples [article]

Konstantin Berlin, Joshua Saxe
2016 arXiv pre-print
Enterprise networks are in constant danger of being breached by cyber-attackers, but deciding which security tools to deploy to mitigate this risk requires carefully designed evaluation of security products. One of the most important metrics for a protection product is how well it stops malware, specifically zero-day malware that has not been seen by the security community before. However, evaluating zero-day performance is difficult because of the large number of previously unseen samples needed to properly measure the true and false positive rates, and because of the challenges involved in accurately labeling these samples. This paper addresses these issues from a statistical and practical perspective. Our contributions include first showing that the number of benign files needed for proper evaluation is on the order of millions, and the number of malware samples needed is on the order of tens of thousands. We then propose and justify a time-delay method for easily collecting a large number of previously unseen, but labeled, samples. This enables cheap and accurate evaluation of zero-day true and false positive rates. Finally, we propose a more fine-grained labeling of the malware/benignware in order to better model the heterogeneous distribution of files on various networks.
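To give a rough sense of why the sample counts in the abstract are so large, the sketch below estimates how many samples are needed to measure a rare rate to a given relative precision. It is an illustration only, not the paper's derivation: it assumes a normal-approximation (Wald) confidence interval for a binomial proportion, and the function name `samples_needed` and the example rates are hypothetical.

```python
import math


def samples_needed(rate: float, rel_error: float, z: float = 1.96) -> int:
    """Samples required so that the normal-approximation confidence
    interval for a binomial rate has half-width rel_error * rate.

    Solves z * sqrt(rate * (1 - rate) / n) = rel_error * rate for n.
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    if rel_error <= 0.0:
        raise ValueError("rel_error must be positive")
    return math.ceil(z ** 2 * (1.0 - rate) / (rate * rel_error ** 2))


if __name__ == "__main__":
    # Hypothetical targets, chosen only for illustration.
    benign = samples_needed(rate=1e-4, rel_error=0.2)   # false positive rate
    malware = samples_needed(rate=1e-2, rel_error=0.2)  # miss rate
    print(f"benign files needed:    {benign:,}")
    print(f"malware samples needed: {malware:,}")
```

Under these assumed targets, a false positive rate of 1 in 10,000 measured to within 20% needs close to a million benign files, while a 1% miss rate at the same precision needs on the order of ten thousand malware samples. The required count scales roughly as the inverse of the rate being measured, which is why the benign set must be so much larger.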
arXiv:1608.00669v1 fatcat:orha2fpjkzfb7ftxpu7zggumgy