On the Lack of Consensus in Anti-Virus Decisions: Metrics and Insights on Building Ground Truths of Android Malware [chapter]

Médéric Hurier, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon
2016 Lecture Notes in Computer Science  
There is generally a lack of consensus in Antivirus (AV) engines' decisions on a given sample. This challenges the building of authoritative ground-truth datasets. Instead, researchers and practitioners may rely on unvalidated approaches to build their ground truth, e.g., by considering decisions from a selected set of Antivirus vendors or by setting up a threshold number of positive detections before classifying a sample. Both approaches are biased as they implicitly either decide on ranking ... products, or they consider that all AV decisions have equal weights. In this paper, we extensively investigate the lack of agreement among AV engines. To that end, we propose a set of metrics that quantitatively describe the different dimensions of this lack of consensus. We show how our metrics can bring important insights by using the detection results of 66 AV products on 2 million Android apps as a case study. Our analysis focuses not only on AVs' binary decisions but also on the notoriously hard problem of the labels that AVs associate with suspicious files, and helps highlight biases hidden in the collection of a malware ground truth, a foundation stone of any malware detection approach.
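The two ground-truth practices the abstract criticizes can be illustrated with a minimal sketch. This is not the paper's set of metrics. It only shows how a threshold rule treats every engine as equally weighted, while a vendor-subset rule implicitly ranks the chosen vendors above the rest. The report format, the engine names, and the "any trusted vendor flags it" rule are all hypothetical assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical scan report: maps each AV engine name to True if it flagged the app.
Report = Dict[str, bool]


def label_by_threshold(report: Report, threshold: int) -> bool:
    """Label an app as malicious if at least `threshold` engines flag it.

    Every engine's decision counts equally (the equal-weight assumption).
    """
    positives = sum(1 for flagged in report.values() if flagged)
    return positives >= threshold


def label_by_vendor_subset(report: Report, trusted: Set[str]) -> bool:
    """Label an app as malicious if any engine in a chosen vendor set flags it.

    Engines outside `trusted` are ignored (an implicit ranking of vendors).
    The "any" rule is an assumption; other subset rules are possible.
    """
    return any(report.get(engine, False) for engine in trusted)


if __name__ == "__main__":
    # Hypothetical detection results for one app from four engines.
    report = {"EngineA": True, "EngineB": False, "EngineC": True, "EngineD": False}
    print(label_by_threshold(report, 2))  # True
    print(label_by_threshold(report, 3))  # False
    print(label_by_vendor_subset(report, {"EngineB", "EngineD"}))  # False
```

The same app ends up labeled differently depending only on the chosen threshold or vendor set, which is the kind of hidden bias in ground-truth collection that the abstract refers to.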
doi:10.1007/978-3-319-40667-1_8 fatcat:a65osdn3pvc55ku7afdq6vy2ty