On Deceiving Malware Classification with Section Injection
Machine Learning and Knowledge Extraction
We investigate how to modify executable files to deceive malware classification systems. This work's main contribution is a methodology to inject bytes across a malware file randomly and use it both as an attack to decrease classification accuracy but also as a defensive method, augmenting the data available for training. It respects the operating system file format to make sure the malware will still execute after our injection and will not change its behavior. We reproduced five
... rt malware classification approaches to evaluate our injection scheme: one based on Global Image Descriptor (GIST) + K-Nearest-Neighbors (KNN), three Convolutional Neural Network (CNN) variations and one Gated CNN. We performed our experiments on a public dataset with 9339 malware samples from 25 different families. Our results show that a mere increase of 7% in the malware size causes an accuracy drop between 25% and 40% for malware family classification. They show that an automatic malware classification system may not be as trustworthy as initially reported in the literature. We also evaluate using modified malware alongside the original ones to increase networks robustness against the mentioned attacks. The results show that a combination of reordering malware sections and injecting random data can improve the overall performance of the classification. All the code is publicly available.