Machine Learning-based Analysis of Program Binaries: A Comprehensive Study

Hongfa Xue, Shaowen Sun, Guru Venkataramani, Tian Lan
2019 IEEE Access  
Binary code analysis is crucial in various software engineering tasks, such as malware detection, code refactoring, and plagiarism detection. With the rapid growth of software complexity and the increasing number of heterogeneous computing platforms, binary analysis is more important than ever. Traditional techniques for binary code analysis face multiple challenges, such as the need for cross-platform analysis, high scalability and speed, and improved fidelity, to name a few. To meet these challenges, machine learning-based binary code analysis frameworks have attracted substantial attention because they automate feature extraction and drastically reduce the effort needed to analyze large-scale programs. In this paper, we provide a taxonomy of machine learning-based binary code analysis, describe recent advances and key findings on the topic, and discuss the key challenges and opportunities. Finally, we present our thoughts on future directions for this topic.
Index Terms: Machine learning, program binary analysis, taxonomy.
doi:10.1109/access.2019.2917668 fatcat:fwjpykkdpjev7pzkhaoily4zci