Statistical similarity of binaries

Yaniv David, Nimrod Partush, Eran Yahav
2016 SIGPLAN Notices
We address the problem of finding similar procedures in stripped binaries. We present a new statistical approach for measuring the similarity between two procedures. Our notion of similarity allows us to find similar code even when it has been compiled using different compilers, or has been modified. The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures. We have implemented our approach in a tool called Esh, and applied it to find various prominent vulnerabilities across compilers and versions, including Heartbleed [3], Shellshock [5], and Venom [7]. We show that Esh produces high-accuracy results, with few to no false positives, a crucial factor in the scenario of vulnerability search in stripped binaries.
doi:10.1145/2980983.2908126 fatcat:3stb4vrpvbci7kac4soik3irbm
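
The "similarity by composition" idea in the abstract can be illustrated with a small toy sketch. This is not Esh's implementation. It assumes each procedure has already been decomposed into fragments, models a fragment as a set of normalized operation tokens, and uses Jaccard overlap as a purely syntactic stand-in for the semantic fragment similarity the abstract describes. The lifting step uses a log-ratio against a corpus-wide baseline, chosen here only for illustration, so that fragments common to many procedures contribute little evidence. All names (`fragment_similarity`, `best_match`, `procedure_similarity`, `eps`) are hypothetical.

```python
import math
from typing import FrozenSet, List, Sequence

# Assumption: a fragment is a set of normalized operation tokens.
Fragment = FrozenSet[str]
Procedure = Sequence[Fragment]


def fragment_similarity(a: Fragment, b: Fragment) -> float:
    """Jaccard overlap in [0, 1]; a syntactic stand-in for semantic similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(fragment: Fragment, procedure: Procedure) -> float:
    """Similarity of `fragment` to its closest fragment in `procedure`."""
    return max((fragment_similarity(fragment, f) for f in procedure), default=0.0)


def procedure_similarity(
    query: Procedure,
    target: Procedure,
    corpus: Sequence[Procedure],
    eps: float = 1e-6,
) -> float:
    """Lift fragment similarity to a procedure-level score.

    For each query fragment, compare its best match in the target with its
    average similarity to every fragment in the corpus (the baseline), and
    sum the log-ratios. `eps` avoids taking the log of zero.
    """
    background: List[Fragment] = [f for proc in corpus for f in proc]
    score = 0.0
    for frag in query:
        in_target = best_match(frag, target)
        if background:
            baseline = sum(fragment_similarity(frag, f) for f in background) / len(background)
        else:
            baseline = 0.0
        score += math.log((in_target + eps) / (baseline + eps))
    return score
```

Under these assumptions, a target that shares rare fragments with the query receives a positive score, while a target with no matching fragments receives a strongly negative one. A real system would replace the Jaccard stand-in with a semantic equivalence check on the fragments.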