Comparing assembler procedures by analyzing sequences of opcodes

Nikola Pejić, Miloš Cvetanović, Zaharije Radivojević
2020 Telfor Journal  
Static analysis of executables for the purpose of comparing them can be made more difficult if the binaries are created using different compilers. To compensate for the noise introduced by the compilers, the arguments of the instructions are usually discarded as having a low signal-to-noise ratio. Because compilers often reorder instructions, some approaches compare only statistical information about the instructions, or compare their subsequences in order to measure their similarity. This paper presents an approach for estimating the similarity of procedures given in assembler form (disassembled binaries) by analyzing their sequences of opcodes. The approach first encodes the opcodes into integer values by mapping opcodes that represent similar actions to the same value, and then calculates a relative Levenshtein distance between the two sequences of integers. The proposed approach is evaluated against several existing approaches and shows, on average, around 6% higher recall than the second-best approach.
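The two steps described in the abstract (opcode encoding, then a relative Levenshtein distance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `OPCODE_GROUPS` table is a hypothetical grouping invented for the example, and normalizing the edit distance by the length of the longer sequence is an assumed definition of "relative", since the abstract does not specify either.

```python
# Hypothetical opcode groups (assumption): the mapping used in the paper
# is not given in the abstract. Opcodes with similar actions share a value.
OPCODE_GROUPS = {
    "mov": 1, "movzx": 1, "lea": 1,          # data movement
    "add": 2, "sub": 2, "inc": 2, "dec": 2,  # arithmetic
    "jmp": 3, "je": 3, "jne": 3,             # jumps
    "call": 4,
    "ret": 5,
    "push": 6, "pop": 6,                     # stack operations
    "cmp": 7, "test": 7,                     # comparisons
}
UNKNOWN = 0  # value for opcodes missing from the table


def encode(opcodes):
    """Map a sequence of opcode mnemonics to a sequence of integers."""
    return [OPCODE_GROUPS.get(op.lower(), UNKNOWN) for op in opcodes]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def relative_distance(ops_a, ops_b):
    """Edit distance divided by the longer sequence length (assumed
    normalization); 0.0 means identical encodings, 1.0 fully different."""
    a, b = encode(ops_a), encode(ops_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


proc_a = ["push", "mov", "add", "cmp", "jne", "ret"]
proc_b = ["push", "lea", "sub", "test", "je", "ret"]
proc_c = ["push", "mov", "call", "ret"]
print(relative_distance(proc_a, proc_b))
print(relative_distance(proc_a, proc_c))
```

Under this illustrative grouping, `proc_a` and `proc_b` use different mnemonics but encode to the same integer sequence, which is the effect the encoding step is meant to achieve for code produced by different compilers.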
doi:10.5937/telfor2001046p fatcat:ekjrkyhd45aozi6apv435mqtmq