Suggesting meaningful variable names for decompiled code: a machine translation approach

Alan Jaffe
2017 Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2017  
Decompiled code lacks meaningful variable names. We used statistical machine translation to suggest variable names that are natural given the context.  ...  This technique has previously been successfully applied to obfuscated JavaScript code, but decompiled C code poses unique challenges in constructing an aligned corpus and selecting the best translation  ...  A great deal of effort goes into selecting meaningful variable names while writing code, as source code is a primary means of communication for programmers [12].  ... 
doi:10.1145/3106237.3121274 dblp:conf/sigsoft/Jaffe17 fatcat:vahsjqh46vh2fhxv34njgq46aq

JDATATRANS for Array Obfuscation in Java Source Code to Defeat Reverse Engineering from Decompiled Codes [article]

Praveen Sivadasan, P Sojan Lal, Naveen Sivadasan
2008 arXiv   pre-print
Software obfuscation or obscuring a software is an approach to defeat the practice of reverse engineering a software for using its functionality illegally in the development of another software.  ...  Java applications are more amenable to reverse engineering and re-engineering attacks through methods such as decompilation because Java class files store the program in a semi-compiled form called 'byte  ...  Decompilation is the process of generating source codes from machine codes or intermediate byte codes. JAD, Mocha, and Decaf are some of the well-known decompilers [22].  ... 
arXiv:0809.3503v1 fatcat:e2svjb6bvzhibhc5xdbqlw2nvu

Reasoning About LLVM Code Using Codewalker

David S. Hardin
2015 Electronic Proceedings in Theoretical Computer Science  
That translator provided many of the benefits of a pure decompilation into logic approach, but had the disadvantage of not being verified.  ...  The availability of Codewalker as of ACL2 7.0 has provided an opportunity to revisit this idea, and employ a more trustworthy decompilation into logic tool.  ...  Acknowledgments Many thanks to J Moore for developing Codewalker.  ... 
doi:10.4204/eptcs.192.7 fatcat:bpfuubdv3zfyneaca7ulyvxjty

When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries [article]

Aylin Caliskan, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt, Arvind Narayanan
2016 arXiv   pre-print
We examine programmer de-anonymization from the standpoint of machine learning, using a novel set of features that include ones obtained by decompiling the executable binary to source code.  ...  Many distinguishing features present in source code, e.g. variable names, are removed in the compilation process, and compiler optimization may alter the structure of a program, further obscuring features  ...  This research was supported in part by the Center for Information Technology and Policy at Princeton University.  ... 
arXiv:1512.08546v2 fatcat:xpqswbzugnae3lg5kiq3k7vkgu

Hybrid Obfuscation Technique to Protect Source Code From Prohibited Software Reverse Engineering

Asmara Al-Hakimi, Abu Bakar Md Sultan, Abdulazim Abd Ghani, Norhayati Mohd Ali, Novia Admodisastro
2020 IEEE Access  
The experiment has presented good and promising results, where it was nearly impossible for the reversing tool to read the obfuscated code.  ...  The string encryption is about adding a mathematical equation with arrays and loops to the strings in the code to hide the meaning.  ...  such as meaningful variable names present in source code [2].  ... 
doi:10.1109/access.2020.3028428 fatcat:l6xwclafnfghxj2ketiybx4mwi

Obfuscation resilient binary code reuse through trace-oriented programming

Junyuan Zeng, Yangchun Fu, Kenneth A. Miller, Zhiqiang Lin, Xiangyu Zhang, Dongyan Xu
2013 Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security - CCS '13  
This paper introduces trace-oriented programming (TOP), a general framework for generating new software from existing binary code by elevating the low-level binary code to C code with templates and inlined  ...  While prior approaches have shown that binary code can be extracted and reused, they are often based on static analysis and face challenges when coping with obfuscated binaries.  ...  Acknowledgement We thank the anonymous reviewers for their insightful comments. Kenneth A. Miller is supported by a DoD scholarship under contract H98230-12-1-0452.  ... 
doi:10.1145/2508859.2516664 dblp:conf/ccs/ZengFMLZX13 fatcat:3qob5amrfbdhrnk4ugu4p77jdm

Compiling Process Graphs into Executable Code [chapter]

Rainer Hauser, Jana Koehler
2004 Lecture Notes in Computer Science  
This paper discusses two algorithms that provide such transformations for process graph models in a business process or workflow environment and produce executable programs based on Web services and orchestration  ...  Model-driven architecture envisions a paradigm shift as dramatic as the one from low-level assembler languages to high-level programming languages.  ...  Acknowledgments We thank the anonymous reviewers for their encouraging comments and valuable suggestions, and Jochen Küster, Shane Sendall, Markus Stolze and Michael Wahler for their advice, which helped  ... 
doi:10.1007/978-3-540-30175-2_17 fatcat:fmztzfn2l5euddua2k3ho4fjpa

A Survey of Machine Learning for Big Code and Naturalness [article]

Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton
2018 arXiv   pre-print
Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit  ...  We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature.  ...  [5, 6, 9] can be seen as a recommender system for suggesting names for variables, methods, and classes that uses relevant code tokens as the context. Evaluation Measures.  ... 
arXiv:1709.06182v2 fatcat:hbvgyonqsjgq3nqwji6jf3aybe

Towards Neural Decompilation [article]

Omer Katz, Yuval Olshaker, Yoav Goldberg, Eran Yahav
2019 arXiv   pre-print
We present a novel approach to decompilation based on neural machine translation. The main idea is to automatically learn a decompiler from a given compiler.  ...  Given a compiler from a source language S to a target language T, our approach automatically trains a decompiler that can translate (decompile) T back to S.  ...  We presented a new approach to the decompilation problem. We base our decompiler framework on neural machine translation. Given a compiler, our framework automatically learns a decompiler from it.  ... 
arXiv:1905.08325v1 fatcat:nmd45uzjrbfqpj2meaxyu2u2ge

DIRE: A Neural Approach to Decompiled Identifier Naming [article]

Jeremy Lacomis, Pengcheng Yin, Edward J. Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, Bogdan Vasilescu
2019 arXiv   pre-print
Unfortunately, they do not reconstruct semantically meaningful variable names, which are known to increase code understandability.  ...  We propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information recovered by the decompiler.  ...  Computation for this research was also supported in part by the Pittsburgh Supercomputing Center and a gift of AWS credits from Amazon.  ... 
arXiv:1909.09029v2 fatcat:ijuhryanpbbnnd27bnosfuseou

Augmenting Decompiler Output with Learned Variable Names and Types [article]

Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Claire Le Goues, Graham Neubig, Bogdan Vasilescu
2021 arXiv   pre-print
In this paper we present DIRTY (DecompIled variable ReTYper), a novel technique for improving the quality of decompiler output that automatically generates meaningful variable names and types.  ...  Empirical evaluation on a novel dataset of C code mined from GitHub shows that DIRTY outperforms prior work approaches by a sizable margin, recovering the original names written by developers 66.4% of  ...  Name Recovery. The Decompiled Identifier Renaming Engine (DIRE) is a state-of-the-art neural approach for decompiled variable name recovery [29] .  ... 
arXiv:2108.06363v1 fatcat:up2d6ciynnhevok5cracznn6yq

Nightingale: Translating Embedded VM Code in x86 Binary Executables [chapter]

Xie Haijiang, Zhang Yuanyuan, Li Juanru, Gu Dawu
2017 Lecture Notes in Computer Science  
In this paper, we conduct an in-depth study on embedded VM based code protection and propose a de-obfuscation approach that aims to recover the original code form.  ...  Finally, the translated operations of each handler are optimized and transformed into host code. After this process, we can obtain a clear and runtime efficient code representation.  ...  This is less meaningful for VM based code obfuscation because a VM stub is generally transformed from a relatively simple function or basic block.  ... 
doi:10.1007/978-3-319-69659-1_21 fatcat:q6o6e5kb4vcvxghcerwecffvlq

Some Assembly Required - Program Analysis of Embedded System Code

Ansgar Fehnker, Ralf Huuck, Felix Rauch, Sean Seefried
2008 2008 Eighth IEEE International Working Conference on Source Code Analysis and Manipulation  
In this work we present a model-checking based static analysis approach which seamlessly integrates the analysis of embedded ARM assembly with C/C++ code analysis.  ...  Normally, a high-level language such as C/C++ is used for application oriented tasks and a low-level assembly language for direct interaction with the underlying hardware.  ...  We thank Bernard Blackham, Jörg Brauer, Patrick Jayet, and Michel Lussenburg for their implementation efforts and fruitful discussions.  ... 
doi:10.1109/scam.2008.15 dblp:conf/scam/FehnkerHRS08 fatcat:3hiar3vkfrcrfnnmvpxx3jbnw4

Cross-Language Binary-Source Code Matching with Intermediate Representations [article]

Yi Gui, Yao Wan, Hongyu Zhang, Huifang Huang, Yulei Sui, Guandong Xu, Zhiyuan Shao, Hai Jin
2022 arXiv   pre-print
We present a novel approach XLIR, which is a Transformer-based neural network by learning the intermediate representations for both binary and source code.  ...  Currently, several approaches have been proposed for binary-source code matching by jointly learning the embeddings of binary code and source code in a common vector space.  ...  We would like to thank all the anonymous reviewers for their constructive comments on improving this paper.  ... 
arXiv:2201.07420v1 fatcat:k6ued5xc6fcujl3d7m3qxfi6mq

A Survey of Machine Learning for Big Code and Naturalness

Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton
2018 ACM Computing Surveys  
Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit  ...  We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature.  ...  [5, 6, 10] can be seen as a recommender system for suggesting names for variables, methods, and classes by using relevant code tokens as the context.  ... 
doi:10.1145/3212695 fatcat:iuuocyctg5adjmobhc2zw23rfu