A Deep Neural Network-Based Approach to Finding Similar Code Segments

Dong Kwan KIM
2020, IEICE Transactions on Information and Systems
This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones. Two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of the paired code fragments, and the output layer indicates how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at the source code or bytecode level.
key words: code clone detection, Siamese architecture, convolutional neural network, abstract syntax tree (AST)
doi:10.1587/transinf.2019edl8195 fatcat:qr6f2k7dgjh2rc3s4kh77rh544
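
The pipeline described in the abstract (AST representation, a shared convolutional feature extractor, and a similarity output) can be illustrated with a minimal sketch. It is not the paper's model, and everything in it is an assumption made for illustration: Python's standard `ast` module stands in for the paper's AST front end, the weights are random and untrained, the AST is flattened to a sequence of node type names, and cosine similarity replaces the learned output layer. The names `SharedEncoder`, `node_types`, `cosine`, and `similarity` are hypothetical.

```python
import ast
import math
import random


def node_types(source):
    """Parse a code fragment and return its AST node type names in ast.walk order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


class SharedEncoder:
    """A single encoder applied to both fragments (assumption: random, untrained weights).

    Using the same object for both inputs is what makes the setup Siamese:
    both branches share embeddings and convolution kernels.
    """

    def __init__(self, dim=8, filters=16, width=3, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.width = width
        self.embed = {}  # node type name -> embedding vector, created on first use
        self.kernels = [
            [self.rng.gauss(0.0, 1.0) for _ in range(dim * width)]
            for _ in range(filters)
        ]

    def _embedding(self, name):
        if name not in self.embed:
            self.embed[name] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.embed[name]

    def encode(self, source):
        """1D convolution over the node sequence, ReLU, then global max pooling."""
        vecs = [self._embedding(t) for t in node_types(source)]
        # Zero-pad the tail so even a very short fragment yields at least one window.
        vecs = vecs + [[0.0] * self.dim] * (self.width - 1)
        windows = [
            sum(vecs[i:i + self.width], [])  # concatenate `width` embeddings
            for i in range(len(vecs) - self.width + 1)
        ]
        return [
            max(max(0.0, sum(k * x for k, x in zip(kernel, w))) for w in windows)
            for kernel in self.kernels
        ]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def similarity(encoder, left, right):
    """Encode both fragments with the same encoder and compare the feature vectors."""
    return cosine(encoder.encode(left), encoder.encode(right))


if __name__ == "__main__":
    enc = SharedEncoder()
    frag_a = "def add(a, b):\n    return a + b\n"
    frag_b = "def plus(x, y):\n    return x + y\n"   # same structure, renamed identifiers
    frag_c = "for i in range(3):\n    print(i)\n"     # different structure
    print(similarity(enc, frag_a, frag_b))
    print(similarity(enc, frag_a, frag_c))
```

Because only node types are used, fragments that differ solely in identifier names map to the same feature vector in this sketch. A trained model would instead learn the embeddings, kernels, and output layer from labeled clone and non-clone pairs.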