A Deep Neural Network-Based Approach to Finding Similar Code Segments

Dong Kwan KIM

doi:10.1587/transinf.2019EDL8195

Abstract

This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones; two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of pairwise code fragments, and the output layer produces how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at source code or bytecode levels.

Content from these authors

Favorites & Alerts

Add to favorites
Additional info alert
Citation alert
Authentication alert

Corresponding author

Register with J-STAGE for free!