Computer Software
Print ISSN : 0289-6540
Determining Cross-Polytope Locality-Sensitive Hashing Parameters for Code Clone Detection
Shogo TOKUI, Norihiro YOSHIDA, Eunjong CHOI, Katsuro INOUE

2021 Volume 38 Issue 4 Pages 4_60-4_82

Abstract

A code clone is a code fragment that has identical or similar code fragments elsewhere in the source code. The code clone detector CCVolti was developed using Cross-Polytope Locality-Sensitive Hashing (LSH). CCVolti can detect not only syntactic clones but also semantic clones, which are difficult to detect. However, CCVolti has two problems: (1) its detection time depends on the Cross-Polytope LSH parameters, and (2) it misses some code clones. In this study, we propose an approach that determines Cross-Polytope LSH parameters that achieve a user-specified target recall while reducing detection time as much as possible. The approach builds a linear regression model that learns suitable parameters from the size of the target projects, and then uses it to determine appropriate Cross-Polytope LSH parameters for a given code clone detection target. Finally, we apply this approach with CCVolti to 20 open source software projects and confirm its effectiveness.
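The abstract does not say which Cross-Polytope LSH parameter the regression predicts, how project size is measured, or how the target recall enters the model. The sketch below is therefore only an illustration under stated assumptions: one simple linear model is fitted per target recall, each training pair is (project size, smallest parameter value that reached that recall on an earlier run), and the predicted value is rounded up on the assumption that a larger value trades detection time for recall. The function names `fit_simple_linear_regression` and `predict_lsh_parameter` are hypothetical, and the numbers in the usage example are synthetic toy values, not results from the study.

```python
from __future__ import annotations

import math
from typing import Sequence


def fit_simple_linear_regression(
    xs: Sequence[float], ys: Sequence[float]
) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept.

    Assumption (hypothetical): xs are project sizes and ys are the smallest
    LSH parameter values that reached one fixed target recall.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_lsh_parameter(slope: float, intercept: float, project_size: float) -> int:
    """Predict an integer LSH parameter for a new project.

    Rounds up (assumption: a larger value costs time but helps recall)
    and never returns less than 1.
    """
    return max(1, math.ceil(slope * project_size + intercept))


if __name__ == "__main__":
    # Synthetic toy data for illustration only (not from the paper).
    sizes = [10, 20, 30, 40]
    params = [3, 5, 7, 9]
    slope, intercept = fit_simple_linear_regression(sizes, params)
    print(predict_lsh_parameter(slope, intercept, 52))  # prints 12
```

Fitting a separate model per target recall keeps each model one-dimensional; a real implementation might instead add recall as a second feature or predict several parameters at once.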

© 2021, Japan Society for Software Science and Technology