2021 Volume 38 Issue 4 Pages 4_60-4_82
A code clone is a code fragment that is identical or similar to another fragment in the source code. The code clone detector CCVolti was developed using Cross-Polytope Locality-Sensitive Hashing (LSH). CCVolti can detect not only syntactic clones but also semantic clones, which are difficult to detect. However, CCVolti has two problems: (1) its detection time depends on Cross-Polytope LSH, and (2) it misses some code clones. In this study, we propose an approach that determines Cross-Polytope LSH parameters so as to achieve a user-specified target recall while saving as much detection time as possible. The approach builds a linear regression model that learns suitable parameters from the size of target projects and then uses the model to determine appropriate Cross-Polytope LSH parameters for a given detection target. Finally, we apply this approach with CCVolti to 20 open-source software projects and confirm its effectiveness.
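
The regression step described above can be illustrated with a minimal sketch. It is not CCVolti's implementation: the function names, the choice of a single integer parameter, and the training pairs are all hypothetical, made up only to show how a line fitted to (project size, suitable parameter) pairs could propose a parameter for a new project. The abstract does not state which Cross-Polytope LSH parameters are tuned or how project size is measured, so both are left generic here.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_parameter(model, project_size, lower=1):
    """Round the fitted line to an integer parameter, never below `lower`."""
    slope, intercept = model
    return max(lower, round(slope * project_size + intercept))


# Hypothetical training data (NOT from the paper): for each project size,
# the smallest parameter value assumed to have reached the target recall.
sizes = [1000, 2000, 3000, 4000]
suitable_params = [12, 14, 16, 18]

model = fit_line(sizes, suitable_params)

# Propose a parameter for an unseen project of a given size.
proposed = predict_parameter(model, 2500)
```

In this sketch one model would be fitted per target recall value, so a user asking for a higher recall would get a different line and, presumably, a larger and slower parameter setting.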