2025, Volume 33, pp. 537-551
The reuse of third-party code, such as open-source software (OSS), enhances software development efficiency but may introduce vulnerabilities that pose significant risks to systems. This paper focuses on known vulnerabilities originating from reused code, referred to as a “code clone” (CC), and uses the term vulnerable CC to denote a vulnerable code fragment. Previous studies detect only vulnerable CCs that match almost exactly or that fall within a limited scope of the inspected software. In this paper, we develop SHERRY, a precise approach to detecting vulnerable CCs. It enables the detection of vulnerable CCs that are not precisely matched by converting the function code into a fine-grained set of features consisting of line-by-line elements. For scalability, SHERRY reduces the number of comparisons and calculates similarity using logical operations. Furthermore, we analyzed 50 high-profile OSS projects, tracking the vulnerable CCs detected by SHERRY and examining how developers manage them. In a comparison experiment with existing techniques on the same 10 OSS projects, SHERRY improved recall by over 10% and accelerated processing 17-fold without limiting the detection scope. Our measurements also revealed 87 vulnerable CCs in 22 OSS projects, and more than half of them were comparable to the most dangerous software weakness type. We find three causes for why vulnerable CCs remain in OSS repositories. We conclude with practical suggestions to prevent the propagation of vulnerable CCs in the OSS ecosystem.
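The general idea of line-level features compared with logical operations can be illustrated with a minimal sketch. This is not SHERRY's actual algorithm, which the abstract does not specify. The assumptions here are that each normalized line of a function is hashed to one bit of a fixed-width fingerprint and that similarity is the fraction of the vulnerable function's bits also set in the target, computed with a bitwise AND. The names `BITS`, `normalize`, `fingerprint`, and `containment` are hypothetical.

```python
import hashlib

BITS = 256  # assumed fingerprint width; not taken from the paper


def normalize(line: str) -> str:
    # Hypothetical normalization: trim and collapse runs of whitespace.
    return " ".join(line.split())


def fingerprint(func_src: str) -> int:
    """Map each non-empty normalized line of a function to one bit."""
    fp = 0
    for line in func_src.splitlines():
        norm = normalize(line)
        if not norm:
            continue
        digest = hashlib.sha1(norm.encode("utf-8")).digest()
        pos = int.from_bytes(digest[:4], "big") % BITS
        fp |= 1 << pos
    return fp


def containment(vuln_fp: int, target_fp: int) -> float:
    """Fraction of the vulnerable function's bits found in the target."""
    if vuln_fp == 0:
        return 0.0
    shared = bin(vuln_fp & target_fp).count("1")  # bitwise AND, then popcount
    return shared / bin(vuln_fp).count("1")


# Illustrative (made-up) vulnerable function and a clone with one extra line.
VULN = """
void copy(char *dst, const char *src) {
    strcpy(dst, src);
}
"""

CLONE = """
void copy(char *dst, const char *src) {
    log_call("copy");
    strcpy(dst, src);
}
"""

score = containment(fingerprint(VULN), fingerprint(CLONE))
```

Because every line of the vulnerable function still appears in the clone, all of its bits are set in the clone's fingerprint and `score` is 1.0 despite the inserted line. Under this scheme a modified line would usually lower the score, although hash collisions can mask small differences at this fingerprint width.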