Paper ID: 2025EDP7092
As one of the most widely used programming languages in modern software development, Python hosts a vast open-source codebase on GitHub, where code reuse is widespread. This study leverages open-source Python projects from GitHub and applies automated testing to discover pairs of functionally equivalent methods. We collected and processed methods from 5,100 Python repositories; however, Python's lack of static type checking posed unique challenges for grouping these methods by signature. To address this, we performed detailed type inference and organized the methods by their inferred types, providing a structured foundation for subsequent analysis. We then employed automated test generation to produce unit tests for each method and executed each method's tests against the other methods in its group, identifying candidate pairs that produce identical outputs for the same inputs. Through manual verification, we ultimately identified 68 functionally equivalent method pairs and 683 functionally non-equivalent pairs, which we compiled into a comprehensive dataset serving as the basis for further examination. Using this dataset, we evaluated the ability of large language models (LLMs) to recognize functional equivalence, assessing both their accuracy and the challenges posed by diverse implementations, and we also conducted a systematic performance comparison of the equivalent methods, measuring execution times and analyzing the underlying causes of efficiency differences. The findings demonstrate the potential of LLMs to identify functionally equivalent methods and highlight areas requiring further advancement.
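To make the cross-execution step concrete, the following is a minimal Python sketch, not the authors' implementation, of how generated test inputs might be run against every pair of methods within a type-compatible group to flag candidate equivalent pairs; all function and variable names here are hypothetical.

```python
# Minimal sketch of cross-testing methods within a type-compatible group.
# Assumes methods have already been grouped by inferred type signature.
from itertools import combinations

def cross_test(group, test_inputs):
    """Flag method pairs in a group that produce identical outputs
    on every shared test input."""
    candidates = []
    for f, g in combinations(group, 2):
        equivalent = True
        for args in test_inputs:
            try:
                if f(*args) != g(*args):
                    equivalent = False
                    break
            except Exception:
                # This sketch conservatively treats any raised
                # exception as evidence of non-equivalence.
                equivalent = False
                break
        if equivalent:
            candidates.append((f.__name__, g.__name__))
    return candidates

# Toy example: two implementations grouped under the inferred
# signature (list[int]) -> int.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

def sum_builtin(xs):
    return sum(xs)

if __name__ == "__main__":
    inputs = [([],), ([1, 2, 3],), ([-5, 5],)]
    print(cross_test([sum_loop, sum_builtin], inputs))
    # -> [('sum_loop', 'sum_builtin')]
```

As in the study's pipeline, pairs flagged this way are only candidates; agreement on a finite set of generated inputs does not prove equivalence, which is why manual verification follows this step.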