Article ID: 2024IIL0001
Background: Code generation tools such as GitHub Copilot have received attention due to their performance in generating code. Generally, a prior analysis of their performance is needed to select a new code-generation tool from a list of candidates. Without such analysis, there is a higher risk of selecting an ineffective tool, which would negatively affect software development productivity. Additionally, conducting a prior analysis of new code generation tools is often time-consuming. Aim: To enable the use of a new code generation tool without prior analysis but with low risk, we propose evaluating new tools during software development (i.e., online optimization). Method: We apply the bandit algorithm (BA) approach to help select the best code suggestion or generation tool from a list of candidates. Developers evaluate whether each tool's output is correct or not. As code generation and evaluation are repeated, the evaluation results are saved. We utilize the stored evaluation results to select the best tool based on the BA approach. In our preliminary analysis, we evaluated five tools on 164 code-generation cases using BA. Result: The BA approach selected ChatGPT as the best tool as the evaluation proceeded, and during the evaluation, the average accuracy of the BA approach outperformed that of the second-best performing tool. Our results demonstrate the feasibility and effectiveness of BA in assisting the selection of the best-performing code suggestion or generation tool.
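
The abstract does not specify which bandit algorithm variant was used. As one possible illustration of the described loop, the sketch below uses Thompson sampling with Bernoulli rewards: each tool is an arm, a developer's correct/incorrect judgment is the reward, and stored evaluation results update each arm's posterior. The tool names other than ChatGPT and GitHub Copilot, and all accuracy values, are hypothetical placeholders, not data from the study.

```python
import random


class ThompsonSamplingSelector:
    """Select among candidate code-generation tools via Thompson sampling.

    Each tool is modeled as a Bernoulli arm: a suggestion is either
    correct (reward 1) or incorrect (reward 0), as judged by the
    developer. Beta(1, 1) priors are placed on every arm.
    """

    def __init__(self, tools):
        self.tools = list(tools)
        # Per-tool counts of correct/incorrect judgments
        # (the stored evaluation results).
        self.successes = {t: 0 for t in self.tools}
        self.failures = {t: 0 for t in self.tools}

    def select_tool(self):
        # Sample an accuracy estimate from each tool's Beta posterior
        # and choose the tool with the highest sampled value.
        samples = {
            t: random.betavariate(self.successes[t] + 1, self.failures[t] + 1)
            for t in self.tools
        }
        return max(samples, key=samples.get)

    def record_feedback(self, tool, correct):
        # Save the developer's judgment for the chosen tool.
        if correct:
            self.successes[tool] += 1
        else:
            self.failures[tool] += 1


if __name__ == "__main__":
    # Hypothetical candidate list and simulated per-tool accuracies,
    # used only to exercise the selection loop.
    selector = ThompsonSamplingSelector(
        ["ChatGPT", "GitHub Copilot", "ToolC", "ToolD", "ToolE"]
    )
    true_accuracy = {
        "ChatGPT": 0.75, "GitHub Copilot": 0.60,
        "ToolC": 0.50, "ToolD": 0.40, "ToolE": 0.30,
    }
    # 164 code-generation cases, mirroring the preliminary analysis.
    for _ in range(164):
        tool = selector.select_tool()
        correct = random.random() < true_accuracy[tool]  # simulated judgment
        selector.record_feedback(tool, correct)
    print({t: (selector.successes[t], selector.failures[t])
           for t in selector.tools})
```

Under this kind of scheme, tools with consistently correct suggestions are selected more often as evidence accumulates, which matches the abstract's observation that the best tool comes to dominate as the evaluation proceeds.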