Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4M1-GS-10-05

Functional Estimation Method for Binary Code using Large Language Models
*Minami SOMEYA, Akira OTSUKA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Estimating the functionality of binary code is useful for malware analysis and vulnerability detection when the source code of a program is not available. Binary code is harder to understand than source code because it lacks symbolic information such as function and variable names. Although recent large language models (LLMs) have shown a remarkable ability to understand natural language and source code, their applicability to binary code has not yet been clarified. This study therefore applies LLMs to functional estimation of binary code and addresses the task of function name prediction. We use Gemini Pro to extract the rationale behind each function name estimate, and then fine-tune Code Llama on both the rationale and the function name. In evaluation experiments, fine-tuning on the rationale together with the function name improved performance over fine-tuning on the function name alone. Furthermore, our method outperformed Gemini Pro with Chain-of-Thought prompting.
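
The two fine-tuning settings compared in the abstract (function name only versus rationale plus function name) can be illustrated with a minimal sketch of how one training record might be assembled. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Sample` fields, the prompt wording, the `Rationale:` / `Function name:` output layout, and the helper names `build_record` and `extract_name` are all hypothetical. The rationale string stands in for text that would come from a teacher model such as Gemini Pro; no model is called.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are hypothetical; the paper's actual data format is not given.
    code: str        # disassembled or decompiled body of one function
    rationale: str   # explanation text, assumed to come from a teacher LLM
    name: str        # ground-truth function name


# Assumed prompt wording, for illustration only.
PROMPT = "Predict the name of the following function.\n\n{code}\n\n"


def build_record(sample: Sample, with_rationale: bool = True) -> dict:
    """Build one prompt/completion pair for supervised fine-tuning."""
    prompt = PROMPT.format(code=sample.code)
    if with_rationale:
        # Rationale first, name last: the model explains before it answers.
        target = f"Rationale: {sample.rationale}\nFunction name: {sample.name}"
    else:
        # Baseline setting: the target is the function name alone.
        target = f"Function name: {sample.name}"
    return {"prompt": prompt, "completion": target}


def extract_name(generated: str) -> str:
    """Recover the predicted name from text in the assumed output layout."""
    for line in reversed(generated.splitlines()):
        if line.startswith("Function name:"):
            return line[len("Function name:"):].strip()
    return ""  # no name line found
```

Placing the rationale before the name in the target means a model trained on such records generates its explanation first and the name last, so `extract_name` only needs to read the final `Function name:` line. Whether the paper orders the two parts this way is not stated in the abstract.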

© 2024 The Japanese Society for Artificial Intelligence