Function Estimation of Binary Code Using Large Language Models

染谷 実奈美; 大塚 玲

doi:10.11517/pjsai.JSAI2024.0_4M1GS1005

Abstract

Estimating the functionality of binary code is useful for malware analysis and vulnerability detection when the source code of a program is not available. Binary code is harder to understand than source code because it lacks symbolic information such as function and variable names. Although recent large language models (LLMs) have shown a remarkable ability to understand natural language and source code, their applicability to binary code has not yet been established. This study therefore applies LLMs to inferring the functionality of binary code, focusing on the task of function name prediction. We use Gemini Pro to extract the rationale behind each function name estimate, and then fine-tune Code Llama on both the rationale and the function name. Evaluation experiments showed that learning the rationale together with the function name improved performance over fine-tuning on the function name alone. Furthermore, our method outperformed Gemini Pro with Chain-of-Thought prompting.
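The two fine-tuning setups compared in the abstract (function name only vs. rationale plus function name) can be illustrated as a data-preparation step. This is a minimal sketch under stated assumptions: the prompt template, the `Rationale:` / `Function name:` target format, and the names `Sample`, `build_example`, and `parse_predicted_name` are hypothetical and are not taken from the paper. The abstract does not describe the actual prompts, the input representation (disassembly or decompiler output), or the training code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Sample:
    # Hypothetical record layout; the paper's actual data format is not given.
    binary_code: str    # function body as disassembly or decompiler output (assumed)
    rationale: str      # explanation obtained from a teacher model such as Gemini Pro
    function_name: str  # ground-truth name, e.g. recovered from debug symbols

# Hypothetical prompt template for the student model (e.g. Code Llama).
PROMPT_TEMPLATE = "Predict the name of the following function.\n{code}\n"

def build_example(sample: Sample, with_rationale: bool) -> Dict[str, str]:
    """Build one prompt/target pair for supervised fine-tuning.

    with_rationale=False -> target is the function name only (baseline setup).
    with_rationale=True  -> target is the rationale followed by the name.
    """
    prompt = PROMPT_TEMPLATE.format(code=sample.binary_code)
    if with_rationale:
        target = f"Rationale: {sample.rationale}\nFunction name: {sample.function_name}"
    else:
        target = f"Function name: {sample.function_name}"
    return {"prompt": prompt, "target": target}

def parse_predicted_name(generated: str) -> Optional[str]:
    """Extract the predicted name from model output in the assumed target format."""
    for line in generated.splitlines():
        if line.startswith("Function name:"):
            return line[len("Function name:"):].strip()
    return None

if __name__ == "__main__":
    s = Sample(
        binary_code="push rbp\nmov rbp, rsp\n...",
        rationale="It loops over a buffer and accumulates the byte values.",
        function_name="checksum",
    )
    print(build_example(s, with_rationale=True)["target"])
    print(parse_predicted_name(build_example(s, with_rationale=True)["target"]))
```

Placing the rationale before the name means that, at inference time, the model generates its explanation first and the name last, so a parser like `parse_predicted_name` reads only the final answer line. Whether the paper orders the target this way is an assumption.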
