Evaluation of Reward Functions in Reinforcement Learning of Protein Language Models

高瀬 諒一; 井島 大弥; 刑部 好弘; 淺原 彰規; 小山 光

doi:10.11517/pjsai.JSAI2024.0_3Xin293

Abstract

In pharmaceutical development, protein language models (pLMs) and reinforcement learning (RL) have become essential techniques for designing desired protein sequences. In this paper, we investigate the effect of the loss function used in reward model training, since reward models are central to obtaining protein sequences with better performance. Two typical loss functions, mean squared error and ranking loss, are used to train reward models. Numerical experiments show no significant difference when the reward models are evaluated on their own. However, the choice of loss function affects the pLMs after RL. The ranking loss tends to yield better performance and to preserve the distribution of the pLMs during RL, resulting in desired protein sequences with better performance.
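To make the contrast between the two loss families concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the exact form of the ranking loss, so a pairwise logistic loss is assumed here; the function names `mse_loss` and `pairwise_ranking_loss` and the toy scores are hypothetical.

```python
import math
from itertools import combinations


def mse_loss(pred, target):
    """Mean squared error between predicted and measured scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(pred, target):
    """Pairwise logistic ranking loss (an assumed form, not taken from the paper).

    For every pair whose measured scores differ, penalize the model when the
    sequence with the higher measured score does not receive the higher
    predicted score.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # ties carry no ordering information
        hi, lo = (i, j) if target[i] > target[j] else (j, i)
        margin = pred[hi] - pred[lo]
        total += softplus(-margin)  # equals -log(sigmoid(margin))
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy measured scores for three hypothetical sequences.
target = [0.1, 0.5, 0.9]
shifted = [t + 10.0 for t in target]  # correct order, wrong scale
flipped = [0.9, 0.5, 0.1]             # right scale, wrong order

print("MSE      shifted:", mse_loss(shifted, target))
print("MSE      flipped:", mse_loss(flipped, target))
print("ranking  shifted:", pairwise_ranking_loss(shifted, target))
print("ranking  flipped:", pairwise_ranking_loss(flipped, target))
```

In this toy setting, the mean squared error heavily penalizes the shifted predictions even though they order the sequences correctly, whereas the pairwise ranking loss depends only on score differences and penalizes the flipped predictions instead.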
