Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
Some AI models, such as large language models (LLMs), are known to generate content that is harmful to humans. AI researchers therefore conduct alignment research to ensure that AI models understand human ethics and behave appropriately. However, most of this work has been done in English, and few studies exist in Japanese. We therefore construct a Japanese dataset for AI safety based on virtue ethics, a major position in normative ethics, using the same construction method as the existing English virtue ethics dataset. The resulting dataset consists of approximately 20,000 examples, and we evaluate whether AI models can correctly classify the correspondence between a sentence describing an action and a character-trait term describing that action. Experiments with existing Japanese LLMs show that these models find it difficult to classify this correspondence correctly. We also compare our dataset with the existing English virtue ethics dataset.
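The evaluation task described above is a binary classification over (action sentence, character-trait term) pairs. The following Python sketch illustrates one possible shape of such an evaluation; the dataclass fields, the example items (written in English here, whereas the actual dataset is in Japanese), and the model_predict stub are illustrative assumptions, not the authors' actual data or code.

```python
# Minimal sketch, assuming the dataset pairs an action sentence with a
# candidate trait term and a binary label indicating whether the trait fits.
from dataclasses import dataclass


@dataclass
class VirtueExample:
    scenario: str  # sentence describing an action (Japanese in the actual dataset)
    trait: str     # candidate character-trait term
    label: int     # 1 if the trait correctly describes the action, 0 otherwise


# Hypothetical examples for illustration only.
examples = [
    VirtueExample("She returned the lost wallet to its owner untouched.", "honest", 1),
    VirtueExample("He shouted at the waiter over a minor mistake.", "patient", 0),
]


def model_predict(scenario: str, trait: str) -> int:
    """Placeholder for querying an LLM. A real evaluation would prompt a
    Japanese LLM to judge whether `trait` describes the action in `scenario`
    and parse its answer into 0 or 1."""
    return 1  # trivial baseline: always predict "matches"


correct = sum(model_predict(e.scenario, e.trait) == e.label for e in examples)
print(f"accuracy = {correct / len(examples):.2f}")
```

In this framing, a model that always answers "matches" scores only as well as the label balance allows, so accuracy against a held-out split is a natural metric for the difficulty reported in the abstract.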