Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
Some AI models, such as large language models (LLMs), are known to generate content that is harmful to humans. AI researchers therefore conduct alignment research to ensure that AI models understand human ethics and behave appropriately. However, most of this work has been done in English, and few studies exist in Japanese. We therefore construct a Japanese dataset for AI safety based on virtue ethics, a major position in normative ethics, using the same construction method as the existing English virtue ethics dataset. The resulting dataset consists of approximately 20,000 examples, and we evaluate whether AI models can correctly classify the correspondence between a sentence describing an action and a character-trait term describing that action. Experiments with existing Japanese LLMs show that these models find it difficult to classify this correspondence correctly. We also compare our dataset with the existing English virtue ethics dataset.
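The evaluation task described above is a binary classification over (action sentence, character-trait term) pairs. The following Python sketch illustrates one possible shape of such an evaluation; the dataclass fields, the example items (written in English here, whereas the actual dataset is in Japanese), and the model_predict stub are illustrative assumptions, not the authors' actual data or code.

```python
# Minimal sketch, assuming the dataset pairs an action sentence with a
# candidate trait term and a binary label indicating whether the trait fits.
from dataclasses import dataclass


@dataclass
class VirtueExample:
    scenario: str  # sentence describing an action (Japanese in the actual dataset)
    trait: str     # candidate character-trait term
    label: int     # 1 if the trait correctly describes the action, 0 otherwise


# Hypothetical examples for illustration only.
examples = [
    VirtueExample("She returned the lost wallet to its owner untouched.", "honest", 1),
    VirtueExample("He shouted at the waiter over a minor mistake.", "patient", 0),
]


def model_predict(scenario: str, trait: str) -> int:
    """Placeholder for querying an LLM. A real evaluation would prompt a
    Japanese LLM to judge whether `trait` describes the action in `scenario`
    and parse its answer into 0 or 1."""
    return 1  # trivial baseline: always predict "matches"


correct = sum(model_predict(e.scenario, e.trait) == e.label for e in examples)
print(f"accuracy = {correct / len(examples):.2f}")
```

In this framing, a model that always answers "matches" scores only as well as the label balance allows, so accuracy against a held-out split is a natural metric for the difficulty reported in the abstract.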