Host : The Japanese Society for Artificial Intelligence
Name : The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 39
Location : [in Japanese]
Date : May 27, 2025 - May 30, 2025
This research investigates how multilingual Large Language Models (LLMs) encode gender biases in English and Japanese. Gender biases plausibly manifest differently depending on the language of the training text. However, how multilingual LLMs learn and encode gender biases for different languages remains unknown. We extract gender bias features for each language using Sparse Auto-Encoders (SAEs) and examine whether the features are shared across languages. Specifically, we feed multilingual LLMs gender-stereotypical and anti-gender-stereotypical texts, use SAEs to extract interpretable features from the neuron activations in the inner layers of the LLMs, and identify the features that fire differently between the two kinds of text. We then compare these feature activations between the English and Japanese cases. The experimental results indicate that gender bias is encoded in distinct parts of multilingual LLMs depending on the language.
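The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how "fire differently" is measured or how the two languages are compared, so the mean-activation gap, the top-k selection, and the Jaccard overlap used here are all assumptions, as are the function names `differential_features` and `jaccard_overlap`. The activation values are made-up toy numbers standing in for SAE feature activations (rows are texts, columns are SAE features).

```python
from statistics import mean


def differential_features(stereo_acts, anti_acts, top_k=2):
    """Return the indices of the top_k features whose mean activation differs
    most between stereotypical and anti-stereotypical texts.

    Assumption: "fires differently" is approximated by the absolute gap
    between per-feature mean activations.
    """
    n_features = len(stereo_acts[0])
    gaps = []
    for j in range(n_features):
        stereo_mean = mean(row[j] for row in stereo_acts)
        anti_mean = mean(row[j] for row in anti_acts)
        gaps.append((abs(stereo_mean - anti_mean), j))
    # Largest gap first; ties are broken by the lower feature index.
    gaps.sort(key=lambda g: (-g[0], g[1]))
    return {j for _, j in gaps[:top_k]}


def jaccard_overlap(a, b):
    """Share of features common to both sets (0.0 = disjoint, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy activations, NOT data from the paper.
en_stereo = [[0.9, 0.1, 0.0, 0.2], [0.8, 0.0, 0.1, 0.2]]
en_anti = [[0.1, 0.1, 0.0, 0.9], [0.2, 0.0, 0.1, 0.8]]
ja_stereo = [[0.1, 0.9, 0.7, 0.2], [0.1, 0.8, 0.8, 0.2]]
ja_anti = [[0.1, 0.1, 0.0, 0.2], [0.1, 0.2, 0.1, 0.2]]

en_features = differential_features(en_stereo, en_anti)
ja_features = differential_features(ja_stereo, ja_anti)
overlap = jaccard_overlap(en_features, ja_features)
```

In this toy setup the English and Japanese feature sets are disjoint, so `overlap` is 0.0. This mirrors the kind of outcome the abstract reports, where bias-related features sit in different parts of the model for each language. An overlap near 1.0 would instead suggest features shared across languages.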