Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 3L6-OS-32-05

Investigating Gender Bias in Multilingual Large Language Models Using Sparse Auto-Encoders
*Tota ABE, Namgi HAN, Yusuke MIYAO
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

This research investigates how multilingual Large Language Models (LLMs) encode gender biases in English and Japanese. Gender biases plausibly appear differently depending on the language on which LLMs are trained, yet how multilingual LLMs learn and encode gender biases for different languages remains unclear. We extract gender bias features for each language using Sparse Auto-Encoders (SAEs) and examine whether these features are shared across languages. Specifically, we feed multilingual LLMs gender-stereotypical and anti-gender-stereotypical texts, use SAEs to extract interpretable features from neuron activations in the models' inner layers, and look for the features that fire differently between the two kinds of text. We then compare these feature activations between the English and Japanese cases. The experimental results indicate that multilingual LLMs encode gender bias in distinct parts of the model depending on the language.
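
The comparison described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state which SAE architecture, selection statistic, or overlap measure was used. The sketch assumes a plain ReLU SAE encoder, ranks features by the absolute gap in mean activation between stereotypical and anti-stereotypical texts, and measures cross-language overlap with the Jaccard index. All names (`sae_encode`, `differential_features`, `jaccard`, `toy_hidden`) and the toy data are hypothetical.

```python
import numpy as np

def sae_encode(hidden, W_enc, b_enc):
    """Assumed SAE encoder: a linear map followed by ReLU."""
    return np.maximum(hidden @ W_enc + b_enc, 0.0)

def differential_features(acts_stereo, acts_anti, top_k):
    """Indices of the top_k features with the largest absolute gap in
    mean activation between the two text sets (an assumed statistic)."""
    gap = np.abs(acts_stereo.mean(axis=0) - acts_anti.mean(axis=0))
    return set(np.argsort(gap)[-top_k:].tolist())

def jaccard(a, b):
    """Share of selected features common to both languages."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy stand-ins for inner-layer hidden states; a real run would record
# these from the LLM while it reads each text.
n_texts, n_dims = 4, 8

def toy_hidden(active_dims):
    hidden = np.zeros((n_texts, n_dims))
    for j in active_dims:
        hidden[:, j] = 1.0
    return hidden

# Identity encoder weights keep the toy example easy to follow.
W_enc, b_enc = np.eye(n_dims), np.zeros(n_dims)

def encode(active_dims):
    return sae_encode(toy_hidden(active_dims), W_enc, b_enc)

en_features = differential_features(encode([0, 1]), encode([]), top_k=2)
ja_features = differential_features(encode([1, 2]), encode([]), top_k=2)
overlap = jaccard(en_features, ja_features)
```

Under these assumptions, a low overlap would correspond to the reported finding that gender bias is encoded in different parts of the model for each language, while an overlap near 1 would indicate shared features.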

© 2025 The Japanese Society for Artificial Intelligence