Deep neural networks have greatly improved natural language processing and text analysis technologies. In particular, pre-trained large language models have achieved significant improvements. However, they have been criticized as black boxes, and providing interpretability is important. In our previous work, we focused on self-attention and proposed methods for providing and evaluating interpretability. However, that work did not use large language models, and its evaluation method produced unnatural sentences by deleting words. In this paper, we focus on BERT, a popular pre-trained language model, and use its masking function instead of deleting words. We first show a problem with using this masking function to provide interpretability: the mask token is not neutral with respect to the model's decision. We then propose an evaluation method based on this masking function, with additional training so that the model learns to treat the mask token as neutral.
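To make the masking-based idea concrete, the following is a minimal sketch, not the authors' implementation: each word is replaced by BERT's [MASK] token and the shift in the classifier's output is recorded as a rough importance score. The checkpoint name, the example sentence, and the untrained classification head are all illustrative assumptions; the abstract's point is that such scores are only meaningful if [MASK] itself is neutral for the decision.

```python
# Sketch of masking-based occlusion with BERT (assumptions: checkpoint name,
# example sentence, and an untuned classification head used only for illustration).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

model_name = "bert-base-uncased"  # assumed checkpoint; the paper's model may differ
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

sentence = "the movie was surprisingly good"  # placeholder input
encoding = tokenizer(sentence, return_tensors="pt")
input_ids = encoding["input_ids"][0]

with torch.no_grad():
    base_probs = torch.softmax(model(**encoding).logits, dim=-1)[0]

# Mask each non-special token in turn and measure how the prediction shifts.
for pos in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
    masked_ids = input_ids.clone()
    masked_ids[pos] = tokenizer.mask_token_id
    with torch.no_grad():
        probs = torch.softmax(
            model(input_ids=masked_ids.unsqueeze(0),
                  attention_mask=encoding["attention_mask"]).logits,
            dim=-1,
        )[0]
    # A large shift suggests the masked word mattered for the decision --
    # but only if [MASK] is neutral, which the abstract argues it is not by default.
    shift = (base_probs - probs).abs().sum().item()
    token = tokenizer.convert_ids_to_tokens([input_ids[pos]])[0]
    print(token, round(shift, 4))
```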