Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
A Cross-linguistic Analysis of the Effects of Character-level Information in Neural Models
Tomoya Kurosawa, Hitomi Yanaka
JOURNAL FREE ACCESS

2024 Volume 31 Issue 3 Pages 1193-1238

Abstract

Characters are the smallest units of natural language, and humans understand texts through them. Past studies have trained language models on information obtained from character sequences (character-level information) in addition to tokens, aiming to improve performance on a variety of natural language processing tasks across languages. However, these studies measured the contribution of character-level information only as the performance difference between models with and without characters; the extent to which such models actually exploit character-level information to solve these tasks remains unclear. How linguistic features such as morphological factors explain the performance differences across languages is also an open question. In this study, we examine existing character-employing neural models and how their performance varies with character-level information. We focus on four languages (English, German, Italian, and Dutch) and three tasks: part-of-speech (POS) tagging, dependency parsing, and Discourse Representation Structure (DRS) parsing. The experimental results show that character-level information has the greatest effect on model performance for POS tagging and dependency parsing in German and for DRS parsing in Italian. Based on these results, we hypothesize that the large effect in German is caused by the average length of words and the forms of common nouns. A detailed analysis reveals a strong correlation between average word length and the effectiveness of character-level information for POS tagging in German.
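To make the setup concrete: the studies discussed in the abstract combine token-level and character-level representations inside one model. The following PyTorch sketch illustrates that general recipe, assuming a character-level BiLSTM whose final states are concatenated with a word embedding before tag prediction; it is an illustrative assumption, not one of the specific models examined in the paper.

```python
import torch
import torch.nn as nn

class CharAugmentedTagger(nn.Module):
    """Toy POS tagger combining token- and character-level information.

    Layer choices and sizes are illustrative assumptions, not the
    architectures evaluated in the paper.
    """

    def __init__(self, vocab_size, char_vocab_size, n_tags,
                 word_dim=100, char_dim=30, char_hidden=25):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        # BiLSTM over the characters of a single word.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        self.tagger = nn.Linear(word_dim + 2 * char_hidden, n_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (sent_len,) token ids.
        # char_ids: list of (word_len,) character-id tensors, one per word.
        summaries = []
        for chars in char_ids:
            out, _ = self.char_lstm(self.char_emb(chars).unsqueeze(0))
            h = out.size(2) // 2
            # Last forward state and first backward state summarize the word.
            summaries.append(torch.cat([out[0, -1, :h], out[0, 0, h:]]))
        feats = torch.cat([self.word_emb(word_ids),
                           torch.stack(summaries)], dim=-1)
        return self.tagger(feats)  # (sent_len, n_tags) tag logits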
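The correlation analysis mentioned at the end of the abstract can likewise be sketched in a few lines. The values below are placeholders, not the paper's measurements; the point is only the computation: Pearson's r between per-dataset average word length and the performance gain obtained by adding character-level information.

```python
from scipy.stats import pearsonr

# Placeholder values for illustration only (not the paper's data):
# one average word length and one POS-tagging gain per dataset.
avg_word_length = [4.4, 5.1, 5.8, 6.3]
char_gain = [0.10, 0.18, 0.31, 0.42]

r, p = pearsonr(avg_word_length, char_gain)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```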

© 2024 The Association for Natural Language Processing