Journal of Natural Language Processing (自然言語処理)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Regular Paper (Peer-Reviewed)
Discovering Unusual Word Usages with Masked Language Model via Pseudo-label Training
Tatsuya Aoki, Jey Han Lau, Hidetaka Kamigaito, Hiroya Takamura, Timothy Baldwin, Manabu Okumura

2025, Volume 32, Issue 1, pp. 134–175

Abstract

User-generated texts contain not only non-standard words, such as b4 for before, but also unusual word usages, such as catfish for a person who uses a fake identity online; handling such cases in natural language processing requires knowledge about how the words are used. We present a neural model for detecting non-standard word usages in social media text. To deal with the lack of training data for this task, we propose a method for synthetically generating pseudo non-standard examples from a corpus, which enables us to train the model without manually annotated training data and for any language. Experimental results on Twitter and Reddit datasets show that our proposed method achieves better performance than existing methods and is effective across different languages.
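As a rough illustration of how pseudo non-standard examples might be generated from a corpus, the Python sketch below substitutes a word into a context where it did not originally occur and labels the result as an unusual usage, keeping the untouched sentence as a standard-usage counterpart. The function make_pseudo_examples, the labeling scheme, and the toy corpus are illustrative assumptions, not the authors' actual procedure, which is described in the paper itself.

import random

def make_pseudo_examples(sentences, vocabulary, seed=0):
    # Illustrative sketch (assumption): build pseudo training pairs by
    # substituting a vocabulary word into a context where it did not occur,
    # so the substituted word looks like an unusual usage (label 1), while
    # the original sentence serves as a standard-usage example (label 0).
    rng = random.Random(seed)
    examples = []
    for tokens in sentences:
        idx = rng.randrange(len(tokens))
        original = tokens[idx]
        candidates = [w for w in vocabulary if w != original]
        replacement = rng.choice(candidates)
        swapped = tokens[:idx] + [replacement] + tokens[idx + 1:]
        examples.append({"tokens": swapped, "target_index": idx, "label": 1})
        examples.append({"tokens": tokens, "target_index": idx, "label": 0})
    return examples

if __name__ == "__main__":
    corpus = [
        ["the", "catfish", "swam", "in", "the", "muddy", "river"],
        ["she", "met", "him", "through", "an", "online", "forum"],
    ]
    vocabulary = ["catfish", "river", "forum", "online", "swam", "met"]
    for example in make_pseudo_examples(corpus, vocabulary):
        print(example)

A masked language model trained on pairs of this kind could then score how well a target word fits its context, which is broadly in line with the pseudo-label training named in the title; the model architecture and generation details in the paper may differ.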

© 2025 The Association for Natural Language Processing