Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Court Case Dataset for Japanese Online Offensive Language Detection
Shohei Hisada, Shoko Wakamiya, Eiji Aramaki
JOURNAL FREE ACCESS

2024 Volume 31 Issue 4 Pages 1598-1634

Abstract

Growing social concern over offensive language on digital platforms has spurred research into datasets and automated detection, with the aim of better understanding its nature and developing countermeasures. Existing datasets often simplify the task and rely on subjective judgments by nonexpert annotators recruited through crowdsourcing. This approach leads to a disconnect from actual issues and a lack of consideration for social and cultural contexts, indicating the need for approaches that adjust to individual societal contexts while utilising social science expertise. This study proposes a Japanese dataset for offensive language detection based on Japanese court cases. Our dataset provides labels for offensive language, for legal rights such as the right to reputation and the sense of honour, and for judicial decisions. Furthermore, by validating automated detection methods, we identify gaps with respect to practical issues and discuss areas for improvement. This research aims to build a dataset that reflects real societal issues, promoting fairer content moderation practices and fostering discussion on integrating expertise from other domains.

© 2024 The Association for Natural Language Processing