Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Japanese Mistakable Legal Term Correction using Infrequency-aware BERT Classifier
Yamakoshi Takahiro, Komamizu Takahiro, Ogawa Yasuhiro, Toyama Katsuhiko
Keywords: legal term, term correction, Japanese, BERT

2020, Volume 35, Issue 4, Pages E-K25_1-17

Abstract

We propose a method to assist legislative drafters that locates inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on BERT (Bidirectional Encoder Representations from Transformers). The BERT classifier is pretrained with a huge number of whole sentences; thus, it contains abundant linguistic knowledge. Classifiers for predicting legal terms suffer from two-level infrequency: term-level infrequency and set-level infrequency. The former causes a class imbalance problem and the latter causes an underfitting problem; both degrade classification performance. To overcome these problems, we apply three techniques, namely, preliminary domain adaptation, repetitive soft undersampling, and classifier unification. The preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, the repetitive soft undersampling overcomes term-level infrequency, and the classifier unification overcomes set-level infrequency while saving storage consumption. Our experiments show that our classifier outperforms conventional classifiers using Random Forest or language models, and that all three training techniques improve performance.
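As a rough illustration of one of the techniques named above, the following Python sketch shows per-round class-balanced undersampling in the spirit of repetitive soft undersampling: each training round draws a fresh subsample in which every legal term in a set is downsampled to the frequency of the rarest term, so infrequent terms are not drowned out, while repeated rounds still cycle through the majority-class examples. This is a minimal sketch under our own assumptions, not the authors' implementation; the names balanced_round, fine_tune, and n_rounds are hypothetical.

import random
from collections import defaultdict

def balanced_round(examples, seed):
    # examples: list of (sentence, term_label) pairs for one term set.
    # Returns a fresh class-balanced subsample: every label is randomly
    # downsampled to the size of the rarest label. A different seed per
    # round yields a different draw of the majority-class examples.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sentence, label in examples:
        by_label[label].append((sentence, label))
    n_min = min(len(group) for group in by_label.values())
    subset = []
    for group in by_label.values():
        subset.extend(rng.sample(group, n_min))
    rng.shuffle(subset)
    return subset

# Hypothetical training loop: fine-tune the BERT classifier once per
# round, each time on a freshly drawn balanced subset (fine_tune and
# n_rounds are placeholders, not from the paper).
# for round_id in range(n_rounds):
#     subset = balanced_round(train_examples, seed=round_id)
#     fine_tune(bert_classifier, subset)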

© The Japanese Society for Artificial Intelligence 2020