Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
A Non-image-based Subcharacter-level Method to Encode the Shape of Chinese Characters
Yuanzhi Ke, Masafumi Hagiwara
Journal, free access

2020, Volume 35, Issue 2, p. C-J74-1-11

Abstract

Most characters in the Chinese and Japanese languages are ideographic compound characters composed of subcharacter elements arranged in a planar manner. Token-based and image-based subcharacter-level models have been proposed to leverage the subcharacter elements. However, conventional token-based subcharacter-level models are blind to the planar structural information, while image-based models are weak with respect to ideographic characters that share similar shapes but have different meanings. These characteristics motivate us to explore non-image-based methods to encode the planar structural information of characters. In this paper, we propose and discuss a method to encode the planar structural information by learning embeddings of the categories of the structure types. Our proposed model adds the structure embeddings to the conventional subcharacter embeddings and position embeddings before they are input into the encoder. In this way, the model learns the planar structural and positional information and retains the uniqueness of each character. We evaluated the method in a text classification task. In the experiment, the embeddings were encoded by a CNN encoder, and then the encoded vectors were input into an LSTM classifier to classify product reviews as positive or negative. We compared the proposed model with models that use only the subcharacter embeddings, the structure embeddings, or the position embeddings, as well as with the conventional models from previous works. The results show that adding both structure embeddings and position embeddings leads to richer and more representative features and a better fit to the dataset. Compared to previous methods, the proposed method achieves up to 1.8% better recall, 0.63% better F-score, and 0.55% better accuracy on the testing datasets.
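
The input step described in the abstract (structure, subcharacter, and position embeddings added together before the encoder) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The table sizes, the structure-type names, the integer IDs, the random initialization, and the choice to tag each subcharacter slot with the structure type of its parent character are all assumptions made for illustration. The two example characters use illustrative decompositions: 旮 as 九 above 日 (top-bottom) and 旭 as 九 surrounding 日 from the lower left.

```python
import random

EMBED_DIM = 8  # assumed embedding width, chosen only for this sketch

# Hypothetical structure-type categories; the abstract does not list the real ones.
STRUCTURE_TYPES = {"single": 0, "left_right": 1, "top_bottom": 2, "surround_lower_left": 3}

# Hypothetical subcharacter vocabulary.
SUBCHARS = {"九": 0, "日": 1}

MAX_SLOTS = 4  # assumed maximum number of subcharacter slots per character


def make_table(num_rows, dim, rng):
    """Create a randomly initialized embedding table (a stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def build_tables(seed=0):
    """Build the three embedding tables used by embed_subcharacters."""
    rng = random.Random(seed)
    return {
        "subchar": make_table(len(SUBCHARS), EMBED_DIM, rng),
        "structure": make_table(len(STRUCTURE_TYPES), EMBED_DIM, rng),
        "position": make_table(MAX_SLOTS, EMBED_DIM, rng),
    }


def embed_subcharacters(subchar_ids, structure_ids, tables):
    """Sum subcharacter, structure, and position embeddings for each slot."""
    if len(subchar_ids) != len(structure_ids):
        raise ValueError("subchar_ids and structure_ids must have the same length")
    vectors = []
    for position, (sub_id, struct_id) in enumerate(zip(subchar_ids, structure_ids)):
        sub_vec = tables["subchar"][sub_id]
        struct_vec = tables["structure"][struct_id]
        pos_vec = tables["position"][position]
        # Elementwise sum; the result is what would be fed to the encoder.
        vectors.append([s + t + p for s, t, p in zip(sub_vec, struct_vec, pos_vec)])
    return vectors


if __name__ == "__main__":
    tables = build_tables()
    components = [SUBCHARS["九"], SUBCHARS["日"]]
    top_bottom = [STRUCTURE_TYPES["top_bottom"]] * 2
    surround = [STRUCTURE_TYPES["surround_lower_left"]] * 2
    # Same component sequence, different structure type: the inputs differ.
    print(embed_subcharacters(components, top_bottom, tables)
          != embed_subcharacters(components, surround, tables))
```

In this sketch the two characters share the same subcharacter sequence, so a purely token-based input could not tell them apart; the structure term is what separates their encoder inputs. In a trained model the tables would be learned parameters rather than random values.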

© The Japanese Society for Artificial Intelligence 2020