Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
In Japanese sentences, the meaning can change depending on where punctuation is inserted, so the position of punctuation is very important. In this research, we develop a general method that uses deep learning to automatically insert punctuation based on text information alone. In the proposed method, the corpus is segmented into words by morphological analysis, infrequent words are replaced with their parts of speech, and an LSTM classifies whether a period or a comma belongs at a target position, using the word sequences before and after that position as input. Classification accuracy was improved by applying a threshold to the probabilities output by the model. Furthermore, limiting the number of input words and replacing infrequent words with parts of speech reduces computation time without reducing classification accuracy. Experiments using broadcast manuscripts as the text corpus confirmed the effectiveness of this method.
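The non-neural parts of the pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be `(surface, pos)` pairs already produced by a Japanese morphological analyzer (e.g. MeCab, not shown), the LSTM classifier itself is omitted, and `probs` stands in for its softmax output over three labels. The function names, the `min_count` frequency cutoff, the padding token, and the exact thresholding rule (emit punctuation only if its probability reaches `threshold`) are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical label set: no punctuation, comma (、), period (。).
LABELS = ("none", "comma", "period")
PAD = "<pad>"  # hypothetical padding token for windows near text boundaries


def replace_infrequent(tokens, min_count):
    """Replace words seen fewer than min_count times with their POS tag.

    tokens: list of (surface, pos) pairs, assumed to come from a
    morphological analyzer; min_count is an assumed frequency cutoff.
    """
    counts = Counter(surface for surface, _ in tokens)
    return [surface if counts[surface] >= min_count else pos
            for surface, pos in tokens]


def context_window(words, position, n):
    """Return n words before and n words after the gap at `position`.

    The gap at `position` lies between words[position - 1] and
    words[position]. Limiting n bounds the input length; missing slots
    are padded so every window has exactly n words on each side.
    """
    before = words[max(0, position - n):position]
    after = words[position:position + n]
    before = [PAD] * (n - len(before)) + before
    after = after + [PAD] * (n - len(after))
    return before, after


def decide(probs, threshold):
    """Choose a label from model probabilities (ordered as LABELS).

    Assumed rule: take the most probable label, but fall back to "none"
    when a punctuation label wins with probability below threshold.
    """
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    if LABELS[best] != "none" and probs[best] < threshold:
        return "none"
    return LABELS[best]


if __name__ == "__main__":
    tokens = [("今日", "名詞"), ("は", "助詞"), ("晴れ", "名詞"),
              ("です", "助動詞"), ("今日", "名詞")]
    words = replace_infrequent(tokens, min_count=2)
    print(words)  # ['今日', '助詞', '名詞', '助動詞', '今日']
    print(context_window(words, 2, 2))
    print(decide([0.2, 0.5, 0.3], threshold=0.6))  # none
```

Raising `threshold` trades recall for precision: a comma or period is emitted only when the model is confident, which is one plausible reading of how a probability threshold could improve classification accuracy.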