Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
We address paragraph segmentation as a step toward understanding the content of novels. Estimating the paragraphs of a text can be framed as a binary classification problem: deciding whether two given sentences belong to the same paragraph. Under this framing, the number of paragraphs is small relative to the number of sentences, so the resulting class imbalance in the data must be taken into account. We applied Bidirectional Encoder Representations from Transformers (BERT), which has shown high accuracy on a variety of natural language processing tasks, to the paragraph segmentation problem, and improved the model's performance by using focal loss as the classifier's loss function. The effectiveness of the proposed model was confirmed on datasets constructed for this work. In addition, expanding the range of sentences given to the model as input improved every evaluation metric.
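The binary framing and the focal loss described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `make_pairs` and `binary_focal_loss` are hypothetical, the choice to label a paragraph boundary as the positive class (1) is an assumption, and the defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values from the original focal loss formulation, not values reported in this abstract. The BERT encoder that would produce the probability `p` is omitted.

```python
import math


def make_pairs(paragraphs):
    """Build adjacent-sentence pairs from a list of paragraphs.

    Each paragraph is a list of sentences. A pair is labeled 1 when a
    paragraph boundary lies between the two sentences, else 0
    (assumed labeling convention, not taken from the abstract).
    """
    sentences, ends_paragraph = [], []
    for para in paragraphs:
        for i, sentence in enumerate(para):
            sentences.append(sentence)
            ends_paragraph.append(i == len(para) - 1)
    return [
        (sentences[i], sentences[i + 1], int(ends_paragraph[i]))
        for i in range(len(sentences) - 1)
    ]


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single example.

    p: predicted probability of the positive class (a boundary).
    y: gold label, 0 or 1.
    gamma: focusing parameter; gamma=0 reduces to weighted cross-entropy.
    alpha: weight on the positive class (1 - alpha on the negative class).
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights examples the model already
    # classifies confidently, so the abundant easy pairs contribute less.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Because most adjacent pairs fall inside a paragraph, a plain cross-entropy objective is dominated by those easy negatives. The modulating factor in the sketch shrinks their contribution, which is the usual motivation for focal loss under class imbalance.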