JSAI Technical Report, SIG-SLUD
Online ISSN : 2436-4576
Print ISSN : 0918-5682
97th (Feb, 2023)

Interpreting Split-Characters with Contextual Information
Takagi KENTO, Ryo INUI, Tsuyoshi YAMAMURA
CONFERENCE PROCEEDINGS FREE ACCESS

Pages 38-43

Abstract

Posts on social networking services (SNS) are a useful source of information because they cover a wide variety of content. However, SNS posts contain unique expressions that differ from those used in newspapers and other media. They are therefore difficult to analyze with traditional natural language processing, and special processing is required. In this study, we focus on one kind of unique expression, Split-Characters. A Split-Character is a single character that has been written as a sequence of multiple characters. A previous study used OCR to process Split-Characters visually. However, because OCR identifies Split-Characters by character recognition alone, it does not use contextual information and does not consider whether the corrected sentence is appropriate. We therefore propose methods for interpreting Split-Characters using contextual information. We use three models that capture context: N-gram, RNN, and BERT. We propose methods that interpret Split-Characters with these models and verify whether the proposed methods can convert Split-Characters into the correct characters.
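To make the idea of context-based interpretation concrete, the following is a minimal sketch in Python of how an N-gram model could choose between keeping a character sequence and merging it. It is not the authors' implementation. The split table (`SPLIT_TABLE`), the toy corpus, the add-one smoothing, and the per-bigram averaging are all assumptions chosen for illustration, and the example "ネ申" for "神" is a common case that does not come from the abstract.

```python
import math
from collections import Counter

# Hypothetical split-character table: component sequence -> intended character.
SPLIT_TABLE = {"ネ申": "神", "タヒ": "死"}


def candidates(text):
    """Enumerate every reading of text, keeping or merging each table match."""
    if not text:
        return [""]
    # Option 1: keep the first character as written.
    results = [text[0] + rest for rest in candidates(text[1:])]
    # Option 2: merge a split sequence that starts at this position.
    for parts, merged in SPLIT_TABLE.items():
        if text.startswith(parts):
            results += [merged + rest for rest in candidates(text[len(parts):])]
    return results


class BigramModel:
    """Character bigram model with add-one smoothing (an assumed design)."""

    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        for sentence in corpus:
            padded = "^" + sentence + "$"  # sentence boundary markers
            self.vocab.update(padded)
            self.unigrams.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))

    def score(self, sentence):
        """Average log-probability per bigram, so length differences matter less."""
        padded = "^" + sentence + "$"
        v = len(self.vocab) + 1  # reserve one slot for unseen characters
        pairs = list(zip(padded, padded[1:]))
        total = sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + v))
            for a, b in pairs
        )
        return total / len(pairs)


def interpret(text, model):
    """Return the candidate reading that the language model scores highest."""
    return max(candidates(text), key=model.score)


# Toy training corpus, invented for this example.
model = BigramModel(["この曲は神だ", "神アニメ", "彼は神だ"])
result = interpret("この曲はネ申だ", model)
print(result)
```

In this sketch the decision depends on the surrounding characters rather than on the shape of the glyphs, which is the contrast with the OCR-based approach that the abstract draws. An RNN or BERT scorer could in principle replace `BigramModel.score` without changing the candidate generation step, although how the paper actually combines these models is not stated in the abstract.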

© 2023 The Japanese Society for Artificial Intelligence