Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Morphological Analysis for Japanese Noisy Text based on Extraction of Character Transformation Patterns and Lexical Normalization
Itsumi SaitoKugatsu SadamitsuHisako AsanoYoshihiro Matsuo
Author information
JOURNAL FREE ACCESS

2017 Volume 24 Issue 2 Pages 297-314

Details
Abstract

Social media texts are often written in a non-standard style and include many lexical variants such as insertions, phonetic substitutions, and abbreviations that mimic spoken language. The normalization of such a variety of non-standard tokens is one promising solution for handling noisy text. A normalization task is very difficult for the morphological analysis of Japanese text because there are no explicit boundaries between words. To address this issue, we propose a novel method herein for normalizing and morphologically analyzing Japanese noisy text. First, we extract character-level transformation patterns based on a character alignment model using annotated data. Next, we generate both character-level and word-level normalization candidates using character transformation patterns and search for the optimal path based on a discriminative model. Experimental results show that the proposed method exceeds conventional rule-based system in both accuracy and recall for word segmentation and POS (Part of Speech) tagging.

Content from these authors
© 2017 The Association for Natural Language Processing
Previous article
feedback
Top