Recent advances in speech processing technology have made the analysis of spoken language one of the central issues in natural language processing. One major feature that distinguishes spoken language from written language is that it is ill-formed in various ways, containing hesitations, repairs, ellipses, and so on. This makes it difficult to apply traditional linguistically based methods to spoken language analysis. This paper proposes a method for properly handling ill-formedness, such as hesitations and repairs, during the syntactic/semantic analysis of spoken Japanese, in which sentences transcribed in Kanji-Kana characters are parsed and interpreted to obtain their frame representations. The method is based on a uniform model, which treats well-formed and ill-formed sentences in the same way, and is realized by extending traditional dependency analysis. We first demonstrate the need for a uniform model with motivating examples from our spoken dialogue corpus. Then, after describing the method in detail, we show its effectiveness both by analyzing real sentences from the corpus and by evaluating the performance of an experimental system.