Abstract
With the spread of Consumer Generated Media (CGM), language processing technologies suited to such text have become necessary. Both retrieval by natural-language queries and translation of such text data require higher parsing precision. We realize processing methods that can handle analysis errors caused by fluctuating terms (variation in how the same term is expressed) and by ambiguous sentence structures. Specifically, we propose using a thesaurus to determine the semantic distance between terms. We have built a system that standardizes terms and normalizes syntactic dependencies. We also examine the internal structure of predicates to recover omitted subjects and to determine the “intention of a predicate”. When we analyze texts from “Yahoo! Chiebukuro”, a Japanese community Q&A site, precision improves by about 1% compared with analysis that does not use the thesaurus. Finally, we summarize the contents of the dictionaries our system uses.
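The idea of using a thesaurus to measure semantic distance and standardize variant terms can be illustrated with a small sketch. It assumes a thesaurus shaped as a tree of parent links, defines distance as the number of edges between two terms through their lowest common ancestor, and maps a variant to the nearest standard term when that distance falls within a threshold. The toy thesaurus, the names `PARENT`, `semantic_distance`, and `standardize`, and the `max_distance` threshold are all hypothetical illustrations. They are not the dictionaries or the distance measure the system actually uses.

```python
from typing import Dict, List, Optional

# Hypothetical toy thesaurus: each term maps to its parent node; the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "device": "entity",
    "computer": "device",
    "PC": "computer",
    "pasokon": "computer",  # hypothetical variant form of "PC"
    "laptop": "computer",
    "phone": "device",
    "smartphone": "phone",
}


def ancestors(term: str) -> List[str]:
    """Return the path from the term up to the root, starting with the term itself."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def semantic_distance(a: str, b: str) -> int:
    """Count the edges between a and b through their lowest common ancestor."""
    path_a = ancestors(a)
    depth_in_b = {node: i for i, node in enumerate(ancestors(b))}
    for i, node in enumerate(path_a):
        if node in depth_in_b:  # first shared node is the lowest common ancestor
            return i + depth_in_b[node]
    raise ValueError("terms share no common ancestor")


def standardize(term: str, standard_terms: List[str], max_distance: int = 2) -> str:
    """Map a term to its closest standard term if it lies within max_distance; else keep it."""
    best = min(standard_terms, key=lambda s: semantic_distance(term, s))
    return best if semantic_distance(term, best) <= max_distance else term


if __name__ == "__main__":
    print(standardize("pasokon", ["PC", "smartphone"]))
    print(standardize("smartphone", ["PC"]))
```

In this sketch, "pasokon" and "PC" are siblings under "computer", so they are two edges apart and the variant is rewritten to "PC". "smartphone" is four edges from "PC" and is left unchanged. A real thesaurus would likely also list synonyms within a single node, which this tree-only model does not capture.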