Identifying synonyms is a necessary step in text processing tasks such as information retrieval and text mining. Constructing a synonym dictionary can be expected to improve the efficiency and performance of such processing. Because the same word may be used with a different meaning in a different target field, a synonym dictionary has to be constructed for each field. In some fields in Japanese, such as aviation, synonymous nouns are written in kanji/hiragana, katakana, and the alphabet, and also appear as abbreviations of these forms. Many of these words are not registered in general dictionaries. In addition, because new words constantly come into use, keeping the dictionary up to date is a major issue.
In this paper, we propose a system for constructing a synonym dictionary. Given a query, the system returns synonym candidates in descending order of similarity to the query. Synonyms can then be registered in the dictionary easily by reviewing the candidates generated by the system. We define the context information of a target word as the frequencies of the words appearing around it, and we calculate similarity as the cosine measure between context information. We confirmed that system performance improved remarkably when the system was provided with a set of known synonyms for normalizing context words, especially when the performance was low. We evaluated the system experimentally on aviation safety reports in Japanese, using average precision as the metric, and obtained promising results.
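The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the function names (`context_vector`, `cosine`, `rank_candidates`), the optional `canonical` mapping (one possible reading of how known synonyms might be applied to context words), and the English toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2, canonical=None):
    """Count the words within `window` positions of each occurrence of `target`.

    `canonical` is a hypothetical dict mapping a context word to a
    representative form, so that known synonyms share one count.
    """
    canonical = canonical or {}
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                word = tokens[j]
                counts[canonical.get(word, word)] += 1
    return counts


def cosine(u, v):
    """Cosine measure between two sparse frequency vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(tokens, query, window=2, canonical=None):
    """Return (word, similarity) pairs in descending order of similarity."""
    q_vec = context_vector(tokens, query, window, canonical)
    vocab = set(tokens) - {query}
    scored = [
        (w, cosine(q_vec, context_vector(tokens, w, window, canonical)))
        for w in vocab
    ]
    # Sort by similarity (descending), then alphabetically for stable ties.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy English corpus, already tokenized (an assumption; the paper uses
# Japanese aviation safety reports, which would need a tokenizer first).
tokens = (
    "the aircraft landed on the runway "
    "the plane landed on the runway "
    "the pilot checked the fuel"
).split()

for word, score in rank_candidates(tokens, "aircraft")[:3]:
    print(f"{word}\t{score:.3f}")
```

On this toy corpus the word "plane" ranks first for the query "aircraft", because the two words share most of their surrounding words. A person building the dictionary would review the top of such a list and register the true synonyms.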