Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
31st (2017)
Session ID: 1L1-5

Combining Multiple Dictionaries to Improve Tokenization of Ainu Language
*Michal Ptaszynski, Yuka Ito, Karol Nowakowski, Hirotoshi Honma, Yoko Nakajima, Fumito Masui
Conference proceedings and abstracts, free access

Abstract

In this paper we present our research on improving a tokenizer for the Ainu language. Tokenization is the process of separating a sentence into basic elements, such as words or morphemes. Ainu is a critically endangered language of the Ainu people, who live in the northern parts of Japan. Since the Ainu language originally did not have a writing system, documents in Ainu are usually transcribed in an unsystematized way. To allow effective processing and contribute to the further revitalization of the Ainu language, we combine multiple official Ainu language dictionaries to improve the tokenization of such documents. We also compare a state-of-the-art tokenizer with a custom one developed for the needs of this research.
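
As a rough illustration of the general idea, and not the tokenizer described in the paper, the sketch below merges several word lists into one lexicon and then segments unspaced text by greedy longest match. The function names `combine_dictionaries` and `tokenize`, the matching strategy, and the toy word lists are all assumptions made for this example.

```python
def combine_dictionaries(*dictionaries):
    """Merge several word lists into one lexicon (a set of known tokens)."""
    lexicon = set()
    for entries in dictionaries:
        lexicon.update(entries)
    return lexicon


def tokenize(text, lexicon):
    """Greedy longest-match segmentation.

    Characters that start no dictionary entry are emitted as single tokens.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
        else:
            # No dictionary entry starts here: emit one character as-is.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy entries for illustration only (hypothetical word lists,
# not taken from the dictionaries used in the paper).
dictionary_a = {"kamuy", "cise"}
dictionary_b = {"pirka", "kotan"}

lexicon = combine_dictionaries(dictionary_a, dictionary_b)
print(tokenize("pirkakotan", lexicon))       # ['pirka', 'kotan']
print(tokenize("pirkakotan", dictionary_a))  # falls back to single characters
```

With the merged lexicon the input is split into two known words, while a single word list that lacks both entries can only fall back to single characters. That is the kind of coverage gain that combining dictionaries is meant to provide.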

© 2017 The Japanese Society for Artificial Intelligence