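Organizer: The Japanese Society for Artificial Intelligence
Conference: The 31st Annual Conference of the Japanese Society for Artificial Intelligence, 2017
Edition: 31
Venue: WINC AICHI, Nagoya, Aichi Prefecture
Dates: 2017/05/23 - 2017/05/26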
In this paper, we present our research on improving a tokenizer for the Ainu language. Tokenization is the process of separating a sentence into basic elements, such as words or morphemes. Ainu is a critically endangered language of the Ainu people, who live in the northern parts of Japan. Since the Ainu language originally did not have a writing system, documents in Ainu are usually transcribed in a systematized way. To allow effective processing and to contribute to the further revitalization of the Ainu language, we combine multiple official Ainu language dictionaries to improve the tokenization of such documents. We also compare a state-of-the-art tokenizer with a custom one built for the needs of this research.
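
To make the idea of dictionary-based tokenization concrete, the following is a minimal sketch and not the tokenizer evaluated in the paper. It assumes a greedy longest-match strategy, which the abstract does not specify, and it uses two tiny toy word lists in place of the official dictionaries. The names `merge_lexicons`, `tokenize`, `dict_a`, and `dict_b` are hypothetical and chosen only for illustration.

```python
def merge_lexicons(*lexicons):
    """Combine several word lists into one lowercase lookup set."""
    merged = set()
    for lexicon in lexicons:
        merged.update(word.lower() for word in lexicon)
    return merged


def tokenize(text, lexicon):
    """Greedy longest-match tokenization (an assumed strategy, for illustration).

    Each whitespace-separated chunk is scanned left to right. At every
    position the longest lexicon entry that matches is emitted as a token;
    if nothing matches, a single character is emitted as a fallback.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    for chunk in text.lower().split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, then shrink the window.
            for j in range(min(len(chunk), i + max_len), i, -1):
                if chunk[i:j] in lexicon:
                    break
            else:
                j = i + 1  # no entry matched: emit one unknown character
            tokens.append(chunk[i:j])
            i = j
    return tokens


# Toy word lists standing in for separate dictionaries (not the paper's data).
dict_a = {"aynu", "kotan"}
dict_b = {"itak", "kamuy"}
lexicon = merge_lexicons(dict_a, dict_b)

print(tokenize("aynuitak kamuy", lexicon))  # ['aynu', 'itak', 'kamuy']
```

In this sketch, merging the word lists matters because neither toy list alone covers both halves of `aynuitak`; with only one of them, the uncovered half would fall back to single-character tokens.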