Journal of the Japanese Association for Digital Humanities
Online ISSN : 2188-7276
Articles
Multiple-policy Character Annotation based on CHISE
Tomohiko Morioka

2015 Volume 1 Issue 1 Pages 86-106

Abstract

This paper describes a knowledge-based character processing model that resolves some problems of the coded character model. Currently, in the information processing of digital texts, each character is represented and processed according to the “Coded Character Model.” In this model, each character is defined and shared through a coded character set (code) and represented by a code point (integer) of that code. In other words, once knowledge about characters is defined (standardized) in the specification of a coded character set, there is no need to store large and detailed knowledge about characters in computers for basic text processing. In terms of flexibility, however, the coded character model has some problems, because it assumes a finite set of characters, each of which has a stable concept shared by the community. Real character usage is neither so static nor so stable. For Chinese characters in particular, it is not easy to select a finite set of characters that covers all usages. To resolve these problems, we have proposed the “Chaon” model, a new model of character processing based on character ontology. This paper briefly describes the Chaon model and the CHISE (Character Information Service Environment) project, and focuses on how to represent Chinese characters and their glyphs in the context of multiple unification rules.
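
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not CHISE's actual data model or API: the feature names (`ucs`, `abstract_form`, `glyph_style`), the `POLICIES` table, and the `unified` helper are all hypothetical, chosen only to show how two glyph variants might count as one character under one unification rule and as two under another.

```python
# Coded character model: a character is an integer in a shared code.
code_point = ord("説")            # the character's Unicode code point
assert chr(code_point) == "説"    # the integer alone identifies the character

# Hypothetical feature-based view: each character object is a set of
# named features. These feature names are illustrative assumptions,
# not the schema used by CHISE.
variant_a = {"ucs": ord("說"), "abstract_form": "speak", "glyph_style": "A"}
variant_b = {"ucs": ord("説"), "abstract_form": "speak", "glyph_style": "B"}

# A unification policy is modelled here as the list of features that
# must agree for two objects to be treated as the same character.
POLICIES = {
    "by_code_point": ("ucs",),
    "by_abstract_form": ("abstract_form",),
}

def unified(policy_name, x, y):
    """Return True if x and y agree on every feature the policy names."""
    return all(x.get(key) == y.get(key) for key in POLICIES[policy_name])

print(unified("by_code_point", variant_a, variant_b))
print(unified("by_abstract_form", variant_a, variant_b))
```

Under the first policy the two variants stay distinct because their code points differ; under the second they are unified because they share the same (assumed) abstract form. Keeping the policy outside the character objects is what lets several unification rules coexist over the same data.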

© Tomohiko Morioka