詳細検索結果
以下の条件での結果を表示する: 検索条件を変更
クエリ検索: "Hadato"
1件中 1-1の結果を表示しています
  • Hee Yuan TAN, Hyotaek LIM
    IEICE Transactions on Information and Systems
    2011年 E94.D 巻 1 号 146-154
    発行日: 2011/01/01
    公開日: 2011/01/01
    ジャーナル フリー
    Owing to the high expressiveness of regular expression, it is frequently used in searching and manipulation of text based data. Regular expression is highly applicable in processing Latin alphabet based text, but the same cannot be said for Hangeul*, the writing system for Korean language. Although Hangeul possesses alphabetic features within the script, expressiveness of regular expression pattern using Hangeul is hindered by the absence of syllable decomposition. Without decomposition support in regular expression, searching through Hangeul text is limited to string literal matching. Literal matching has made enumeration of syllable candidates in regular expression pattern definition indispensable, albeit impractical, especially for a large set of syllable candidates. Although the existing implementation of canonical decomposition in Unicode standard does reduce a pre-composed Hangeul syllable into smaller unit of consonant-vowel or consonant-vowel-consonant letters, it still leaves quite a number of the individual letters in compounded form. We have observed that there is a necessity to further reduce the compounded letters into unit of basic letters to properly represent the Korean script in regular expression. We look at how the new canonical decomposition technique proposed by Kim can help in handling Hangeul in regular expression. In this paper, we examine several of the performance indicators of full decomposition of Hangeul syllable to better understand the overhead that might incur, if a full decomposition were to be implemented in a regular expression engine. For efficiency considerations, we propose a semi decomposition technique alongside with a notation for defining Hangeul syllables. The semi decomposition functions as an enhancement to the existing regular expression syntax by taking in some of the special constructs and features of the Korean language. This proposed technique intends to allow an end user to have a greater freedom to define regular expression syntax for Hangeul.
feedback
Top