A Multilingual Full-Text Retrieval System for Tagged Documents

阪口 哲男; 中尾 茂岳; 前田 亮; 杉本 重雄; 田畑 孝一

doi:10.2964/jsikproc.7.0_49

Abstract

The Internet enables people worldwide to share documents written in various languages. Many of these documents are provided via the WWW, and most of them are marked up with HTML tags. Tags that indicate document elements are very useful for full-text retrieval. The authors consider a full-text retrieval system for tagged multilingual documents to be very important for obtaining useful information. This article describes a multilingual full-text retrieval system for tagged documents. It has functions to store and retrieve SGML, XML, and HTML documents. For multilingual text, the system handles both the ISO-2022-JP-2 and Unicode character code sets. It is developed in Java for portability. This article also discusses performance issues of the implemented system.
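
The abstract does not describe the index structure, so the following is only a minimal sketch of how element tags can scope a full-text query, written in Java because the system is. The class `TagIndex`, its methods, and the `element:term` key format are hypothetical names chosen for illustration, not the authors' design. The sketch assumes input already decoded to Unicode strings, treats a token as a run of letters or digits (so unsegmented Japanese text becomes one token per run, where a real system would likely need n-grams or morphological analysis), and never closes empty HTML elements such as `<br>`.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tag-aware inverted index (illustration only). */
public class TagIndex {
    // Alternatives: start/end tag (groups 1, 2), declaration or comment
    // (no group), or a run of character data (group 3).
    private static final Pattern PIECE = Pattern.compile(
        "<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>|<[!?][^>]*>|([^<]+)");
    // Assumption: a token is a run of Unicode letters or digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // Key "element:term" -> ids of documents with that term inside that element.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes every token under each element that encloses it. */
    public void add(String docId, String markup) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = PIECE.matcher(markup);
        while (m.find()) {
            if (m.group(3) != null) {
                indexText(docId, open, m.group(3));
            } else if (m.group(2) != null) {
                String name = m.group(2).toLowerCase(Locale.ROOT);
                boolean isEndTag = !m.group(1).isEmpty();
                boolean isSelfClosing = m.group().endsWith("/>");
                if (isEndTag) {
                    // Close the nearest matching element; ignore stray end tags.
                    if (open.contains(name)) {
                        while (!open.pop().equals(name)) {
                            // discard unclosed inner elements
                        }
                    }
                } else if (!isSelfClosing) {
                    open.push(name);
                }
            }
            // Declarations and comments fall through and are skipped.
        }
    }

    private void indexText(String docId, Deque<String> open, String text) {
        Matcher t = TOKEN.matcher(text);
        while (t.find()) {
            String term = t.group().toLowerCase(Locale.ROOT);
            for (String element : open) {
                postings.computeIfAbsent(element + ":" + term,
                                         k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns ids of documents whose given element contains the term. */
    public Set<String> search(String element, String term) {
        String key = element.toLowerCase(Locale.ROOT) + ":"
                   + term.toLowerCase(Locale.ROOT);
        return postings.getOrDefault(key, Collections.emptySet());
    }
}
```

With this sketch, after `add("d1", "<html><title>Multilingual retrieval</title></html>")`, a call to `search("title", "retrieval")` returns the set containing `d1`, while `search("body", "retrieval")` returns an empty set. Keying postings by element name is one simple way to restrict a query to a document element; a positional index over element spans would be an alternative.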
