人工知能学会論文誌
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
論文
動向情報編纂のためのテキストからの統計量表現の自動抽出
森 辰則藤岡 篤史村田 一郎
著者情報
ジャーナル フリー

2008 年 23 巻 5 号 p. 310-318

詳細
抄録

In order to summarize trend information in document and visualize it, we have to have a method to automatically extract statistical information from document. In this paper, we investigate automated extraction of statistical information, especially, expressions of name of statistical information. First, we classify those expressions into three categories, namely, the action type, the attribute type, and the definition type. Second, the internal structures of them are examined. According to the internal structures, we defined an XML tag set to annotate each part of names of statistical information. As a feasibility study of automated extraction of them, we conducted an experiment in which parts of names of statistics are extracted by using a standard chunking algorithm. The experimental result shows that the parts of names of statistics defined by the tag set can be extracted with good accuracy in the case that we can prepare a training corpus of the domain similar to target documents. On the other hand, the extraction accuracy will be degraded when we cannot prepare such a training corpus.

著者関連情報
© 2008 JSAI (The Japanese Society for Artificial Intelligence)
前の記事 次の記事
feedback
Top