Automatic Extraction of Statistical Expressions from Text for Compiling Trend Information

森 辰則; 藤岡 篤史; 村田 一郎

doi:10.1527/tjsai.23.310

Abstract

To summarize and visualize the trend information contained in documents, we need a method for automatically extracting statistical information from text. In this paper, we investigate the automated extraction of statistical information, focusing on expressions that name statistics. First, we classify these expressions into three categories: the action type, the attribute type, and the definition type. Second, we examine their internal structures and, based on those structures, define an XML tag set for annotating each part of a statistic's name. As a feasibility study, we conducted an experiment in which the parts of statistic names are extracted with a standard chunking algorithm. The results show that the parts defined by the tag set can be extracted with good accuracy when a training corpus from a domain similar to that of the target documents is available. When no such corpus can be prepared, extraction accuracy degrades.
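
As an illustration of how an XML-annotated corpus could feed a standard chunking algorithm, the following minimal sketch converts inline tags into per-token BIO labels, the usual training format for chunkers. The tag names (`attr`, `obj`, `act`), the sample sentence, and the helper `to_bio` are hypothetical; the abstract does not list the actual tag set. Whitespace tokenization is also an assumption made for brevity, and Japanese text would first need a morphological analyzer.

```python
import re

# Hypothetical annotated sentence; the real tag set is not given in the abstract.
ANNOTATED = "the <attr>number</attr> of <obj>new cars</obj> <act>sold</act>"

# Matches either an opening/closing tag or a whitespace-delimited token.
TOKEN_OR_TAG = re.compile(r"<(/?)(\w+)>|([^\s<>]+)")

def to_bio(annotated):
    """Convert inline XML-style annotation to (token, BIO label) pairs."""
    pairs = []
    current = None   # name of the tag currently open, or None outside any tag
    begin = False    # True until the first token inside the open tag is emitted
    for m in TOKEN_OR_TAG.finditer(annotated):
        closing, tag, token = m.group(1), m.group(2), m.group(3)
        if token is not None:
            if current is None:
                pairs.append((token, "O"))
            else:
                prefix = "B-" if begin else "I-"
                pairs.append((token, prefix + current))
                begin = False
        elif closing:
            current = None            # closing tag: back outside any chunk
        else:
            current, begin = tag, True  # opening tag: next token starts a chunk
    return pairs

if __name__ == "__main__":
    for token, label in to_bio(ANNOTATED):
        print(f"{token}\t{label}")
```

The resulting token and label pairs could then be passed to any sequence labeler; which learner the authors used is not stated in the abstract.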
