Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌)
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
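Original Paper
Attribute Information Extraction from Auction Listing Descriptions Using a Two-Stage Extraction Method (2段階抽出手法によるオークションの出品情報からの属性情報抽出)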
宮崎 林太郎, 塚原 裕常, 西村 純, 前田 直人, 森 辰則, 小林 寛之, 石川 雄介, 田中 裕也, 翁 松齢
Journal, free access

2011, Volume 26, Issue 2, pp. 376-386

Abstract

In order to achieve faceted search in online auction systems, several researchers have addressed the automated extraction of attributes and their values from the descriptions of listed items. In this paper, we propose a two-stage method to improve extraction performance. The proposed method is based on the following two assumptions: 1) identifying whether or not each sentence contains the target information is easier than extracting the target information from raw plain text; 2) extracting the target information from the sentences selected in the first stage is easier than extracting it from the entire raw plain text. In the first stage, the method selects each sentence in a description that is judged to contain attributes and/or values. Each sentence is represented as a bag-of-words-style feature vector and is labeled as selected or not by a classifier trained with an SVM. In the second stage, attributes and values are extracted from the cleaned text, which no longer contains parts of the description that are irrelevant to the item itself, such as descriptions of postage or of other items. For this stage, we adopt a sequential labeling method similar to those used in named entity recognizers. The experimental results show that the proposed method improves both precision and recall in attribute-value extraction compared with using the second-stage extraction method alone, which supports our assumptions.
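
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Stage 1 here is a small linear SVM trained by subgradient descent on the hinge loss over bag-of-words counts. Stage 2 is only a rule-based stand-in for the sequential labeler, since the abstract does not specify the labeling model. The training sentences, the attribute lexicon, the hyperparameters, and all function names are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy data: +1 = sentence describes the item, -1 = irrelevant.
TRAIN = [
    ("Color is black", 1),
    ("Size is 42", 1),
    ("Brand is Acme", 1),
    ("Shipping costs 500 yen", -1),
    ("Please see my other items", -1),
    ("Payment by bank transfer only", -1),
]
ATTRIBUTE_LEXICON = {"color", "size", "brand"}  # assumed attribute names
LINKING_WORDS = {"is", ":"}                     # skipped between attribute and value


def bag_of_words(sentence):
    """Represent a sentence as sparse token counts (token -> count)."""
    return Counter(sentence.lower().split())


def score(sentence, weights, bias):
    """Linear decision value w.x + b for one sentence."""
    x = bag_of_words(sentence)
    return sum(weights.get(t, 0.0) * c for t, c in x.items()) + bias


def train_linear_svm(examples, epochs=50, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularized hinge loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            margin = label * score(sentence, weights, bias)
            for t in weights:                  # shrinkage from the L2 term
                weights[t] *= 1.0 - lr * reg
            if margin < 1.0:                   # hinge loss is active
                for t, c in bag_of_words(sentence).items():
                    weights[t] = weights.get(t, 0.0) + lr * label * c
                bias += lr * label
    return weights, bias


def select_sentences(description, weights, bias):
    """Stage 1: keep only sentences classified as describing the item."""
    sentences = [s.strip() for s in description.split(".") if s.strip()]
    return [s for s in sentences if score(s, weights, bias) > 0.0]


def tag_tokens(sentence):
    """Stage 2 stand-in: tag each token as ATTR, VALUE, or O by simple rules."""
    tags, expecting_value = [], False
    for token in sentence.split():
        low = token.lower()
        if low in ATTRIBUTE_LEXICON:
            tags.append((token, "ATTR"))
            expecting_value = True
        elif expecting_value and low not in LINKING_WORDS:
            tags.append((token, "VALUE"))
            expecting_value = False
        else:
            tags.append((token, "O"))
    return tags


def extract_pairs(sentences):
    """Collect (attribute, value) pairs from the tagged sentences."""
    pairs = []
    for sentence in sentences:
        attribute = None
        for token, tag in tag_tokens(sentence):
            if tag == "ATTR":
                attribute = token.lower()
            elif tag == "VALUE" and attribute is not None:
                pairs.append((attribute, token))
                attribute = None
    return pairs


DESCRIPTION = ("Brand is Acme. Color is red. "
               "Shipping by post costs 700 yen. Please see my other auctions.")

weights, bias = train_linear_svm(TRAIN)
selected = select_sentences(DESCRIPTION, weights, bias)
print(selected)
print(extract_pairs(selected))
```

The point of the split is that the second stage only ever sees text that the first stage kept, so sentences about postage or other listings cannot produce spurious attribute-value pairs. In a realistic setting the stand-in tagger would be replaced by a trained sequence labeler, and the sentence splitter and tokenizer by ones suited to Japanese text.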

© 2011 JSAI (The Japanese Society for Artificial Intelligence)