Transactions of Japan Society of Kansei Engineering
Online ISSN : 1884-5258
ISSN-L : 1884-0833
Original Article
Japanese Stopwords for Keyword Extraction Suited to Content Inference
國府 久嗣, 山崎 治子, 野坂 政司
Journal, free access

2013, Vol. 12, No. 4, pp. 511-518

Abstract

Extracting keywords from target text data is essential for any analysis that aims to characterize the substance of message content. Among the available approaches, we chose a stopword filter because it is simple yet effective. The filter we present is made up of non-content words and low-content words. Non-content words consist mainly of function words and were removed using part-of-speech (POS) tag information. Among the remaining words, those with a high occurrence rate are candidate keywords, but they usually include some low-content words, such as delexical verbs. This article presents a stopword list of such low-content words, compiled through manual procedures based on human judgment and carried out on 40 text files from the CASTEL/J database, and establishes the list from the standpoint of general versatility.
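
The two-stage filter described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag names in `CONTENT_POS`, the entries in `LOW_CONTENT_STOPWORDS`, the function name `extract_keywords`, and the sample tokens are all assumptions made for illustration, and the input is assumed to be already tokenized and POS-tagged by some morphological analyzer.

```python
from collections import Counter

# Assumed set of content-bearing POS tags; the abstract does not give the actual tag set.
CONTENT_POS = {"noun", "verb", "adjective"}

# Hypothetical low-content entries (e.g. delexical verbs); NOT the paper's stopword list.
LOW_CONTENT_STOPWORDS = {"する", "なる", "ある", "こと", "もの"}


def extract_keywords(tagged_tokens, top_n=3):
    """Return the top_n most frequent words that survive both filter stages."""
    # Stage 1: drop non-content words (mainly function words) using POS tags.
    content_words = [word for word, pos in tagged_tokens if pos in CONTENT_POS]
    # Stage 2: drop low-content words that appear in the stopword list.
    candidates = [w for w in content_words if w not in LOW_CONTENT_STOPWORDS]
    # Remaining high-frequency words are treated as keyword candidates.
    return Counter(candidates).most_common(top_n)


# Hypothetical pre-tagged input as (surface form, POS tag) pairs.
tagged = [
    ("言語", "noun"), ("を", "particle"), ("研究", "noun"), ("する", "verb"),
    ("こと", "noun"), ("は", "particle"), ("言語", "noun"), ("の", "particle"),
    ("理解", "noun"), ("に", "particle"), ("なる", "verb"),
]
keywords = extract_keywords(tagged)
print(keywords)
```

In this sketch the particles are removed in the first stage by their POS tag, while words such as する and こと pass the POS check and are removed only by the stopword list, which is the role the abstract assigns to low-content words.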

© 2013 Japan Society of Kansei Engineering