Japanese Stopword List Making for Keyword Extraction Suitable for Semantic Interpretation

Hisatsugu KOKUBU; Haruko YAMAZAKI; Masashi NOSAKA

doi:10.5057/jjske.12.511

Abstract

Extracting keywords from a target text is essential for any analysis that aims to characterize the substance of message content. Among the available alternatives, we chose a stopword filter because it is simple yet effective. The filter we present targets two kinds of words: non-content words and low-content words. Non-content words consist mainly of function words and are removed using part-of-speech (POS) tag information. Among the remaining words, those with a high occurrence rate are keyword candidates, but they usually include some low-content words such as delexical verbs. This article presents a stopword list of low-content words, compiled through manual procedures based on human intuitive judgment over 40 text files from the CASTEL/J database, and examines the list from the standpoint of general versatility.
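The two-stage filtering described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been tokenized and POS-tagged by a morphological analyzer, and that tokens are given in base form. The POS labels, the function name `extract_keywords`, and the sample stopwords are hypothetical and chosen only for illustration; they are not the tag set or the stopword list from the article.

```python
from collections import Counter

# Hypothetical POS labels treated as content-bearing.
# The article's actual tag set is not given in the abstract.
CONTENT_POS = {"noun", "verb", "adjective"}

# Illustrative low-content words (delexical verbs and similar).
# This is NOT the stopword list presented in the article.
LOW_CONTENT_STOPWORDS = {"する", "なる", "ある", "こと"}


def extract_keywords(tagged_tokens, top_n=5):
    """Return the top_n most frequent words that survive both filters.

    tagged_tokens: iterable of (base_form, pos_label) pairs.
    """
    # Stage 1: drop non-content words (mainly function words) by POS tag.
    content_words = [word for word, pos in tagged_tokens if pos in CONTENT_POS]
    # Stage 2: drop low-content words using the stopword list.
    candidates = [w for w in content_words if w not in LOW_CONTENT_STOPWORDS]
    # Rank the remaining words by occurrence count.
    counts = Counter(candidates)
    return [word for word, _ in counts.most_common(top_n)]


# Hand-tagged sample input (assumed, for illustration only).
tokens = [
    ("研究", "noun"), ("は", "particle"), ("感性", "noun"), ("を", "particle"),
    ("評価", "noun"), ("する", "verb"), ("。", "symbol"),
    ("感性", "noun"), ("の", "particle"), ("研究", "noun"), ("が", "particle"),
    ("ある", "verb"), ("感性", "noun"),
]

print(extract_keywords(tokens, top_n=2))
```

In this sketch the particles and the symbol are removed by the POS stage, while する and ある pass the POS stage as verbs and are removed only by the stopword stage, which is the case a low-content stopword list is meant to cover.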


