Abstract
Document categorization methods based on the Probabilistic Latent Semantic Indexing (PLSI) model have recently
been shown to be effective on Japanese benchmark data sets. PLSI compresses a document-term matrix into a
low-dimensional representation based on a probabilistic model. The purpose of this study is to verify whether
PLSI-based document classification is effective not only for Japanese documents but also for Chinese ones. We
apply the PLSI-based classification method to Chinese newspaper articles and examine its classification
precision.
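The abstract describes PLSI only as a probabilistic compression of the document-term matrix. As a rough illustration, the sketch below assumes the standard aspect-model formulation P(d,w) = sum_z P(z) P(d|z) P(w|z) fitted by expectation-maximization (EM). It factors a small count matrix and returns each document's topic mixture P(z|d) as its low-dimensional representation. The function names `plsi` and `doc_topics`, the random initialization, and the fixed iteration count are hypothetical choices, not details from this study. The abstract also does not say how the classifier uses the compressed representation, so the sketch stops at that step.

```python
import random
from typing import List

Matrix = List[List[float]]


def _normalize(xs: List[float]) -> List[float]:
    """Scale a nonnegative vector to sum to 1 (uniform if it is all zeros)."""
    s = sum(xs)
    return [x / s for x in xs] if s > 0 else [1.0 / len(xs)] * len(xs)


def plsi(counts: List[List[int]], k: int, iters: int = 100, seed: int = 0):
    """Fit the PLSI aspect model P(d,w) = sum_z P(z) P(d|z) P(w|z) by EM.

    counts[d][w] is the number of times term w occurs in document d.
    Returns (p_z, p_d_z, p_w_z), where p_d_z[z][d] = P(d|z)
    and p_w_z[z][w] = P(w|z).  (Hypothetical helper, for illustration.)
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    p_z = [1.0 / k] * k
    # Random positive initialization breaks the symmetry between topics.
    p_d_z = [_normalize([rng.random() + 0.1 for _ in range(n_docs)]) for _ in range(k)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_terms)]) for _ in range(k)]
    for _ in range(iters):
        new_d = [[0.0] * n_docs for _ in range(k)]
        new_w = [[0.0] * n_terms for _ in range(k)]
        new_z = [0.0] * k
        for d in range(n_docs):
            for w in range(n_terms):
                n = counts[d][w]
                if n == 0:
                    continue  # zero cells contribute nothing to the expected counts
                # E-step: posterior P(z|d,w) is proportional to P(z) P(d|z) P(w|z).
                post = _normalize([p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(k)])
                # Accumulate expected counts n(d,w) P(z|d,w) for the M-step.
                for z in range(k):
                    c = n * post[z]
                    new_d[z][d] += c
                    new_w[z][w] += c
                    new_z[z] += c
        # M-step: renormalize the expected counts into probabilities.
        p_d_z = [_normalize(row) for row in new_d]
        p_w_z = [_normalize(row) for row in new_w]
        p_z = _normalize(new_z)
    return p_z, p_d_z, p_w_z


def doc_topics(p_z: List[float], p_d_z: Matrix) -> Matrix:
    """Low-dimensional document representation: P(z|d) proportional to P(z) P(d|z)."""
    k, n_docs = len(p_z), len(p_d_z[0])
    return [_normalize([p_z[z] * p_d_z[z][d] for z in range(k)]) for d in range(n_docs)]
```

For example, on a 4×4 count matrix whose first two documents use only the first two terms and whose last two documents use only the last two, fitting with `k=2` should place the two pairs on different latent topics. The dense double loop costs O(D·W·K) per iteration; a real newspaper corpus would call for a sparse representation that visits only the nonzero cells.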