Abstract
Community Question Answering (CQA) has recently become a popular means of satisfying personal information needs. However, because the quality of the answers posted to CQA sites varies widely, there is a need to extract high-quality answers from CQA data effectively. In this study, we first manually analyzed high-quality answers in the Yahoo! Chiebukuro data and then, based on this analysis, constructed a system that automatically detects high-quality answers. More specifically, we randomly sampled 50 questions from four Yahoo! Chiebukuro question categories, namely, ``Love Consultation,'' ``Personal Computer,'' ``General Knowledge,'' and ``Politics,'' and two of the authors manually selected high-quality answers from the answers to these questions. Drawing on the analysis of these answers, we then constructed a machine-learning-based answer quality estimator that uses detailedness, presence of evidence, and politeness as features. Our system outperformed the human assessors for the Personal Computer and General Knowledge categories, while its performance was comparable to that of the assessors for Love Consultation and Politics. These findings suggest that a system that automatically discovers high-quality answers in CQA archives is feasible.