大規模自然言語処理学習データのための複数弱仮説を生成する弱学習器を用いるAdaBoost手法

岩倉 友哉; 岡本 青史; 淺川 和雄

doi:10.1541/ieejeiss.130.83

Abstract

AdaBoost is a method to create a final hypothesis by repeatedly generating a weak hypothesis in each training iteration with a given weak learner. AdaBoost-based algorithms are successfully applied to several tasks such as Natural Language Processing (NLP), OCR, and so on. However, learning on the training data consisting of large number of samples and features requires long training time. We propose a fast AdaBoost-based algorithm for learning rules represented by combination of features. Our algorithm constructs a final hypothesis by learning several weak-hypotheses at each iteration. We assign a confidence-rated value to each weak-hypothesis while ensuring a reduction in the theoretical upper bound of the training error of AdaBoost. We evaluate our methods with English POS tagging and text chunking. The experimental results show that the training speed of our algorithm are about 25 times faster than an AdaBoost-based learner, and about 50 times faster than Support Vector Machines with polynomial kernel on the average while maintaining state-of-the-art accuracy.

Content from these authors

Favorites & Alerts

Corresponding author

Register with J-STAGE for free!