This paper presents an unsupervised scene classification method for recognizing indoor scenes. As a context-based feature representation, background features are extracted using Gist and foreground features using the Scale-Invariant Feature Transform (SIFT). Our method builds Bags of Features (BoF) by voting the Visual Words (VWs) of the SIFT and Gist features into a two-dimensional histogram. Moreover, our method generates labels as category candidates for input images while maintaining both stability and plasticity. Category maps are labeled automatically by using the labels created with Adaptive Resonance Theory (ART) as teaching signals for Counter Propagation Networks (CPNs). We evaluated the classification accuracy for semantic categories such as corridors and rooms using the KTH-IDOL datasets, which were released for evaluating robot localization and navigation. The mean classification accuracies of Gist, SIFT, OC-SVM, PIRF, and our method were 39.7%, 58.0%, 56.0%, 63.6%, and 87.5%, respectively. The accuracy of our method is 23.9 percentage points higher than that of PIRF. Moreover, we applied our method to a mobile robot to evaluate the usefulness of our unsupervised classification method.
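The two-dimensional voting of SIFT and Gist visual words can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. It assumes that each local image region yields one SIFT descriptor and one Gist descriptor, so that the two visual words can be paired; the abstract does not state how the pairing is done. The function names `nearest_word` and `joint_histogram`, the codebook arrays, and the final normalization are all hypothetical choices made for illustration.

```python
import numpy as np


def nearest_word(features, codebook):
    """Return, for each feature row, the index of the closest codebook row."""
    # Squared Euclidean distance between every feature and every codeword.
    diff = features[:, None, :] - codebook[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def joint_histogram(sift_desc, gist_desc, sift_codebook, gist_codebook):
    """Vote paired SIFT/Gist visual words into a normalized 2-D histogram.

    Assumption (not from the paper): row i of sift_desc and row i of
    gist_desc describe the same image region.
    """
    if len(sift_desc) != len(gist_desc):
        raise ValueError("SIFT and Gist descriptors must be paired row by row")

    sift_words = nearest_word(sift_desc, sift_codebook)
    gist_words = nearest_word(gist_desc, gist_codebook)

    # Rows index SIFT visual words, columns index Gist visual words.
    hist = np.zeros((len(sift_codebook), len(gist_codebook)), dtype=float)
    # np.add.at accumulates correctly when the same bin is hit repeatedly.
    np.add.at(hist, (sift_words, gist_words), 1.0)

    total = hist.sum()
    return hist / total if total > 0 else hist
```

In this sketch the flattened histogram, `joint_histogram(...).ravel()`, could serve as the input vector to a downstream clusterer. The codebooks themselves would have to be learned beforehand, for example by clustering training descriptors.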