Abstract
Question classification is of crucial importance for question answering. In question classification, machine learning (ML) algorithms have been found to significantly outperform other approaches in accuracy. The two key issues in classification with an ML-based approach are classifier design and feature selection. Support Vector Machines (SVMs) are known to work well for sparse, high-dimensional problems. However, the frequently used Bag-of-Words approach does not take full advantage of the information contained in a question. To exploit this information, we introduce three new feature types: Subordinate Word Category, Question Focus and Syntactic-Semantic Structure. As the results demonstrate, including the new features yields higher question classification accuracy than the standard Bag-of-Words approach and other ML-based methods such as SVM with the Tree Kernel, SVM with Error Correcting Codes and SNoW. The classification accuracy of 85.6% obtained using the three introduced feature types is, to date, the highest reported in the literature, and corresponds to an error reduction of 27% compared to the Bag-of-Words approach.
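
To make the relationship between the two reported figures concrete, the short sketch below back-computes the Bag-of-Words accuracy implied by an accuracy of 85.6% and an error reduction of 27%. It assumes the error reduction is relative (the drop in error rate divided by the baseline error rate) and that both systems are evaluated on the same test set; the function name `implied_baseline_accuracy` is hypothetical and used only for illustration.

```python
def implied_baseline_accuracy(new_accuracy: float,
                              relative_error_reduction: float) -> float:
    """Return the baseline accuracy (in percent) implied by a new accuracy
    (in percent) and a relative error reduction (as a fraction).

    Assumption: relative_error_reduction = (e_base - e_new) / e_base,
    where e = 100 - accuracy.
    """
    new_error = 100.0 - new_accuracy
    # Rearranging the definition gives e_base = e_new / (1 - reduction).
    baseline_error = new_error / (1.0 - relative_error_reduction)
    return 100.0 - baseline_error


if __name__ == "__main__":
    baseline = implied_baseline_accuracy(85.6, 0.27)
    print(f"Implied Bag-of-Words accuracy: {baseline:.1f}%")
```

Under these assumptions the implied Bag-of-Words accuracy is roughly 80%. Because the 27% figure is rounded, this value is approximate and should not be read as a reported result.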