Abstract
We apply a machine learning method to occupation coding, the task of categorizing answers to open-ended questions about respondents' occupations. Specifically, we use Support Vector Machines (SVMs) and their combination with hand-crafted rules. Conducting occupation coding manually is expensive and sometimes leads to inconsistent results when coders are not experts in occupation coding. For this reason, a rule-based automatic method was developed and applied. However, its categorization performance was not satisfactory. Therefore, we adopt SVMs, which show high performance in various fields, and compare them with the rule-based method. We also investigate effective methods for combining SVMs with the rule-based method. We empirically show that SVMs outperform the rule-based method in occupation coding, that combining the two methods yields even better accuracy, and that the accuracy of each method increases when part of the new samples is added to the training data.
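The abstract does not specify which combination methods were studied, so the following is only a minimal sketch of one plausible scheme, under stated assumptions. Each occupation code has its own one-vs-rest linear SVM, whose decision value is w·x + b over binary bag-of-words features. The SVM's answer is trusted when its best score clears a threshold. Otherwise the sketch falls back to the rule-based coder, and finally to the SVM's top code. The weights, biases, threshold, and the helpers `svm_scores`, `combined_code`, and `toy_rule_coder` are all hypothetical. Training the SVMs is omitted.

```python
from typing import Callable, Dict, List, Optional

def svm_scores(tokens: List[str],
               models: Dict[str, Dict[str, float]],
               biases: Dict[str, float]) -> Dict[str, float]:
    """Decision value w.x + b of each one-vs-rest linear SVM (binary features)."""
    scores = {}
    for code, weights in models.items():
        # Each distinct token contributes its weight once (binary bag of words).
        scores[code] = sum(weights.get(t, 0.0) for t in set(tokens)) + biases[code]
    return scores

def combined_code(answer: str,
                  models: Dict[str, Dict[str, float]],
                  biases: Dict[str, float],
                  rule_coder: Callable[[str], Optional[str]],
                  threshold: float = 0.0) -> str:
    """Hypothetical combination: SVM if confident, else rules, else SVM argmax."""
    tokens = answer.lower().split()
    scores = svm_scores(tokens, models, biases)
    best = max(scores, key=scores.get)  # ties go to the first code listed
    if scores[best] >= threshold:
        return best
    rule_code = rule_coder(answer)
    return rule_code if rule_code is not None else best

def toy_rule_coder(answer: str) -> Optional[str]:
    """Hypothetical hand-crafted rules: keyword matching, None if no rule fires."""
    text = answer.lower()
    if "nurse" in text:
        return "nurse"
    if "teach" in text:
        return "teacher"
    return None

# Hypothetical weights standing in for trained linear SVMs.
MODELS = {"teacher": {"school": 1.0, "teach": 1.5},
          "nurse": {"hospital": 1.2, "patients": 0.8}}
BIASES = {"teacher": -1.0, "nurse": -1.0}
```

For example, `combined_code("Registered nurse", MODELS, BIASES, toy_rule_coder)` gets no confident SVM score, so the rule fires and returns `"nurse"`. Other combinations, such as letting rules override the SVM or feeding rule outputs to the SVM as extra features, are equally possible. The abstract does not say which ones were used.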