2017 Volume 24 Issue 1 Pages 135-170
In sociology, occupation and industry variables are as important as sex and age variables. For statistical processing, answers collected from open-ended questions in social surveys must be converted into codes, a task that requires considerable time and effort and often produces inconsistencies in large-scale surveys. This work addresses occupation and industry coding: we develop an automatic coding system that uses hand-crafted rules and Support Vector Machines. The system assigns three candidate codes to each answer and estimates the confidence level of the primary predicted code for each of the national and international standard code sets. It has now been released through the website of the Center for Social Research and Data Archives, where users can obtain coding results by uploading a data file in a specified format.
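
As a rough illustration of the output described above (three candidate codes plus a confidence value for the primary one), the following minimal sketch may help. It is not the released system: the rule table, the placeholder codes, the `toy_scorer` function, and the use of the score margin as a confidence proxy are all assumptions made for the example. In practice the per-code scores would come from trained Support Vector Machines, and the way rules and classifier are actually combined is not specified here; the sketch simply lets a matching rule take precedence.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical hand-crafted rules: keyword found in the answer -> code.
# The codes are placeholders, not entries from any real standard code set.
RULES: Dict[str, str] = {
    "nurse": "CODE_A",
    "taxi driver": "CODE_B",
}


def apply_rules(answer: str) -> Optional[str]:
    """Return the code of the first rule whose keyword occurs in the answer."""
    text = answer.lower()
    for keyword, code in RULES.items():
        if keyword in text:
            return code
    return None


def top_candidates(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Rank codes by classifier score (highest first) and keep the best k."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def code_answer(
    answer: str,
    scorer: Callable[[str], Dict[str, float]],
    k: int = 3,
) -> Dict[str, object]:
    """Combine a rule lookup with ranked classifier scores for one answer."""
    ranked = top_candidates(scorer(answer), k)
    codes = [code for code, _ in ranked]

    # Assumed confidence proxy: gap between the best and second-best score.
    if len(ranked) >= 2:
        margin = ranked[0][1] - ranked[1][1]
    else:
        margin = ranked[0][1] if ranked else 0.0

    # Assumed combination policy: a matching rule becomes the primary code.
    rule_code = apply_rules(answer)
    if rule_code is not None:
        codes = ([rule_code] + [c for c in codes if c != rule_code])[:k]

    return {
        "candidates": codes,
        "confidence": margin,
        "rule_hit": rule_code is not None,
    }


def toy_scorer(answer: str) -> Dict[str, float]:
    """Stand-in for SVM decision scores; returns fixed values for the demo."""
    return {"CODE_A": 0.2, "CODE_B": 1.5, "CODE_C": 0.9, "CODE_D": -0.3}


if __name__ == "__main__":
    print(code_answer("I work as a nurse in a hospital", toy_scorer))
    print(code_answer("I help on a farm", toy_scorer))
```

A small margin between the top two scores would flag an answer for manual review, which is one plausible use of a confidence estimate in a large-scale survey.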