Abstract
This paper proposes an automatic coding system for the occupational data collected in the SSM (Social Stratification and Social Mobility) survey, which must be translated into occupational codes. The system serves as an example of tools that support the coding of large numbers of answers to open-ended questions requiring statistical processing. The proposed system has three characteristics: 1) it interprets the occupational data through natural language processing, namely morphological analysis and case-frame-based semantic analysis; 2) it represents the definitions of categories (occupations) as case frames and compiles them into a dictionary; and 3) it builds a thesaurus for each predicate and slot in the dictionary so that answers can be related to dictionary entries. Applied to the 1995 SSM data, the system proved relatively effective (precision 84.1%, recall 62.0% on raw data), and it is expected to improve with more accurate data entry and with richer rules and vocabulary in the dictionary and the thesaurus. Its effects on coding work, namely 1) reducing the workload, 2) ensuring consistency, and 3) making the coding rules explicit, address the problems associated with open-ended questions. In addition, the system can be used to check SSM occupational data that have already been coded by hand.
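The three characteristics listed above can be illustrated with a minimal sketch. It is not the system described in this paper: the occupational codes, vocabulary, slot names, and the names CaseFrame, THESAURUS, DICTIONARY, normalize, and assign_code are all hypothetical, and morphological analysis is assumed to have already split each answer into a predicate and its slots.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CaseFrame:
    """Definition of one occupational category as a case frame."""
    code: str                      # occupational code (made-up values below)
    predicate: str                 # canonical predicate
    slots: Dict[str, str] = field(default_factory=dict)  # role -> canonical filler


# Hypothetical thesaurus: surface word -> canonical concept.
THESAURUS: Dict[str, str] = {
    "teach": "teach", "instruct": "teach",
    "students": "student", "pupils": "student",
    "sell": "sell", "vend": "sell",
    "cars": "car", "automobiles": "car",
}

# Hypothetical dictionary of category definitions.
DICTIONARY: List[CaseFrame] = [
    CaseFrame("T01", "teach", {"object": "student"}),
    CaseFrame("S01", "sell", {"object": "car"}),
]


def normalize(word: str) -> str:
    """Map a surface word to its canonical concept, or keep it as-is."""
    lowered = word.lower()
    return THESAURUS.get(lowered, lowered)


def assign_code(predicate: str, slots: Dict[str, str]) -> Optional[str]:
    """Return the code of the first frame matching the answer, else None."""
    pred = normalize(predicate)
    fillers = {role: normalize(word) for role, word in slots.items()}
    for frame in DICTIONARY:
        same_predicate = frame.predicate == pred
        same_slots = all(fillers.get(role) == filler
                         for role, filler in frame.slots.items())
        if same_predicate and same_slots:
            return frame.code
    return None  # no matching frame: leave the answer for a human coder


if __name__ == "__main__":
    print(assign_code("instruct", {"object": "pupils"}))
    print(assign_code("repair", {"object": "cars"}))
```

In this sketch an answer that matches no frame is left uncoded rather than forced into a category. That is one design under which a coder of this kind could trade recall for precision, although the sketch makes no claim about how the figures reported above arise.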