Reduced K-means is a method of cluster analysis that simultaneously estimates the axes of a low-dimensional subspace summarizing multivariate data and the cluster centroids within that subspace. I illustrate the merits of this technique by applying it to a large data set obtained from a social survey. I compare the results of Reduced K-means with those of the K-means method and with those of a further method that incorporates principal component analysis. Applying Reduced K-means requires the user to select both the number of dimensions and the number of clusters. How to make these choices is also discussed, on the basis of objective indices of cluster assessment.
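The simultaneous estimation described above can be illustrated with a minimal sketch. It assumes the commonly described alternating least-squares formulation of Reduced K-means, in which the loss is the squared distance between the centered data and the cluster centroids mapped back from the subspace. The function name `reduced_kmeans`, its parameters, and the handling of empty clusters are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np


def reduced_kmeans(X, n_clusters, n_dims, n_iter=100, tol=1e-9, seed=0):
    """Sketch of Reduced K-means by alternating least squares (assumed scheme).

    Returns cluster labels, an orthonormal loading matrix A (p x n_dims),
    centroids F in the subspace (n_clusters x n_dims), and the final loss.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # center each variable
    n, p = X.shape
    labels = rng.integers(n_clusters, size=n)   # random initial partition
    prev_loss = np.inf
    for _ in range(n_iter):
        # Cluster means in the full space, given the current partition.
        counts = np.bincount(labels, minlength=n_clusters)
        M = np.zeros((n_clusters, p))
        np.add.at(M, labels, X)
        nonempty = counts > 0
        M[nonempty] /= counts[nonempty, None]
        # Assumption: an empty cluster is re-seeded at a random observation.
        M[~nonempty] = X[rng.integers(n, size=(~nonempty).sum())]

        # Axes: leading eigenvectors of the between-cluster scatter matrix.
        S = (M * counts[:, None]).T @ M
        _, vecs = np.linalg.eigh(S)             # eigenvalues in ascending order
        A = vecs[:, ::-1][:, :n_dims]
        F = M @ A                               # centroids in the subspace

        # Reassign each observation to its nearest centroid in the subspace.
        Y = X @ A
        d2 = ((Y[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)

        loss = ((X - F[labels] @ A.T) ** 2).sum()
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return labels, A, F, loss
```

Because the procedure can stop at a local minimum, in practice it would be run from several random starts (different `seed` values) for each candidate pair of `n_clusters` and `n_dims`, keeping the solution with the smallest loss.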
When open-ended responses collected in social surveys are used as basic data for statistical processing, after-coding must be conducted to classify each response into one of a set of pre-defined codes (classes). After-coding places a heavy burden on coders, and a response may be misclassified when it is ambiguous or insufficient. To avoid this, each response needs to contain sufficient information for classification, but obtaining such responses is not easy for either the respondent or the survey taker. In this paper, we propose a new system in which the survey taker carries a computer equipped with the knowledge needed to classify an open-ended response into a valid code. If the computer judges that a response does not contain sufficient information for classification, it asks the respondent an additional question and thereby extracts the missing information effectively. This judgment is made when the computer's confidence in its classification result, estimated from the scores that accompany the result, is low. After collecting the additional information, the computer reclassifies the new open-ended response into a valid code. The proposed system has the additional advantage of being able to supply basic data immediately. We are constructing such a system for occupational coding, a representative after-coding task. The system is not yet complete, but a small experiment indicates its efficacy. In future work, we will fully implement the system and have it evaluated by respondents, survey takers, and coders. We will also extend the system to make it more general.
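The follow-up decision described above can be sketched as a simple threshold rule. The abstract does not say how confidence is derived from the scores, so the margin-based measure below, the threshold value, and the names `confidence` and `needs_follow_up` are all hypothetical choices made for illustration. Scores are assumed to be non-negative.

```python
from typing import List, Tuple


def confidence(scored: List[Tuple[str, float]]) -> float:
    """Assumed confidence: normalized margin between the two best scores.

    `scored` is a list of (candidate code, score) pairs from a classifier.
    """
    scores = sorted((score for _, score in scored), reverse=True)
    total = sum(scores)
    if total <= 0:
        return 0.0                      # no usable evidence for any code
    top = scores[0]
    second = scores[1] if len(scores) > 1 else 0.0
    return (top - second) / total


def needs_follow_up(scored: List[Tuple[str, float]], threshold: float = 0.3) -> bool:
    """Ask an additional question when confidence falls below the threshold."""
    return confidence(scored) < threshold
```

With this rule, a response whose two best candidate codes score almost equally triggers an additional question, while a response with one clearly dominant code is accepted as classified.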
Beijing and Hangzhou are typical cities of Northern and Southern China, respectively, and differ in their economic bases and environmental quality. Taking these two cities as examples, this paper aims to clarify the structural features of people's environmental consciousness, and the factors that influence it, through quantitative analysis of survey data. The similarities in environmental consciousness between the two cities show that Chinese people are increasingly concerned about the environment against a background of environmental degradation. At the same time, the results indicate that people place high expectations on science and technology to solve environmental problems. Comparison of the two cities showed that Beijing citizens had lower satisfaction with environmental quality, stronger worries about environmental deterioration, and more active pro-environmental behaviors than Hangzhou citizens. The difference in environmental quality between the two cities is thought to be the main cause of these differences. In addition, the results suggest that targeting environmental education at the elderly and at people with low income and education levels would promote environmental consciousness effectively. Furthermore, logistic regression analysis indicated directions for promoting pro-environmental behaviors in the two cities.