This paper gives an overview of text mining (textual data mining) in Japan from a practical standpoint. First, we define text mining as a branch of data mining, in particular of KDD (knowledge discovery and data mining), applied to large collections of text data. Its goal is to objectively discover and extract knowledge, facts, and meaningful relationships from text documents. We also briefly outline the related disciplines and the application fields in which text mining is used.
In addition, we discuss the applicability of text mining to qualitative research and examine how to solve some of the problems encountered when using text mining techniques. Computer programs for conducting text mining are summarized in tables.
As concrete examples, using a data set of responses to open-ended questions collected in a Web-based survey, we illustrate several analyses: segmentation of the Japanese responses, visualization of the mining results, and statistical significance testing based on the frequencies of characteristic words and their corresponding test-values. These frequencies and test-values are obtained from the aggregated lexical table of “words by gender-age variable” with 12 categories, generated by cross-tabulating gender (two categories) and age (six categories).
Finally, we offer our perspective on the future of text mining: how to determine which knowledge is needed, and how to suitably gather the text data sets required for understanding the target phenomena. From the point of view of data science, the question of how to integrate a kind of “acquisition system” for obtaining appropriate data sets will have to be examined in the future.