2008, Volume 2008, Issue DMSM-A802, p. 14-
Here we propose a method for specifying the characteristics of offenders from a body of records on suspicious persons. The method comprises two steps: generating term-document matrices by analyzing the records of the offenders' characteristics, and classifying the records on the basis of these characteristics. Because the descriptions are Japanese free text, we adopt ChaSen, a morphological analysis system, as a preprocessor for generating the term-document matrices. We cluster the records with a k-means program provided by "MUSASHI," a set of data processing and mining commands. After clustering, we use TF-IDF to assign distinguishing labels to the resulting groups. Our method, the combination of morphological analysis and clustering, automatically produces descriptions of repeat offenders and may be useful in the fight against crime.
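The pipeline described above (tokenize, build a term-document matrix, cluster with k-means, label clusters by TF-IDF) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: whitespace splitting stands in for ChaSen's morphological analysis, a small pure-Python k-means stands in for the MUSASHI command, the TF-IDF variant treats each cluster as one pseudo-document, and the sample records are invented English phrases rather than data from the study.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical stand-in for ChaSen: split lowercase text on whitespace.
    return text.lower().split()

def term_document_matrix(records):
    # One row per record, one column per vocabulary term (raw counts).
    docs = [Counter(tokenize(r)) for r in records]
    vocab = sorted({term for d in docs for term in d})
    return vocab, [[d[term] for term in vocab] for d in docs]

def kmeans(rows, k, iterations=20):
    # Plain k-means with squared Euclidean distance.
    # Assumption: the first k rows seed the centroids, so runs are repeatable.
    centroids = [[float(v) for v in row] for row in rows[:k]]
    assign = [0] * len(rows)
    for _ in range(iterations):
        assign = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        for c in range(k):
            members = [row for row, a in zip(rows, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def tfidf_labels(vocab, rows, assign, k, top=2):
    # Sum term counts per cluster, then score each term by
    # tf * log(k / df), where df is the number of clusters containing it.
    totals = [[0] * len(vocab) for _ in range(k)]
    for row, a in zip(rows, assign):
        for j, count in enumerate(row):
            totals[a][j] += count
    labels = []
    for c in range(k):
        scores = {}
        for j, term in enumerate(vocab):
            if totals[c][j] > 0:
                df = sum(1 for t in totals if t[j] > 0)
                scores[term] = totals[c][j] * math.log(k / df)
        # Highest score first; ties broken alphabetically.
        labels.append(sorted(scores, key=lambda t: (-scores[t], t))[:top])
    return labels

# Invented sample records, for illustration only.
records = [
    "tall man black jacket bicycle",
    "woman red car station",
    "tall man black jacket park",
    "woman red car school",
]
vocab, matrix = term_document_matrix(records)
assign = kmeans(matrix, k=2)
labels = tfidf_labels(vocab, matrix, assign, k=2)
for cluster_id, terms in enumerate(labels):
    print(cluster_id, terms)
```

In this sketch, terms that occur in every cluster score zero, so the labels favor words that separate one group of records from the others. With real Japanese text, the tokenizer would need to be replaced by an actual morphological analyzer, and the number of clusters would have to be chosen for the data.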