2008 Volume 2008 Issue DMSM-A802 Pages 14-
Here we propose a method for extracting the characteristics of offenders from a body of records describing suspicious persons. The method comprises two steps: generating term-document matrices by analyzing records of offenders' characteristics, and classifying the records on the basis of these characteristics. Since the descriptions are written as Japanese free text, we use ChaSen, a morphological analysis system, as a preprocessor for generating the term-document matrices. For clustering, we use a k-means program provided by MUSASHI, a set of data processing and mining commands. After clustering, we use TF-IDF to assign each group a distinguishing label. Our method, which combines morphological analysis and clustering, automatically produces descriptions of repeat offenders and may be useful in the fight against crime.
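As a rough illustration of the pipeline described above, the sketch below builds a term-count matrix, clusters the records with k-means, and labels each cluster with its highest-scoring TF-IDF terms. It is a minimal sketch under stated assumptions, not the authors' implementation. It does not call ChaSen or MUSASHI. The records are hypothetical, already-tokenized English stand-ins for ChaSen's morpheme output. Centroids are seeded with the first k records. TF-IDF is computed by treating each cluster as a single pseudo-document. All function names are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term vocab[j] in docs[i]."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for t in doc:
            row[index[t]] += 1.0
        matrix.append(row)
    return vocab, matrix


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Lloyd's k-means; seeding with the first k rows is a simplifying assumption."""
    centroids = [list(r) for r in rows[:k]]
    assign = None
    for _ in range(iters):
        # Assignment step: each record joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: squared_distance(r, centroids[c]))
                      for r in rows]
        if new_assign == assign:
            break  # assignments stopped changing, so the clustering has converged
        assign = new_assign
        # Update step: move each centroid to the mean of its member rows.
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def label_clusters(docs, assign, k, top_n=3):
    """Label each cluster with its top_n terms by TF-IDF (one cluster = one pseudo-document)."""
    cluster_counts = [Counter() for _ in range(k)]
    for doc, c in zip(docs, assign):
        cluster_counts[c].update(doc)
    # Document frequency: number of clusters in which each term occurs.
    df = Counter()
    for counts in cluster_counts:
        df.update(counts.keys())
    labels = []
    for counts in cluster_counts:
        total = sum(counts.values()) or 1
        scores = {t: (n / total) * math.log(k / df[t]) for t, n in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels.append(ranked[:top_n])
    return labels


if __name__ == "__main__":
    # Hypothetical pre-tokenized records standing in for morphological-analysis output.
    records = [
        ["tall", "man", "black", "jacket", "night"],
        ["short", "woman", "red", "bag", "night"],
        ["tall", "man", "black", "cap", "night"],
        ["short", "woman", "red", "coat", "night"],
    ]
    vocab, matrix = term_document_matrix(records)
    assign = kmeans(matrix, k=2)
    print(assign)
    print(label_clusters(records, assign, k=2))
```

In this toy input the term `night` occurs in both clusters, so its IDF is log(2/2) = 0 and it never becomes a label. This is why TF-IDF, rather than raw term frequency, is a reasonable choice for producing distinguishable cluster labels.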