Joho Chishiki Gakkaishi
Online ISSN : 1881-7661
Print ISSN : 0917-1436
ISSN-L : 0917-1436
Research Paper
Estimating an author’s gender using a random forest for offender profiling
Wataru ZAITSU, Mingzhe JIN
JOURNAL FREE ACCESS

2017 Volume 27 Issue 3 Pages 261-274

Abstract

Offender profiling is a method that assists criminal investigation teams by estimating an offender’s gender, age, or occupation from an analysis of the crime scene using statistical and psychological methods. When only printed documents or e-mails are available, however, analysts have so far been unable to estimate the offender’s characteristics, because there is no crime scene. This study aims to estimate an author’s gender by applying a random forest to blog texts. The results indicated that the following stylometric features were effective in estimating gender: the usage rates of Kanji, Hiragana, Katakana, and nouns. Moreover, the frequencies of certain parts of speech (verbs, adjectives, postpositional particles, and interjections), the conjunctive particle 「し」, the auxiliary verb 「なかっ」, the comma, and certain characters (「私」「僕」「っ」「ゃ」) were also effective. Leave-one-out cross-validation (LOOCV) showed that the highest accuracy was 86.0%, with a precision of 84.6% for male and 87.5% for female authors. By comparison, a support vector machine showed a lower accuracy of 75.0%, with a precision of 69.2% for male and 85.7% for female authors.
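The character-level features named in the abstract (usage rates of Kanji, Hiragana, and Katakana, plus the frequencies of the comma and of 「私」「僕」「っ」「ゃ」) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `script_of` and `stylometric_features`, the Unicode ranges used to classify scripts, and the choice to normalize by the number of non-whitespace characters are all assumptions made for illustration. Part-of-speech features are omitted because they would require a morphological analyzer, which the abstract does not specify.

```python
from collections import Counter

# Characters singled out in the abstract, plus the Japanese comma.
MARKER_CHARS = ("私", "僕", "っ", "ゃ", "、")


def script_of(ch: str) -> str:
    """Classify one character by script, using basic Unicode blocks only.

    Assumption: these ranges approximate the three scripts; the prolonged
    sound mark and rare CJK extension characters fall under "other".
    """
    code = ord(ch)
    if 0x3041 <= code <= 0x3096:
        return "hiragana"
    if 0x30A1 <= code <= 0x30FA:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def stylometric_features(text: str) -> dict:
    """Return per-character usage rates for one text.

    Assumption: rates are normalized by the count of non-whitespace
    characters; the paper may normalize differently.
    """
    chars = [ch for ch in text if not ch.isspace()]
    total = len(chars)
    if total == 0:
        raise ValueError("text contains no non-whitespace characters")

    script_counts = Counter(script_of(ch) for ch in chars)
    char_counts = Counter(chars)

    features = {
        f"rate_{name}": script_counts[name] / total
        for name in ("kanji", "hiragana", "katakana")
    }
    for ch in MARKER_CHARS:
        features[f"rate_{ch}"] = char_counts[ch] / total
    return features
```

Calling `stylometric_features` on each blog text yields one feature dictionary per author sample; such vectors could then be passed to any random forest implementation and evaluated with leave-one-out cross-validation, as the abstract describes.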

© 2017 Japan Society of Information and Knowledge