2023, Vol. 143, No. 1, pp. 91-100
Electronic documents are easier to copy, paste, and duplicate than handwritten reports, and plagiarism in class assignment reports is consequently increasing. Existing plagiarism detection methods mainly calculate similarity by matching characters or words between documents. However, class assignment reports are written at the same time by multiple students on the same topic, and teachers often specify the format in detail, so the contents of different reports tend to be quite similar. False positives can be avoided if teachers visually check whether the matching parts of reports are coincidental or plagiarized, but this is time-consuming and labor-intensive. Herein, we propose a method to identify authors using formatting information, obtained by parsing the Extensible Markup Language (XML) of Microsoft Word .docx documents, as document-creation features. We conducted an experiment using university class reports and used a decision tree to visualize the resulting classification rules that determine whether documents were written by the same author. We also evaluated classification performance using random forests.
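As a rough illustration of the feature-extraction step, the Python sketch below opens a .docx file (which is a ZIP package), reads its word/document.xml part, and counts a few formatting properties: paragraph alignment (w:jc), run properties such as bold (w:b), font size (w:sz, in half-points), and font name (w:rFonts). The function names formatting_features and make_docx and the particular features counted are hypothetical; they are not the feature set used in the paper. The in-memory sample contains only word/document.xml, which is enough for this parser but is not a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML main namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def formatting_features(docx_bytes: bytes) -> Counter:
    """Count illustrative (hypothetical) formatting features of a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    features = Counter()
    # Paragraph-level properties: count paragraphs and their alignment
    for para in root.iter(W + "p"):
        features["paragraphs"] += 1
        ppr = para.find(W + "pPr")
        if ppr is not None:
            jc = ppr.find(W + "jc")
            if jc is not None:
                features["align=" + jc.get(W + "val", "")] += 1
    # Run-level properties: count runs and each formatting element they carry
    for run in root.iter(W + "r"):
        features["runs"] += 1
        rpr = run.find(W + "rPr")
        if rpr is None:
            continue
        for child in rpr:
            tag = child.tag.split("}", 1)[1]  # "{ns}b" -> "b"
            if tag == "sz":
                features["size=" + child.get(W + "val", "")] += 1  # half-points
            elif tag == "rFonts":
                features["font=" + child.get(W + "ascii", "")] += 1
            else:
                features["rpr:" + tag] += 1
    return features


def make_docx(document_xml: str) -> bytes:
    """Build a minimal ZIP holding only word/document.xml (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
    return buf.getvalue()


SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:pPr><w:jc w:val="center"/></w:pPr>
<w:r><w:rPr><w:b/><w:sz w:val="28"/></w:rPr><w:t>Title</w:t></w:r></w:p>
<w:p><w:r><w:rPr><w:rFonts w:ascii="Century"/></w:rPr><w:t>Body text.</w:t></w:r>
<w:r><w:t> More.</w:t></w:r></w:p>
</w:body></w:document>"""

feats = formatting_features(make_docx(SAMPLE))
print(sorted(feats.items()))
# → [('align=center', 1), ('font=Century', 1), ('paragraphs', 2), ('rpr:b', 1), ('runs', 3), ('size=28', 1)]
```

Counts like these could then be turned into feature vectors for a decision tree or random forest classifier; that step is not shown here.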