IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Soft Computing, Learning>
Could Authors of Academic Reports be Discerned Using Formatting Information Obtained by Parsing XML of .docx Documents?
Asako Ohno

2023, Vol. 143, No. 1, pp. 91-100

Abstract

Electronic documents are easier to copy, paste, and duplicate than handwritten reports, and plagiarism in class assignment reports is consequently increasing. Existing plagiarism detection methods primarily compute similarity from matching characters or words in a document. However, class assignment reports are written at the same time by multiple students on the same topic, and the teacher often specifies the format in detail, so their contents tend to be quite similar. False positives can be avoided if teachers visually check whether matching parts of the reports are coincidental or plagiarized, but this is a time-consuming and labor-intensive task. Herein, we propose a method to discriminate authors using Word formatting information, obtained by parsing the Extensible Markup Language (XML) of Word .docx documents, as document creation features. We conducted an experiment using university class reports and, using a decision tree, visualized the resulting classification rules that identify writing by the same author. We also evaluated classification performance using random forests.
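
As a rough illustration of what parsing the XML of a .docx document can look like, the following Python sketch unzips a .docx file, reads word/document.xml, and counts a few formatting choices. The feature set (alignment, bold, italic, font size, font name), the function names, and the in-memory demo document are assumptions made for this example; the abstract does not state which features the proposed method actually uses.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def w_name(local):
    """Return the namespace-qualified form of a w: tag or attribute name."""
    return "{%s}%s" % (W_NS, local)


def is_on(toggle):
    """A toggle such as <w:b/> is on unless w:val is explicitly 0 or false."""
    return toggle is not None and toggle.get(w_name("val"), "true") not in ("0", "false")


def extract_formatting_features(docx_file):
    """Count formatting choices in a .docx file (path or file-like object).

    The features collected here are hypothetical examples, not the paper's.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    features = Counter()
    for para in root.iter(w_name("p")):
        features["paragraphs"] += 1
        jc = para.find("w:pPr/w:jc", NS)  # paragraph alignment
        if jc is not None:
            features["align=" + jc.get(w_name("val"), "")] += 1

    for run in root.iter(w_name("r")):
        features["runs"] += 1
        rpr = run.find("w:rPr", NS)  # run-level (character) properties
        if rpr is None:
            continue
        if is_on(rpr.find("w:b", NS)):
            features["bold_runs"] += 1
        if is_on(rpr.find("w:i", NS)):
            features["italic_runs"] += 1
        sz = rpr.find("w:sz", NS)  # font size, stored in half-points
        if sz is not None:
            features["size=" + sz.get(w_name("val"), "")] += 1
        fonts = rpr.find("w:rFonts", NS)
        if fonts is not None and fonts.get(w_name("ascii")):
            features["font=" + fonts.get(w_name("ascii"))] += 1
    return dict(features)


def make_demo_docx():
    """Build a tiny in-memory archive holding only word/document.xml.

    This is enough for the parser above but is not a complete Word file.
    """
    xml = (
        '<w:document xmlns:w="%s"><w:body>'
        '<w:p><w:pPr><w:jc w:val="center"/></w:pPr>'
        '<w:r><w:rPr><w:b/><w:sz w:val="28"/></w:rPr><w:t>Title</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Body text</w:t></w:r></w:p>'
        '</w:body></w:document>' % W_NS
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(extract_formatting_features(make_demo_docx()))
```

Counts like these could be assembled into one feature vector per report and passed to a decision tree or random forest classifier. That step is omitted here because the abstract gives no details of the feature encoding or classifier settings.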

© 2023 by the Institute of Electrical Engineers of Japan