Could Authors of Academic Reports be Discerned Using Formatting Information Obtained by Parsing XML of .docx Documents?

Asako Ohno

doi:10.1541/ieejeiss.143.91

Abstract

Electronic documents are easier to copy, paste, and duplicate than handwritten reports; consequently, plagiarism in class assignment reports is increasing. Existing plagiarism detection methods primarily calculate similarity based on matching characters or words in a document. However, class assignment reports are written at the same time by multiple students on the same topic, and the teacher often specifies the format in detail, so the contents of different reports tend to be very similar. False positives can be avoided if teachers visually check whether the matching parts of reports are coincidental or plagiarized, but this is a time-consuming and labor-intensive task. Herein, we propose a method to discriminate authors using word-formatting information, obtained by parsing the Extensible Markup Language (XML) of Microsoft Word .docx documents, as document-creation features. We conducted an experiment using university class reports and used a decision tree to visualize the obtained classification rules that identify reports written by the same author. We also evaluated classification performance using random forests.
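A .docx file is a ZIP archive whose body text and run-level formatting live in `word/document.xml` (WordprocessingML). The sketch below, using only the Python standard library, shows one way such formatting information could be pulled out and counted per document. The feature set (fonts, font sizes, bold runs, paragraph alignment, first-line indents) and the function name `extract_format_features` are hypothetical illustrations, not the feature list used in the paper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML main namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def _val(elem, attr="val"):
    """Return a w:-namespaced attribute of elem, or None if elem is missing."""
    if elem is None:
        return None
    return elem.get(W + attr)


def extract_format_features(docx_file):
    """Count formatting properties in a .docx file's word/document.xml.

    docx_file may be a path or a binary file-like object. The features
    counted here are a hypothetical example set, not the paper's own.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    features = Counter()
    for para in root.iter(W + "p"):
        # Paragraph-level properties (w:pPr)
        ppr = para.find(W + "pPr")
        if ppr is not None:
            jc = _val(ppr.find(W + "jc"))
            if jc:
                features["para_align=" + jc] += 1
            if _val(ppr.find(W + "ind"), "firstLine"):
                features["para_first_line_indent"] += 1

        # Run-level properties (w:rPr) for every run in the paragraph
        for run in para.iter(W + "r"):
            features["runs"] += 1
            rpr = run.find(W + "rPr")
            if rpr is None:
                features["run_no_rPr"] += 1
                continue
            fonts = rpr.find(W + "rFonts")
            for attr in ("ascii", "eastAsia"):
                name = _val(fonts, attr)
                if name:
                    features[f"font_{attr}={name}"] += 1
            size = _val(rpr.find(W + "sz"))
            if size:
                # w:sz is in half-points, so "21" means 10.5 pt
                features[f"size_pt={int(size) / 2}"] += 1
            bold = rpr.find(W + "b")
            # <w:b/> means bold unless its w:val explicitly turns it off
            if bold is not None and _val(bold) not in ("0", "false"):
                features["bold_runs"] += 1
    return features
```

Under these assumptions, calling the function once per submitted report yields a bag of formatting counts that could then be vectorized and passed to a classifier such as a decision tree or random forest; the counts themselves do not depend on the wording of the report, which is the property the abstract relies on.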
