2020 Volume 30 Issue 3 Pages 370-389
We propose a method for identifying the edits that add bibliographic references to Wikipedia. The method consists of the following steps. (1) It extracts the references and matches them against a bibliographic database to build the basic data set. (2) It obtains, from the Wikipedia dump data, the full revision history of each page that includes the references, and extracts the identifiers and titles of each reference from the basic data set. (3) It collects the candidate edits that added each reference by searching the revision history for either the identifiers or the titles. (4) It selects the oldest candidate as the edit that added the reference. We evaluated the method on a data set based on DOI links referenced on English Wikipedia. The overall accuracy was 93.3%, and accuracy exceeded 90% in 20 of 22 research fields. These results show that the method identifies the edits adding bibliographic references with high accuracy regardless of research field.
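Steps (3) and (4) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Revision` type, the function name `find_adding_revision`, and the case-insensitive substring matching are all assumptions made for illustration, and the abstract does not specify how identifiers or titles are actually matched against revision text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Revision:
    """One revision of a page (hypothetical structure, not from the paper)."""
    rev_id: int
    # ISO 8601 UTC timestamp in a fixed format, e.g. "2015-03-01T12:00:00Z",
    # so that plain string comparison orders revisions chronologically.
    timestamp: str
    text: str


def find_adding_revision(
    revisions: Iterable[Revision],
    identifiers: Sequence[str],
    title: str,
) -> Optional[Revision]:
    """Return the oldest revision whose text mentions the reference.

    A revision is treated as a candidate if its text contains any of the
    identifiers (e.g. a DOI) or the title. Substring matching is an
    assumption of this sketch.
    """
    # Ignore empty strings, which would match every revision.
    needles = [s.casefold() for s in (*identifiers, title) if s]

    # Step (3): collect candidates by identifier or title.
    candidates = [
        rev for rev in revisions
        if any(needle in rev.text.casefold() for needle in needles)
    ]

    # Step (4): select the oldest candidate, or None if there is none.
    return min(candidates, key=lambda rev: rev.timestamp, default=None)
```

Given a page history and the identifiers and title of one reference, the function returns the earliest revision that mentions it, or `None` when no revision does. A real pipeline would likely need to normalize identifiers and handle titles that differ slightly between the database and the wikitext.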