Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 2N1-GS-4-05

Classification of author affiliations extracted from scholarly PDF documents
*Kazuhiro YAMAUCHI, Marie KATSURAI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Author affiliation information in academic papers plays a crucial role in many scientometric analyses. To obtain it, many previous studies have relied on publisher databases or open databases. However, these databases do not always store affiliation metadata for the papers under analysis, which can reduce analysis coverage. Extracting affiliation information directly from raw PDF files is one way to address this problem. In this study, we propose a method that extracts strings directly related to author affiliations from academic paper PDFs and classifies each research institution as belonging to academia or industry. In our experiments, the method classified research institutions correctly in approximately 90% of cases. In practical applications, the proposed method reduced manual classification by about 63%.

© 2024 The Japanese Society for Artificial Intelligence