Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
Author affiliation information in academic papers plays a crucial role in many scientometric analyses. To obtain it, many previous studies have relied on publisher databases or open databases. However, these databases do not necessarily store, as metadata, the author affiliation information for the papers under analysis, which can reduce analysis coverage. Extracting affiliation information directly from raw PDF files could solve this problem. In this study, we propose a method that extracts strings directly related to author affiliations from academic paper PDFs and classifies whether each research institution belongs to academia or industry. Our results demonstrate a classification success rate of approximately 90% for research institutions. In practical applications, the proposed method reduced manual classification work by about 63%.
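The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the classification is performed, so the keyword matching below is an assumed stand-in rather than the authors' method. The keyword lists, the function name `classify_affiliation`, and the `"unknown"` label (standing in for cases left to manual review) are all hypothetical. The input is assumed to be an affiliation string that has already been extracted from a PDF.

```python
# Hypothetical keyword lists: the abstract does not describe the actual
# rules or model, so these are illustrative assumptions only.
ACADEMIA_HINTS = ("university", "college", "graduate school", "faculty of")
INDUSTRY_HINTS = ("inc.", "ltd.", "corporation", "gmbh", "llc")


def classify_affiliation(affiliation: str) -> str:
    """Label an extracted affiliation string as academia, industry, or unknown."""
    text = affiliation.lower()
    # Academia hints are checked first, so a string matching both lists
    # is labeled "academia" in this sketch.
    if any(hint in text for hint in ACADEMIA_HINTS):
        return "academia"
    if any(hint in text for hint in INDUSTRY_HINTS):
        return "industry"
    # No hint matched: leave the string for manual classification.
    return "unknown"


if __name__ == "__main__":
    samples = [
        "Graduate School of Engineering, The University of Tokyo",
        "Example Robotics Co., Ltd.",
        "Independent researcher",
    ]
    for sample in samples:
        print(sample, "->", classify_affiliation(sample))
```

In a sketch like this, only the strings labeled `"unknown"` would need to be checked by hand, which is one way an automatic classifier could reduce manual classification work.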