Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
NaDev: An Annotated Corpus to Support Information Extraction from Research Papers on Nanocrystal Devices
Thaer M. Dieb, Masaharu Yoshioka, Shinjiro Hara

2016 Volume 24 Issue 3 Pages 554-564


The process of nanocrystal device development is not well systematized. To support this process, analysis of the information produced by developmental experiments is required. In this study, we constructed an annotated corpus to support the extraction of experimental information from relevant publications. We designed the corpus-construction guidelines in cooperation with a domain expert. We evaluated these guidelines through corpus-construction experiments with graduate students from this domain, and then evaluated the corpus with the domain expert. In the corpus-construction experiments, we achieved a sufficient level of inter-annotator agreement by using a loose agreement measure that ignored the term-boundary mismatch problem, and we built an agreement corpus that excluded annotations based on misunderstandings of the guidelines. The domain expert evaluated this agreement corpus and modified the guidelines based on real examples. Using these guidelines, we finalized the corpus, called “NaDev” (Nanocrystal Device development corpus). The NaDev corpus and its construction guidelines will be released via our website. The NaDev corpus aims to support automatic information extraction from publications relevant to nanocrystal device development. The extracted information can be used to solve problems in the nanotechnology domain by making use of the large volume of newly published information. To the best of our knowledge, this is the first corpus constructed for the development of nanocrystal devices.
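The difference between a strict agreement measure and a loose one that ignores term-boundary mismatches can be illustrated with a minimal sketch. The abstract does not give the exact formula used in the study, so everything below is an assumption: annotations are modeled as (start, end, label) character spans, loose matching is taken to mean "same label and overlapping spans", agreement is scored as an F1-style value, and the labels "Material" and "Method" are hypothetical placeholders rather than the corpus's actual categories.

```python
from typing import List, Tuple

# (start offset, end offset exclusive, category label); this layout is an assumption
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when both spans carry the same label and share at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def agreement(ann_a: List[Span], ann_b: List[Span], loose: bool) -> float:
    """F1-style agreement between two annotators.

    Strict mode requires identical boundaries and label; loose mode
    accepts any overlap with the same label (boundary mismatch ignored).
    """
    if not ann_a or not ann_b:
        return 0.0

    def matched(x: Span, others: List[Span]) -> bool:
        if loose:
            return any(overlaps(x, y) for y in others)
        return x in others

    p = sum(matched(x, ann_b) for x in ann_a) / len(ann_a)
    r = sum(matched(y, ann_a) for y in ann_b) / len(ann_b)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Hypothetical example: the annotators disagree only on where the first term starts.
annotator_a = [(0, 14, "Material"), (20, 31, "Method")]
annotator_b = [(5, 14, "Material"), (20, 31, "Method")]

strict_score = agreement(annotator_a, annotator_b, loose=False)
loose_score = agreement(annotator_a, annotator_b, loose=True)
print(strict_score, loose_score)
```

In this toy case the strict score is lower than the loose score only because of a boundary disagreement on one term, which is the kind of mismatch a loose measure is meant to discount.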

© 2016 by the Information Processing Society of Japan