Ontology has recently attracted attention as a fundamental technology for knowledge systems, and many ontologies have been developed in domains such as bioinformatics, medicine, and engineering. The authors have also been involved in ontology development in several domains. Drawing on that experience, this article discusses ways of thinking that help in developing well-organized ontologies.
Terms describe important research concepts in academic documents and are essential for making use of information across research fields. In this paper, the author discusses a method for extracting terms from academic texts using natural language processing techniques. Most Japanese terms are compound words, yet simple methods that extract compound terms on the basis of current Japanese morpheme classification cannot attain sufficient precision. By considering the internal structure of compound term candidates and their backward and forward connective relations in the texts, most compound terms can be extracted with high precision. The author also discusses the systematization of term candidates based on their nesting relations and on the relationships of the candidates to various research sub-domains.
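To make the idea of nesting relations among term candidates concrete, the following is a minimal sketch and not the author's actual procedure. It assumes that each candidate is already segmented into a tuple of tokens (English words stand in for Japanese morphemes here), and the function names `is_nested` and `nesting_relations` are hypothetical.

```python
from collections import defaultdict


def is_nested(inner, outer):
    """Return True if `inner` occurs as a contiguous run inside the longer `outer`."""
    n, m = len(inner), len(outer)
    if n >= m:
        return False
    return any(outer[i:i + n] == inner for i in range(m - n + 1))


def nesting_relations(candidates):
    """Map each candidate to the longer candidates that contain it.

    Assumption: candidates are tuples of tokens produced by some upstream
    morphological analysis, which is outside the scope of this sketch.
    """
    parents = defaultdict(list)
    for inner in candidates:
        for outer in candidates:
            if is_nested(inner, outer):
                parents[inner].append(outer)
    return dict(parents)


# Illustrative candidates only; not data from the paper.
candidates = [
    ("information",),
    ("information", "retrieval"),
    ("information", "retrieval", "system"),
    ("retrieval", "system"),
]
relations = nesting_relations(candidates)
```

In this toy input, `("information", "retrieval")` is nested in `("information", "retrieval", "system")`, so the shorter candidate can be placed under the longer one when the candidates are organized. A real system would combine this with the connective relations observed in the texts, which this sketch does not model.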
We developed a thesaurus of 440,000 terms for natural language processing tasks such as parsing and term standardization. Because each entry term has a large number of related terms with various semantic relations, we introduce facets and classify the related terms by facet so that they can be found easily. Furthermore, we distinguish discriminatory terms and variant Japanese spellings. Our package can connect to the Internet and to other dictionaries. We also describe points to keep in mind when building a thesaurus, as well as future tasks.
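One possible way to picture such an entry structure is sketched below. It is an illustration under stated assumptions, not the layout of the actual package: the class names `Entry` and `Thesaurus`, the facet labels `"broader"` and `"part"`, and the sample terms are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: str
    # Facet name -> related terms grouped under that facet (facet names are assumed).
    related: dict = field(default_factory=lambda: defaultdict(list))
    # Variant spellings that should be standardized to `term`.
    variants: set = field(default_factory=set)
    # Flag marking a discriminatory term.
    discriminatory: bool = False


class Thesaurus:
    def __init__(self):
        self.entries = {}
        self.variant_index = {}  # surface form -> standard entry term

    def add_entry(self, term, variants=(), discriminatory=False):
        entry = Entry(term, variants=set(variants), discriminatory=discriminatory)
        self.entries[term] = entry
        self.variant_index[term] = term
        for variant in variants:
            self.variant_index[variant] = term
        return entry

    def relate(self, term, facet, other):
        """Record `other` as related to `term` under the given facet."""
        self.entries[term].related[facet].append(other)

    def standardize(self, surface):
        """Return the standard spelling, or the input unchanged if unknown."""
        return self.variant_index.get(surface, surface)

    def related_terms(self, term, facet):
        """List related terms for one facet, accepting variant spellings."""
        entry = self.entries[self.standardize(term)]
        return list(entry.related.get(facet, []))


# Illustrative data only.
thesaurus = Thesaurus()
thesaurus.add_entry("コンピュータ", variants=["コンピューター"])
thesaurus.relate("コンピュータ", "broader", "機械")
thesaurus.relate("コンピュータ", "part", "CPU")
```

Grouping related terms by facet lets a lookup such as `thesaurus.related_terms("コンピューター", "part")` return only the terms under that facet, while the variant index handles the spelling standardization step first.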