In application systems that combine Natural Language Processing (NLP), a thesaurus, and/or an ontology, many problems arise when processing the systems' knowledge bases. Text mining, spoken dialog systems, and document classifiers are examples of such applications. In these applications, several processing technologies are combined into a single analytics flow. In that flow, some words can be found in the ontology but are not recognized by the NLP step, and this mismatch causes processing failures. The purpose of our research is to reduce the error rate of such multi-step natural language processing. In our investigation using the BTSJ dialog corpus, about 60% of the nouns extracted by a recent NLP tool could not be found in WordNet, and about 70% could not be found in DBpedia. Conversely, 260 compound words in WordNet and more than 1,300 compound words in DBpedia could be found even though the NLP step did not extract them as nouns. Reducing these differences between processing steps is important for improving the accuracy of language processing. This paper proposes a framework for integrating the dictionary data used by each processor and discusses its effectiveness and the feasibility of implementing it.
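The two mismatch measurements above, and the idea of one integrated dictionary, can be sketched as follows. This is a minimal illustration under our own assumptions, not the framework described in the paper. The function names `coverage_gap`, `missed_compounds`, and `integrate_dictionaries` are hypothetical. The "lexicon" is a toy word list standing in for WordNet or DBpedia labels, and compound words are matched as space-joined n-grams of the token sequence.

```python
from collections import defaultdict

def coverage_gap(nlp_nouns, lexicon_terms):
    """Share of NLP-extracted nouns that are absent from a lexicon."""
    nouns = set(nlp_nouns)
    if not nouns:
        return 0.0
    return len(nouns - set(lexicon_terms)) / len(nouns)

def missed_compounds(tokens, lexicon_terms, nlp_nouns, max_n=3):
    """Multi-word lexicon entries that occur in the token sequence
    but were not extracted as nouns by the NLP step."""
    lexicon, nouns = set(lexicon_terms), set(nlp_nouns)
    hits = set()
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon and phrase not in nouns:
                hits.add(phrase)
    return sorted(hits)

def integrate_dictionaries(**sources):
    """Merge per-processor term lists into one term -> {source names} map."""
    merged = defaultdict(set)
    for name, terms in sources.items():
        for term in terms:
            merged[term].add(name)
    return dict(merged)

# Toy data for illustration only (not drawn from BTSJ, WordNet, or DBpedia).
tokens = ["ice", "cream", "shop", "on", "weekend"]
nlp_nouns = ["ice", "cream", "shop", "weekend"]
lexicon = ["ice", "cream", "ice cream", "shop"]

print(coverage_gap(nlp_nouns, lexicon))              # 0.25
print(missed_compounds(tokens, lexicon, nlp_nouns))  # ['ice cream']
merged = integrate_dictionaries(nlp=nlp_nouns, lexicon=lexicon)
print(sorted(merged["ice cream"]))                   # ['lexicon']
```

In the integrated map, each term records which processors know it. A later step could therefore add lexicon-only compounds such as "ice cream" to the NLP user dictionary.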
The number of Linked Open Data (LOD) datasets, and of the data catalog sites that curate them, is growing. Although such catalog sites provide metadata for individual LOD datasets, support for analyzing the relationships between datasets is limited. To address this challenge, the authors investigated property usage trends by genre and creation date as a way of exploring the relationships between different LOD datasets.
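One simple way to tabulate property usage by genre and creation date, and to compare datasets by the properties they share, is sketched below. It rests on our own assumptions. Catalog entries are represented as plain dictionaries with hypothetical `genre`, `year`, and `properties` fields. The Jaccard index is a stand-in for whatever relationship measure the authors used. The property names are common RDF vocabulary terms used here only as illustrative labels.

```python
from collections import Counter, defaultdict

def property_trends(datasets):
    """Count, per (genre, creation year), how many datasets use each property."""
    trends = defaultdict(Counter)
    for ds in datasets:
        trends[(ds["genre"], ds["year"])].update(set(ds["properties"]))
    return trends

def jaccard(a, b):
    """Jaccard similarity of two property sets (1.0 = identical usage)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical catalog entries; genres and years are illustrative only.
datasets = [
    {"name": "A", "genre": "government", "year": 2015,
     "properties": ["dcterms:title", "foaf:name"]},
    {"name": "B", "genre": "government", "year": 2015,
     "properties": ["dcterms:title", "rdfs:label"]},
    {"name": "C", "genre": "life_sciences", "year": 2018,
     "properties": ["rdfs:label"]},
]

trends = property_trends(datasets)
print(trends[("government", 2015)].most_common(1))  # [('dcterms:title', 2)]
print(jaccard(datasets[0]["properties"], datasets[1]["properties"]))
```

Grouping by `(genre, year)` makes it easy to see whether a property spreads across genres over time. The pairwise Jaccard score highlights datasets that share vocabulary.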
A knowledge database can help users understand technical documents. However, the kind of knowledge required varies with the user's situation. We propose a system that extracts a user-adaptive knowledge graph from a knowledge database. The user-adaptive knowledge graph contains the knowledge that is useful to the particular user for understanding the technical documents. Our system includes an ontology describing the relations between attributes of use cases, together with rules for graph extraction. In an experiment in which a user reads a paper about steel materials, our results show that the system extracted an appropriate portion of the knowledge graph depending on the user's situation.
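Rule-based extraction of a situation-dependent subgraph can be sketched as a breadth-first traversal. The traversal starts from the concepts mentioned in the document and follows only the relations that a rule allows for the user's situation. This is a hypothetical sketch, not the authors' ontology or rule set. The `RULES` table, the situation names, the relation names, and the steel-related triples are all invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical extraction rules: user situation -> (allowed relations, max hops).
RULES = {
    "novice": ({"is_a", "defined_as"}, 2),
    "expert": ({"is_a", "affects", "measured_by"}, 1),
}

def extract_subgraph(triples, seeds, situation, rules=RULES):
    """Breadth-first extraction of the triples reachable from the seed
    concepts, following only relations allowed for the user's situation."""
    allowed, max_hops = rules[situation]
    adjacency = defaultdict(list)
    for s, r, o in triples:
        if r in allowed:
            adjacency[s].append((r, o))
    selected, seen = set(), set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit for this situation
        for r, o in adjacency[node]:
            selected.add((node, r, o))
            if o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return selected

# Toy knowledge base for a paper about steel (illustrative only).
kb = [
    ("martensite", "is_a", "microstructure"),
    ("microstructure", "defined_as", "arrangement of phases"),
    ("martensite", "affects", "hardness"),
    ("hardness", "measured_by", "Vickers test"),
]
print(sorted(extract_subgraph(kb, ["martensite"], "novice")))
print(sorted(extract_subgraph(kb, ["martensite"], "expert")))
```

In this sketch a novice receives definitional context two hops deep, while an expert receives one hop of property-related links. The same knowledge base thus yields different subgraphs for different situations.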
A knowledge graph systematically links knowledge and forms a semantic network that represents a knowledge domain. Knowledge graphs enable data integration, knowledge discovery, and advanced analyses. We have constructed knowledge graphs focusing on agricultural activities and crops and have provided related services. This paper discusses not only how the agricultural knowledge graph was constructed but also the general process of constructing a domain knowledge graph and the points to be noted.
In recent years, there has been increasing interest in numerical semantic labeling, in which an unknown numerical attribute is assigned the label of the most relevant attribute in a predefined knowledge base. Previous methods used the p-value of a statistical hypothesis test to estimate relevance and therefore depend strongly on the distribution and type of the data domain. In other words, p-value-based similarity is unstable in general cases where such domain knowledge is not available. In this paper, we first point out the limitations of p-value-based similarity. Second, we propose Distribution-Based Similarities, which are derived from norms computed on the inverse transform sampling of the attribute distributions. Our experiments on City Data and Open Data show that the Distribution-Based Similarities outperform the p-value-based approaches in semantic labeling of numerical values.
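A minimal sketch of one plausible reading of a distribution-based similarity is shown below. It assumes that the inverse transform sampling is the empirical inverse CDF evaluated on a fixed probability grid. It also assumes that the norm is the L_p norm of the difference between two such sampled quantile functions, which for one-dimensional data approximates the Wasserstein-p distance. The 1 / (1 + d) mapping from distance to similarity and all function names are our own assumptions, not the paper's definitions.

```python
def inverse_cdf_samples(values, m=100):
    """Empirical inverse CDF F^-1(u) evaluated at u_k = (k + 0.5) / m."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("attribute has no values")
    # Index ceil(u_k * n) - 1, computed with integer arithmetic to avoid rounding.
    return [xs[-(-(2 * k + 1) * n // (2 * m)) - 1] for k in range(m)]

def distribution_distance(a, b, p=1, m=100):
    """L_p norm of the difference between the two sampled quantile functions.
    For 1-D data this approximates the Wasserstein-p distance."""
    qa, qb = inverse_cdf_samples(a, m), inverse_cdf_samples(b, m)
    return (sum(abs(x - y) ** p for x, y in zip(qa, qb)) / m) ** (1 / p)

def similarity(a, b, p=1, m=100):
    """Map a distance to (0, 1]; the 1 / (1 + d) form is an assumption."""
    return 1.0 / (1.0 + distribution_distance(a, b, p, m))

def label_attribute(unknown, knowledge_base, p=1):
    """Assign the label of the most similar attribute in the knowledge base."""
    return max(knowledge_base,
               key=lambda name: similarity(unknown, knowledge_base[name], p))

# Toy knowledge base (illustrative values only).
kb = {
    "population": [1000, 5000, 20000, 80000],
    "temperature": [10, 15, 20, 25],
}
unknown = [12, 14, 21, 24]
print(distribution_distance(unknown, kb["temperature"]))  # 1.25
print(label_attribute(unknown, kb))                       # temperature
```

Unlike a p-value, this score does not assume any particular distribution family. Because both attributes are resampled onto the same probability grid, attributes with different numbers of values can be compared directly.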
Descriptive metadata, mainly subject metadata, is used to support searching within the vast collections of Manga on the web. Manga subject vocabularies are created to improve the metadata used for subject-based searches. However, given the large number and diversity of subjects covered, developing such a subject vocabulary for Manga is a tedious process. This research proposes a method to enhance a subject vocabulary by expanding its subject headings to improve coverage. The proposed method first links the existing vocabulary with metadata obtained from major e-book providers, and then extends the subject headings with theme-related words extracted from other web resources.
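The first step of the method, linking the existing vocabulary to provider metadata, can be sketched as follows. This minimal illustration rests on our own assumptions. Vocabulary entries and provider records are joined by title. Any provider tag that appears on at least `min_count` of a heading's titles becomes a candidate term for that heading. The titles, tags, and the `expand_headings` name are hypothetical, and the second step (adding theme words from other web resources) is not shown.

```python
from collections import Counter

def expand_headings(vocabulary, provider_metadata, min_count=2):
    """Propose new terms for each subject heading.

    vocabulary: heading -> set of titles already assigned to that heading.
    provider_metadata: title -> list of tags from an e-book provider.
    A tag becomes a candidate when it appears on at least `min_count`
    of the heading's titles and is not already a heading.
    """
    existing = set(vocabulary)
    candidates = {}
    for heading, titles in vocabulary.items():
        counts = Counter()
        for title in titles:
            # Linking step: join vocabulary entries to provider metadata by title.
            counts.update(set(provider_metadata.get(title, [])))
        candidates[heading] = sorted(
            tag for tag, c in counts.items() if c >= min_count and tag not in existing
        )
    return candidates

# Placeholder titles and tags (illustrative only).
vocabulary = {
    "sports": {"title_a", "title_b", "title_c"},
    "cooking": {"title_d"},
}
provider_metadata = {
    "title_a": ["basketball", "high school"],
    "title_b": ["volleyball", "high school"],
    "title_c": ["boxing", "sports"],
    "title_d": ["gourmet"],
}
result = expand_headings(vocabulary, provider_metadata)
print(result)  # {'sports': ['high school'], 'cooking': []}
```

Raising `min_count` trades coverage for precision. Requiring a tag to recur across several titles filters out tags that describe only a single work.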