In this paper, we describe an applied system that uses text mining to make it more efficient to store information from notices of annual shareholders' meetings in a database. We aim to estimate the start page of each proposal stated in a notice and to classify the type of proposal that starts on that page. We developed a system that performs these tasks automatically from the text of the notice of convocation, and the system is in actual operation. In a comparative experiment between our system and conventional manual work, the working time was shortened to about 1/10. We propose three methods for classifying proposals. The first classifies proposals using specialized terms extracted from training data. The second uses deep learning. The third uses the extracted proposal title. We evaluated the three methods and verified the effectiveness of each.
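The first classification method in the abstract above (terms extracted from training data) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the scoring formula, the labels, the English toy data, and whitespace tokenization are all hypothetical, and real notices in Japanese would need a morphological analyzer.

```python
from collections import Counter, defaultdict


def extract_terms(training, top_k=3):
    """For each label, keep the top_k words ranked by
    (count within the label) / (1 + count in all other labels).
    The scoring rule is an assumption made for this sketch."""
    per_label = defaultdict(Counter)
    total = Counter()
    for text, label in training:
        words = text.lower().split()
        per_label[label].update(words)
        total.update(words)
    terms = {}
    for label, counts in per_label.items():
        ranked = sorted(
            counts,
            key=lambda w: (-counts[w] / (1 + total[w] - counts[w]), w),
        )
        terms[label] = set(ranked[:top_k])
    return terms


def classify(text, terms):
    """Return the label whose term set overlaps most with the text,
    or None when no extracted term appears in it."""
    words = set(text.lower().split())
    best_label, best_hits = None, 0
    for label in sorted(terms):
        hits = len(words & terms[label])
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


# Hypothetical training examples: (proposal text, proposal type).
training = [
    ("election of five directors", "director_election"),
    ("election of directors and outside directors", "director_election"),
    ("appropriation of surplus dividend", "surplus_appropriation"),
    ("appropriation of surplus and year-end dividend", "surplus_appropriation"),
    ("partial amendment of articles of incorporation", "articles_amendment"),
    ("amendment of the articles of incorporation", "articles_amendment"),
]

terms = extract_terms(training)
label = classify("Proposal 2: Election of seven directors", terms)
```

Ranking words by how much more often they occur inside a class than outside it keeps frequent but uninformative words such as "of" out of the term sets.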
There is a growing need for automatic measurement of language ability. As the aging population grows, the number of older adults with dementia is expected to increase. Language disorder is considered one of the most fundamental symptoms by which dementia can be detected, so identifying language deficits particular to dementia may contribute to its early detection. In addition, many foreign students currently learn Japanese at educational institutions, which are required to evaluate their students appropriately. However, evaluations differ because evaluators judge the language abilities of foreign students subjectively. The results of an automatic language ability measurement system could be used for objective evaluation. Traditional studies measured language ability manually, which is often costly. In this study, we propose an automatic language ability measurement system, “KOTOBAKARI,” as an alternative to traditional manual measurement. The system is expected to reduce the cost and time of measuring language ability through two steps: (1) voice recognition to obtain text data and (2) computation of quantitative indicators for automated language ability measurement. We verify whether voice recognition output can be used to measure language ability in two ways: by comparing the measured language ability with a threshold value and by assessing changes in the language ability score. The experimental results show that it is difficult to judge language ability by threshold comparison with our system. In contrast, for some individuals, the language ability score measured with our system was correlated with the correct score. This result indicates that these individuals can use our system to track changes in language ability through continuous measurement.
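The two verification approaches named in the abstract above, threshold comparison and score correlation, can be contrasted with a small sketch. The abstract does not say which correlation measure or threshold was used, so the Pearson coefficient, the threshold value, and all scores below are assumptions chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def threshold_agreement(system_scores, reference_scores, threshold):
    """Fraction of measurements where the system score and the reference
    score fall on the same side of the threshold."""
    same_side = sum(
        (s >= threshold) == (r >= threshold)
        for s, r in zip(system_scores, reference_scores)
    )
    return same_side / len(system_scores)


# Hypothetical scores for one speaker over five sessions. The system score
# is consistently lower than the reference (for example, because of
# recognition errors) but rises and falls with it.
system_scores = [10, 12, 14, 16, 18]
reference_scores = [20, 22, 24, 26, 28]

r = pearson(system_scores, reference_scores)
agreement = threshold_agreement(system_scores, reference_scores, threshold=15)
```

In this toy case the scores are perfectly correlated while the threshold judgment agrees in only two of five sessions, which shows how a systematic offset can make absolute thresholds unreliable even when changes over time are tracked well.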
Herein, we describe a system that classifies each sentence in seller reviews on an online shopping website by aspect, such as shipping and packaging, and by sentiment polarity (positive or negative). Because the aspects mentioned in seller reviews are not obvious in advance, we first investigated 487 sentences extracted from 100 randomly selected seller reviews and found that 14 aspects were described. We then annotated these 14 aspects and their sentiment polarity for 5,277 sentences in 1,510 newly and randomly chosen seller reviews, and trained classification models using existing machine learning software. Through the system based on these classification models, users can follow the trend for any aspect as a time series and easily access reviews describing the aspects they are interested in.
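Once sentences carry an aspect and a polarity label, the time-series view mentioned in the abstract above can be built by simple aggregation. The sketch below assumes the classifier output is available as (date, aspect, polarity) tuples and groups by calendar month; the data layout, the monthly granularity, and the sample records are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_trend(labeled_sentences, aspect):
    """Count positive and negative sentences per month for one aspect.

    Returns a dict mapping "YYYY-MM" to (positive_count, negative_count),
    ordered by month."""
    counts = {}
    for day, sentence_aspect, polarity in labeled_sentences:
        if sentence_aspect != aspect:
            continue
        month = day.strftime("%Y-%m")
        counts.setdefault(month, Counter())[polarity] += 1
    return {
        month: (c["positive"], c["negative"])
        for month, c in sorted(counts.items())
    }


# Hypothetical classifier output: (review date, aspect, polarity).
labeled = [
    (date(2018, 1, 5), "shipping", "positive"),
    (date(2018, 1, 20), "shipping", "negative"),
    (date(2018, 1, 21), "packaging", "positive"),
    (date(2018, 2, 2), "shipping", "positive"),
    (date(2018, 2, 9), "shipping", "positive"),
]

shipping_trend = monthly_trend(labeled, "shipping")
```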
NHK launched a simplified Japanese news service on the Internet (NEWS WEB EASY) in April 2012 to improve news accessibility for foreign residents in Japan. Simplified news scripts are compiled daily by a Japanese language instructor and a news reporter working together, using a custom news rewrite support system that comprises a rewrite support editor and a reading support information editor. The rewrite support editor identifies difficult expressions and proposes simpler alternatives through a similar-expression search function, and the reading support information editor helps the Japanese language instructor add auxiliary information such as furigana and glossary entries to the simplified news scripts. Both functions feature learning mechanisms that improve the precision of news simplification. The system has been in daily use since 2012. In a questionnaire evaluation, the Japanese language instructors and reporters who produce NEWS WEB EASY rated both editors highly. Analysis of the rewrite support editor’s log revealed that the difficult word rate and sentence length were lower in the rewritten news than in the original and were similar across rewriters.
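The two log measures named in the abstract above, difficult word rate and sentence length, might be computed along the following lines. The abstract does not define either measure, so this sketch assumes that a word is "difficult" when it is missing from a list of easy words and that sentence length is counted in words. The vocabulary and the pre-tokenized English sentences are hypothetical stand-ins for Japanese text segmented by a morphological analyzer.

```python
def difficult_word_rate(sentences, easy_vocabulary):
    """Share of words that do not appear in the easy vocabulary.
    `sentences` is a list of sentences, each a list of words."""
    words = [word for sentence in sentences for word in sentence]
    if not words:
        return 0.0
    difficult = sum(word not in easy_vocabulary for word in words)
    return difficult / len(words)


def mean_sentence_length(sentences):
    """Average number of words per sentence."""
    if not sentences:
        return 0.0
    return sum(len(sentence) for sentence in sentences) / len(sentences)


# Hypothetical easy-word list and pre-tokenized texts.
easy_vocabulary = {
    "the", "government", "will", "use", "money",
    "it", "wants", "to", "help",
}
original = [
    ["the", "government", "announced", "fiscal", "stimulus", "measures"],
]
rewritten = [
    ["the", "government", "will", "use", "money"],
    ["it", "wants", "to", "help", "the", "economy"],
]

original_rate = difficult_word_rate(original, easy_vocabulary)
rewritten_rate = difficult_word_rate(rewritten, easy_vocabulary)
original_length = mean_sentence_length(original)
rewritten_length = mean_sentence_length(rewritten)
```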
Research on advanced AI requires sufficient data. In medicine, and in clinical medicine especially, information retrieval is necessary to make full use of the data, because the data, mainly clinical records, are written in natural language. The corpus developed in this study has the following strengths: (i) it consists of 45,000 case reports, which makes it the largest such corpus to our knowledge, and (ii) we not only standardized the terminology and the annotation method but also annotated “factness,” which indicates whether or not a disease name reflects the actual state of the patient in a case report. This paper describes the methods used to develop this medical corpus for AI research, focusing on the annotation of disease and symptom names. First, we define the concepts contained in the annotation criteria using examples. Next, we discuss the feasibility of the annotation by reporting indices such as the agreement rate. Finally, we report the development of a disease-name extraction system based on the corpus. We believe this corpus will be a good reference for future clinical annotation.
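An agreement rate of the kind mentioned in the abstract above can be computed from two annotators' labels for the same items. The sketch below shows the plain observed agreement and, as an addition not named in the abstract, Cohen's kappa, which corrects for agreement expected by chance. The label names and the annotations are hypothetical.

```python
from collections import Counter


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = agreement_rate(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical "factness" labels from two annotators for eight disease names.
annotator_a = ["present", "present", "absent", "present",
               "absent", "absent", "present", "present"]
annotator_b = ["present", "absent", "absent", "present",
               "absent", "present", "present", "present"]

rate = agreement_rate(annotator_a, annotator_b)
kappa = cohen_kappa(annotator_a, annotator_b)
```

Here the annotators agree on six of eight items, and kappa is lower than the raw rate because part of that agreement would be expected by chance given how often each label is used.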