This paper proposes a novel method for estimating sentence-final tone labels using dialogue-act (DA) information for text-to-speech synthesis within a spoken dialogue system. Estimating appropriate sentence-final tone labels is essential for accurately conveying the system's intentions to users through an utterance. In this paper, we propose to use DA features in addition to the conventional features, namely morphological information of the utterance text, to estimate the sentence-final tone labels. For this study, we use the DA-tagged speech database that we constructed in our previous study. We added sentence-final tone labels to this database so that each utterance carries the utterance text, the DA, and the sentence-final tone label. Based on this database, we build the proposed sentence-final tone estimation model. We evaluated the proposed method by comparing its performance with that of the conventional method. The evaluation results show that the proposed method clearly outperforms the conventional method in accuracy. We also analyze the estimation results to investigate the strengths and remaining difficulties of the proposed method.
It has been reported that a person’s remarks and behaviors reflect the person’s personality. Several recent studies have shown that the textual information of user posts and user behaviors, such as liking and reblogging specific posts, is useful for predicting the personality of Social Networking Service (SNS) users. However, less attention has been paid to the textual information derived from the user behaviors. In this paper, we investigate the effect of using textual information together with user behaviors for personality prediction. We focus on a personality diagnosis website and build a large dataset of SNS users and their personalities by collecting users who posted their personality diagnosis results on Twitter. Using this dataset, we address personality prediction as a set of binary classification tasks. Our experiments on the personality prediction of Twitter users show that the textual information of user behaviors is more useful than the co-occurrence information of the user behaviors, and that prediction performance is strongly affected by the number of user behaviors incorporated into the prediction. We also show that user behavior information is crucial for predicting the personality of users who do not post frequently.
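The abstract above frames personality prediction as a set of binary classification tasks over the text of user posts and behaviors. The following is a minimal illustrative sketch of one such binary task; the toy data, labels, and the TF-IDF-plus-logistic-regression classifier are assumptions for illustration, not the paper's actual model or features.

```python
# Sketch of one binary personality-classification task over user text.
# All data and the classifier choice are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each user is represented by the concatenated text of their posts and
# of the posts they liked/reblogged; the label is one binary trait.
users_text = [
    "love parties meeting new people all weekend",
    "quiet evening reading books alone at home",
    "organized huge event invited everyone excited",
    "prefer staying in small calm routines",
]
labels = [1, 0, 1, 0]  # hypothetical: 1 = trait present, 0 = absent

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(users_text, labels)

# Predict the trait for an unseen user's text.
pred = model.predict(["big party with friends tonight"])
```

In the paper's setting there would be one such classifier per personality dimension, each trained on the full collected dataset rather than toy strings.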
Large-scale ontologies are attracting attention in various fields such as information retrieval, data integration, and question answering. Because it is costly for human experts to build and maintain ontologies, much attention has been paid to ontology learning from semi-structured information resources such as Wikipedia. YAGO and DBpedia are well-known examples of such research aimed at (semi-)automatic construction from Wikipedia. However, Wikipedia still contains much information that YAGO and DBpedia cannot extract, such as textual definitions, headings, list articles, and list structures. Also, YAGO mainly extracts class hierarchies and class-instance relations and does not extract property types such as symmetric, transitive, functional, and inverse functional. DBpedia has property domains (rdfs:domain), property ranges (rdfs:range), and some property types; however, these are described manually, which causes problems in terms of maintenance and updating. In this paper, we present ontology construction methods that complement YAGO and DBpedia. We use Wikipedia categories, Infoboxes, Infobox templates, list articles, textual definitions, headings, and list structures as Wikipedia resources. The resulting ontologies include is-a relations (rdfs:subClassOf), class-instance relations (rdf:type), and triples. We compare them with existing Wikipedia ontologies such as YAGO and DBpedia, and show that we can extract relations that YAGO and DBpedia cannot.
Large-scale ontologies have received much attention in various fields such as information retrieval, data integration, and question answering. However, since it is costly to construct and maintain large-scale ontologies manually, several methods for ontology learning from Wikipedia have been proposed. YAGO and DBpedia are famous examples of large-scale ontologies automatically constructed from multilingual Wikipedia. However, there is room to extract from Wikipedia many kinds of relationships that are not included in YAGO or DBpedia. Previously, we proposed a method for constructing a large-scale ontology named Japanese Wikipedia Ontology (JWO) from Japanese Wikipedia. Since JWO has been constructed using several methods that differ from the extraction methods of YAGO and DBpedia, it is possible to extract relationships that do not exist in YAGO or DBpedia by using these methods. However, some of the methods are Japanese language-dependent and cannot be applied directly to English Wikipedia. In this paper, we propose methods for constructing a large-scale ontology named English Wikipedia Ontology (EWO) from English Wikipedia. To achieve this, we have modified the Japanese language-dependent methods of JWO and extended several methods for extracting property axioms. EWO includes is-a relations (rdfs:subClassOf), class-instance relations (rdf:type), property relations and types, and synonyms (owl:sameAs). The property relations are triples: property domain (rdfs:domain), property range (rdfs:range), and property hypernymy-hyponymy relations (rdfs:subPropertyOf). The property types are object (owl:ObjectProperty), data (owl:DatatypeProperty), symmetric (owl:SymmetricProperty), transitive (owl:TransitiveProperty), functional (owl:FunctionalProperty), inverse functional (owl:InverseFunctionalProperty), asymmetric (owl:AsymmetricProperty), reflexive (owl:ReflexiveProperty), and irreflexive (owl:IrreflexiveProperty).
To evaluate EWO, we compared it with existing ontologies constructed from English Wikipedia, thereby clarifying the differences between those ontologies and EWO.
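The abstract above enumerates the kinds of property axioms EWO contains using standard RDFS/OWL vocabulary. The sketch below shows what such axioms look like when expressed as plain (subject, predicate, object) triples; the example property `ex:spouse` and the particular axioms asserted for it are assumptions chosen only to illustrate the vocabulary terms named in the abstract.

```python
# Property axioms as plain (subject, predicate, object) triples, using the
# standard RDF/RDFS/OWL namespaces. The property "ex:spouse" and its axioms
# are hypothetical examples, not taken from EWO itself.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

triples = [
    # property domain and range (rdfs:domain, rdfs:range)
    ("ex:spouse", RDFS + "domain", "ex:Person"),
    ("ex:spouse", RDFS + "range", "ex:Person"),
    # property hypernymy-hyponymy (rdfs:subPropertyOf)
    ("ex:spouse", RDFS + "subPropertyOf", "ex:relative"),
    # property types (rdf:type pointing at OWL property classes)
    ("ex:spouse", RDF + "type", OWL + "ObjectProperty"),
    ("ex:spouse", RDF + "type", OWL + "SymmetricProperty"),
    ("ex:spouse", RDF + "type", OWL + "IrreflexiveProperty"),
]

def axioms_of(prop, triples):
    """Collect every (predicate, object) pair asserted for a property."""
    return [(p, o) for s, p, o in triples if s == prop]
```

A symmetric, irreflexive object property such as this hypothetical `ex:spouse` is exactly the kind of typing information the abstract notes is absent from YAGO and only manually maintained in DBpedia.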
We propose a method to assist legislative drafters that locates inappropriate legal terms in Japanese statutory sentences and suggests corrections. We focus on sets of mistakable legal terms whose usages are defined in legislation drafting rules. Our method predicts suitable legal terms using a classifier based on BERT (Bidirectional Encoder Representations from Transformers). The BERT classifier is pretrained with a huge number of whole sentences; thus, it contains abundant linguistic knowledge. Classifiers for predicting legal terms suffer from two-level infrequency: term-level infrequency and set-level infrequency. The former causes a class imbalance problem and the latter causes an underfitting problem; both degrade classification performance. To overcome these problems, we apply three techniques, namely, preliminary domain adaptation, repetitive soft undersampling, and classifier unification. The preliminary domain adaptation improves overall performance by providing prior knowledge of statutory sentences, the repetitive soft undersampling overcomes term-level infrequency, and the classifier unification overcomes set-level infrequency while saving storage consumption. Our experiments show that our classifier outperforms conventional classifiers using Random Forest or language models, and that all three training techniques improve performance.
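The abstract above names repetitive soft undersampling as the remedy for term-level infrequency (class imbalance). The sketch below illustrates the general idea under stated assumptions: each training round keeps all minority-class examples and draws a fresh random, equal-sized subset of the majority class, so repeated rounds eventually cover most majority examples while every individual round stays balanced. The function name and the exact repetition scheme are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of repeated balanced undersampling for a binary task.
# Assumption: "soft" here means redrawing the majority subset each round
# rather than discarding majority data permanently.
import random

def soft_undersample(examples, labels, seed):
    """Return one balanced subsample: all minority-class items plus an
    equal-sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    n = len(by_class[minority])
    sample_x, sample_y = [], []
    for c, items in by_class.items():
        chosen = items if c == minority else rng.sample(items, n)
        sample_x.extend(chosen)
        sample_y.extend([c] * len(chosen))
    return sample_x, sample_y

# Toy imbalanced data: 100 majority vs. 10 minority examples.
X = [f"majority_{i}" for i in range(100)] + [f"minority_{i}" for i in range(10)]
y = [0] * 100 + [1] * 10

# Each round trains on a different balanced subsample.
for round_id in range(3):
    bx, by = soft_undersample(X, y, seed=round_id)
    # ... train the classifier on (bx, by) for this round ...
```

Because the random seed changes per round, the majority subset differs across rounds, giving the classifier broad exposure to majority examples without any single round being imbalanced.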
The spread of COVID-19, the novel coronavirus, is currently having an enormous social and economic impact on the entire world. Under such circumstances, the spread of information about the novel coronavirus on SNS has a significant impact on economic losses and social decision-making. In this study, we investigated how the novel coronavirus became a social topic in Japan and how it was discussed. To determine what kind of impact it had on people, we collected and analyzed Japanese tweets containing words related to the coronavirus on Twitter. First, we analyzed the bias of the users who tweeted. As a result, we found that the bias of users who tweeted about the novel coronavirus almost disappeared after February 28, 2020, when the virus had arrived in Japan and a state of emergency was declared in Hokkaido, and the coronavirus became a widely discussed topic. Second, we analyzed the emotional words included in tweets to examine how people feel about the novel coronavirus. The results show that the occurrence of a particular social event can change the emotions expressed on social media.