This paper aims to improve the naturalness of synthesized speech generated by a text-to-speech (TTS) system within a spoken dialogue system with respect to “how natural the system’s intention is perceived via the synthesized speech”. We call this measure “illocutionary act naturalness”. To achieve this aim, we propose to utilize dialogue-act (DA) information as an auxiliary feature for a deep neural network (DNN)-based speech synthesis system. First, we construct a speech database with DA tags. Second, we build the proposed DNN-based speech synthesis system based on the database. Then, we evaluate the proposed method by comparing its performance with two conventional hidden Markov model (HMM)-based speech synthesis systems, namely, the style-mixed modeling method and the style adaptation method. The objective evaluation results show that the proposed method outperforms the style-mixed modeling method in the accuracy of reproducing the global prosodic characteristics of dialogue acts. They also reveal that the proposed method outperforms the style adaptation method in the accuracy of reproducing the sentence-final tone characteristics of dialogue acts. The subjective evaluation results also show that the proposed method improves the illocutionary act naturalness compared with the two conventional methods.
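As a rough illustration of using DA information as an auxiliary input feature, the sketch below appends a one-hot DA vector to every frame of an utterance's linguistic features. The abstract does not say how the DA feature is encoded, so the one-hot concatenation, the tag inventory `DA_TAGS`, the helper names `one_hot` and `add_da_feature`, and the feature values are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical dialogue-act inventory; the paper's actual tag set is not given here.
DA_TAGS = ["greeting", "question", "inform", "apology"]


def one_hot(tag: str, tags: Sequence[str] = DA_TAGS) -> List[float]:
    """Return a one-hot vector for `tag` over the tag inventory."""
    if tag not in tags:
        raise ValueError(f"unknown dialogue-act tag: {tag!r}")
    return [1.0 if t == tag else 0.0 for t in tags]


def add_da_feature(frames: Sequence[Sequence[float]], tag: str) -> List[List[float]]:
    """Append the utterance-level DA one-hot vector to every frame's features."""
    da = one_hot(tag)
    return [list(frame) + da for frame in frames]


# Two frames of made-up linguistic features for one utterance.
linguistic = [[0.1, 0.0, 1.0], [0.3, 1.0, 0.0]]
dnn_input = add_da_feature(linguistic, "question")
```

Each row of `dnn_input` would then be fed to the acoustic model in place of the plain linguistic features, so that the same text can be mapped to different prosody depending on the dialogue act.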
A general theory of game AI has been researched and developed in the game industry worldwide, and a new general theory of AI in digital games, built on three AIs, is proposed. For a large-scale game, a game AI system consists of three types of AI: meta-AI, character AI, and navigation AI. A meta-AI controls a game dynamically from a bird's-eye view by watching a player’s behavior. A character AI is the brain of a game character such as a buddy, a monster, or a villager, and makes decisions in real time. A navigation AI recognizes the game environment to dynamically find a path or the best location to move to. Character AI in particular is a main topic of study in game development, and it includes many fields such as multi-layered structure, character animation, agent architecture, decision-making modules, and so on. A new decision-making method that combines behavior trees and state machines is proposed. It is called AI Graph. The general theory of game AI was applied to the AI system of the action RPG “FINAL FANTASY XV”, and the results are shown in the paper. The decision-making systems of all characters in FINAL FANTASY XV are based on AI Graphs. The AI Graph Editor is a tool for making an AI Graph using only a mouse and simple text inputs. The dynamics of the new method are shown by explaining the AI Graph Editor in detail.
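To make the idea of combining behavior trees and state machines more concrete, the following is a minimal sketch in which each state of a state machine owns a small behavior tree. This is only one possible way to combine the two techniques and is not the AI Graph implementation used in FINAL FANTASY XV; the class `StateMachine`, the node helpers, and the blackboard keys `enemy_dist` and `hp` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Blackboard = Dict[str, float]
Node = Callable[[Blackboard], bool]  # a behavior-tree node returns success/failure


def action(name: str, log: List[str]) -> Node:
    """Leaf node: record the action name and succeed."""
    def run(bb: Blackboard) -> bool:
        log.append(name)
        return True
    return run


def condition(pred: Callable[[Blackboard], bool]) -> Node:
    """Leaf node: succeed when the predicate holds."""
    return lambda bb: pred(bb)


def sequence(*children: Node) -> Node:
    """Succeed only if every child succeeds, in order (stops at first failure)."""
    return lambda bb: all(child(bb) for child in children)


def selector(*children: Node) -> Node:
    """Succeed as soon as one child succeeds."""
    return lambda bb: any(child(bb) for child in children)


class StateMachine:
    """Each state owns a behavior tree; transitions are checked before each tick."""

    def __init__(self, start: str, trees: Dict[str, Node],
                 transitions: Dict[str, List[Tuple[Callable[[Blackboard], bool], str]]]):
        self.state = start
        self.trees = trees
        self.transitions = transitions

    def tick(self, bb: Blackboard) -> None:
        # Take the first transition whose guard holds, then run the state's tree.
        for guard, target in self.transitions.get(self.state, []):
            if guard(bb):
                self.state = target
                break
        self.trees[self.state](bb)


log: List[str] = []
trees = {
    "patrol": action("walk_route", log),
    "combat": selector(
        sequence(condition(lambda bb: bb["hp"] < 30), action("retreat", log)),
        action("attack", log),
    ),
}
transitions = {
    "patrol": [(lambda bb: bb["enemy_dist"] < 10, "combat")],
    "combat": [(lambda bb: bb["enemy_dist"] >= 10, "patrol")],
}
ai = StateMachine("patrol", trees, transitions)
for bb in ({"enemy_dist": 50, "hp": 100}, {"enemy_dist": 5, "hp": 100},
           {"enemy_dist": 5, "hp": 20}, {"enemy_dist": 40, "hp": 20}):
    ai.tick(bb)
```

In this sketch the state machine handles coarse mode switches (patrol versus combat), while the behavior tree inside each state handles the fine-grained choice of action.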
Most characters in the Chinese and Japanese languages are ideographic compound characters composed of subcharacter elements arranged in a planar manner. Token-based and image-based subcharacter-level models have been proposed to leverage the subcharacter elements. However, on the one hand, the conventional token-based subcharacter-level models are blind to the planar structural information; on the other hand, the image-based models are weak with respect to ideographic characters that share similar shapes but have different meanings. These characteristics motivate us to explore non-image-based methods to encode the planar structural information of characters. In this paper, we propose and discuss a method to encode the planar structural information by learning embeddings of the categories of the structure types. Our proposed model adds the structure embeddings to the conventional subcharacter embeddings and position embeddings before they are input into the encoder. In this way, the model learns the planar structural and positional information and retains the uniqueness of each character. We evaluated the method in a text classification task. In the experiment, the embeddings were encoded by a CNN encoder, and then the encoded vectors were input into an LSTM classifier to classify product reviews as positive or negative. We compared the proposed model with models that use only the subcharacter embeddings, the structure embeddings, or the position embeddings, as well as with the conventional models in previous works. The results show that adding both structure embeddings and position embeddings leads to richer and more representative features and a better fit to the dataset. The proposed method results in at best 1.8% better recall, 0.63% better F-score, and 0.55% better accuracy on the testing datasets compared to previous methods.
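The step of adding structure embeddings to subcharacter and position embeddings can be sketched as an element-wise sum of three lookup tables. This is a toy illustration under stated assumptions: the tables are randomly initialised rather than learned, the structure categories (`left-right`, `top-bottom`, `single`), the dimension `DIM`, and the rule that each subcharacter token carries the structure type of its parent character are all hypothetical choices not taken from the abstract.

```python
import random
from typing import Dict, List, Tuple

DIM = 4  # toy embedding size (assumption)
rng = random.Random(0)


def make_table(keys: List[str]) -> Dict[str, List[float]]:
    """Randomly initialised lookup table standing in for learned embeddings."""
    return {k: [rng.uniform(-1, 1) for _ in range(DIM)] for k in keys}


sub_emb = make_table(["女", "子", "木"])
struct_emb = make_table(["left-right", "top-bottom", "single"])
MAX_LEN = 8
pos_emb = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(MAX_LEN)]


def encode(tokens: List[Tuple[str, str]]) -> List[List[float]]:
    """Sum subcharacter, structure, and position embeddings for each token."""
    out = []
    for i, (sub, struct) in enumerate(tokens):
        vec = [a + b + c
               for a, b, c in zip(sub_emb[sub], struct_emb[struct], pos_emb[i])]
        out.append(vec)
    return out


# 好 decomposes into 女 + 子 arranged left to right; 木 is a single-component character.
tokens = [("女", "left-right"), ("子", "left-right"), ("木", "single")]
encoder_input = encode(tokens)
```

Because the structure vector is added to the subcharacter vector, the same subcharacter yields a different encoder input when it appears in a different structural arrangement, which is the property a purely token-based model lacks.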
In places where many people gather, such as large-scale event venues, it is important to prevent crowd accidents from occurring. To that end, we must predict the flows of people and develop remedies before congestion creates a problem. Predicting the movement of a crowd is possible by using a multi-agent simulator, and highly accurate prediction can be achieved by reusing past event information to accurately estimate the simulation parameters. However, no such information is available for newly constructed event venues. Therefore, we propose here a method that improves estimation accuracy by utilizing the data measured on the current day. We introduce a people-flow prediction system that incorporates the proposed method. In this paper, we present the results of an experiment on the developed system that used people-flow data measured at an actual concert event.