Although attention-based Neural Machine Translation (NMT) has achieved great success, the attention mechanism cannot capture the entire meaning of the source sentence because it generates each target word depending heavily on the relevant parts of the source sentence. Earlier studies introduced a latent variable to capture the entire meaning of a sentence and achieved improvements over attention-based NMT. We follow this approach, and we believe that capturing sentence meaning benefits from image information because human beings understand the meaning of language not only from textual information but also from perceptual information such as that gained from vision. As described herein, we propose a neural machine translation model that introduces a continuous latent variable containing underlying semantics extracted from texts and images. Experiments conducted on an English–German translation task show that our model outperforms the baseline in METEOR score.
Although generative adversarial networks (GANs) have achieved state-of-the-art results in generating realistic-looking images, the models often consist of neural networks with few layers compared to those used for classification. We evaluate architectures for GANs of varying depths, using residual blocks with shortcut connections, in order to train GANs with higher capacity. While training tends to oscillate and does not benefit from the additional capacity of naively stacked layers, we show that GANs are capable of generating images of higher visual fidelity with proper regularization and simple techniques such as minibatch discrimination. In particular, we show that an architecture similar to the standard GAN, but with residual blocks in the hidden layers, consistently achieves higher inception scores than the standard model without noticeable mode collapse. The source code is available at https://github.com/hvy/gan-complexity.
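The residual blocks with shortcut connections mentioned above can be illustrated with a minimal sketch. This is a generic, hypothetical NumPy illustration of the idea (not the authors' implementation, which is in the linked repository): each block adds its input back to a small two-layer transformation, so stacking blocks deepens the network while preserving an identity path.

```python
import numpy as np

def relu(x):
    # Standard rectified-linear activation
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """One residual block with a shortcut connection.

    Computes x + relu(x @ w1) @ w2. If the learned transformation is
    near zero, the block approximates the identity, which is what makes
    deeper stacks trainable.
    """
    h = relu(x @ w1)
    return x + h @ w2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))            # batch of 4 hidden activations
w1 = 0.1 * rng.normal(size=(16, 16))    # illustrative random weights
w2 = 0.1 * rng.normal(size=(16, 16))
y = residual_block(x, w1, w2)           # same shape as the input
```

With zero weights the block reduces exactly to the identity, which is the design property that shortcut connections provide.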
We propose a dialogue management model based on a set of specific conversational strategies, namely communication strategies and "affective backchannels", in order to foster embodied conversational agents' ability to carry on conversations that are effective in enhancing learners' willingness to communicate in English.
The coimagination method was developed to prevent declining cognitive functions. This method trains the episodic memory function of participants by carrying out group conversations based on photos, with a designated time limit and theme. We study the mental time of participants in order to determine the memory function they use during conversation. Mental time is a person's time consciousness over the past, present, and future. Because the purpose is to train the recent episodic memory function, mental time travel to the "Past" for a long time is undesirable. In this study, we propose a method to classify mental time into "Past", "Present", and "Future" using three temporal elements: speech time, event time, and photographing time. Speech time is the time at which the coimagination session was carried out (the talking time), event time is the time related to the contents of the utterance, and photographing time is the time at which the photo was taken. These three temporal elements are related by "before", "after", and "at the same time". Using these relations, we classify the mental time of participants and plot a graph of their mental time travel.
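The classification by temporal relations can be sketched as follows. This is a hypothetical simplification assuming the core rule is the ordering of event time relative to speech time ("before" → Past, "after" → Future, "at the same time" → Present); the paper's actual method also incorporates photographing time.

```python
from datetime import datetime

def classify_mental_time(speech_time, event_time):
    """Classify an utterance's mental time.

    Compares the event time (what the utterance is about) with the
    speech time (when the utterance was made), using the before /
    after / same-time relations described in the abstract.
    """
    if event_time < speech_time:
        return "Past"       # event happened before the conversation
    if event_time > speech_time:
        return "Future"     # event lies after the conversation
    return "Present"        # event coincides with the conversation

# Example: during a session, a participant talks about an old photo trip.
session = datetime(2016, 5, 10, 14, 0)
label = classify_mental_time(session, datetime(2010, 1, 1))
```

Plotting these labels against speech time would yield the kind of mental-time-travel graph the abstract describes.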