Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Volume 34, Issue 6
Displaying 1-3 of 3 articles from this issue
Original Paper
  • Kei Amii, Masaaki Nishino, Akihiro Yamamoto
    Article type: Original Paper
    2019 Volume 34 Issue 6 Pages A-I34_1-12
    Published: November 01, 2019
    Released on J-STAGE: November 01, 2019
    JOURNAL FREE ACCESS

    In this paper, we construct decision diagrams (DDs) that represent the set of all parse trees of a context-free grammar (CFG) for given sequence data, and we analyze DD size. CFGs are widely used in natural language processing and bioinformatics to estimate the hidden structure of sequence data. A decision diagram is a data structure that represents a Boolean function in a concise form. By using DDs to represent the set of all parse trees, we can efficiently perform many useful operations over the parse trees, such as finding parse trees that satisfy additional constraints and finding the most probable parse tree. Since the time complexity of these operations strongly depends on DD size, selecting an appropriate DD variant is important. Experiments on the parse trees of a simple CFG show that the Zero-suppressed Sentential Decision Diagram (ZSDD) is better than other DDs; we also give theoretical upper bounds on the ZSDD size for a simple CFG. Moreover, we propose an efficient method based on the CYK (Cocke-Younger-Kasami) algorithm for constructing ZSDDs that represent the set of all parse trees. Experiments show that this method constructs ZSDDs much faster than the naive method of compiling a Boolean function.
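
    To make the role of the CYK algorithm concrete, below is a minimal sketch of the classic CYK chart recognizer for a CFG in Chomsky normal form. The toy grammar and all names are illustrative assumptions; the sketch only shows the chart over spans that a CYK-based construction traverses and does not build the ZSDD described in the paper.

# Minimal CYK recognizer for a CFG in Chomsky normal form (illustrative only;
# the grammar below is a toy example, and no decision diagram is built here).
from itertools import product

# Toy grammar: S -> A B | B C, A -> B A | 'a', B -> C C | 'b', C -> A B | 'a'
UNARY = {'a': {'A', 'C'}, 'b': {'B'}}
BINARY = {('A', 'B'): {'S', 'C'}, ('B', 'C'): {'S'},
          ('B', 'A'): {'A'}, ('C', 'C'): {'B'}}

def cyk(tokens, start='S'):
    n = len(tokens)
    # chart[(i, j)] = set of nonterminals that derive the span tokens[i:j]
    chart = {(i, i + 1): set(UNARY.get(t, ())) for i, t in enumerate(tokens)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):           # split point of the span
                for left, right in product(chart[(i, k)], chart[(k, j)]):
                    cell |= BINARY.get((left, right), set())
            chart[(i, j)] = cell
    return start in chart[(0, n)]

print(cyk(list("baaba")))   # True: the toy grammar derives "baaba"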

Exploratory Research Paper
  • Yusuke Nakata, Sachiyo Arai
    Article type: Exploratory Research Paper
    2019 Volume 34 Issue 6 Pages B-J23_1-11
    Published: November 01, 2019
    Released on J-STAGE: November 01, 2019
    JOURNAL FREE ACCESS

    Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given in a single fixed environment, but in some cases they are collected across multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from models of multiple environments together with the expert policy for each environment. We assume that the experts in the multiple environments share a reward function, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves the linear program that maximizes the sum of the original objective functions of the individual environments while satisfying the optimality conditions of all the given environments. Satisfying the conditions of all the given environments is a necessary condition for matching the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on windy grid-world environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments.
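
    As a rough illustration of how optimality constraints from several environments can be pooled into one linear program, the sketch below stacks Ng-and-Russell-style constraints from each environment and maximizes the summed advantage gaps. It is a simplified, assumption-laden sketch (no min operator or L1 penalty, a shared state space, hypothetical names such as R_MAX), not the authors' exact formulation.

# Simplified sketch of linear-programming IRL pooled over several environments.
import numpy as np
from scipy.optimize import linprog

GAMMA, R_MAX = 0.95, 1.0

def optimality_rows(P, expert_policy):
    """Rows G such that G @ R >= 0 makes expert_policy optimal in one environment.

    P[a] is the |S| x |S| transition matrix of action a; expert_policy[s] is the
    expert's action in state s.
    """
    n_states = P.shape[1]
    P_pi = np.array([P[expert_policy[s], s] for s in range(n_states)])  # expert rows
    inv = np.linalg.inv(np.eye(n_states) - GAMMA * P_pi)
    rows = []
    for s in range(n_states):
        for a in range(P.shape[0]):
            if a != expert_policy[s]:
                rows.append((P[expert_policy[s], s] - P[a, s]) @ inv)
    return np.array(rows)

def multi_env_lp_irl(envs):
    """envs: list of (P, expert_policy) pairs over the same state space."""
    G = np.vstack([optimality_rows(P, pi) for P, pi in envs])
    n_states = G.shape[1]
    # Maximize the summed advantage gaps of all environments (linprog minimizes).
    c = -G.sum(axis=0)
    # Constraints: G @ R >= 0 (written as -G @ R <= 0), and |R| <= R_MAX.
    res = linprog(c, A_ub=-G, b_ub=np.zeros(len(G)),
                  bounds=[(-R_MAX, R_MAX)] * n_states, method="highs")
    return res.x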

Original Paper
  • Naoshi Kaneko, Kenta Takeuchi, Dai Hasegawa, Shinichi Shirakawa, Hiros ...
    Article type: Original Paper
    2019 Volume 34 Issue 6 Pages C-J41_1-12
    Published: November 01, 2019
    Released on J-STAGE: November 01, 2019
    JOURNAL FREE ACCESS

    We present a novel framework for automatic speech-driven generation of natural gesture motion. The proposed method consists of two steps. First, a deep network based on a bidirectional LSTM learns speech-gesture relationships with both forward and backward consistency over a long time span; at each time step, the network regresses the full 3D skeletal pose of a human from perceptual features extracted from the input audio. Second, we apply combined temporal filters to smooth the generated pose sequences. We train the network on a speech-gesture dataset recorded with a headset and marker-based motion capture. We evaluate different acoustic features, network architectures, and temporal filters to validate the effectiveness of the proposed approach. We also conduct a subjective evaluation comparing our approach against real human gestures. The results show that the generated gestures are comparable in naturalness to the “original” human gestures and are rated significantly more natural than “mismatched” human gestures taken from a different utterance.
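
    As a concrete (and purely illustrative) picture of the first step, the sketch below passes per-frame acoustic features through a bidirectional LSTM that regresses a 3D pose per frame, followed by a simple temporal smoothing pass. The feature and joint dimensions, network sizes, and the Savitzky-Golay filter are assumptions made for the example, not the paper's reported architecture or combined filters.

# Illustrative bidirectional-LSTM regressor from acoustic features to 3D poses.
import torch
import torch.nn as nn
from scipy.signal import savgol_filter

class SpeechToGesture(nn.Module):
    def __init__(self, n_audio_feats=26, n_joints=21, hidden=256):
        super().__init__()
        # Bidirectional LSTM captures both forward and backward context in time.
        self.rnn = nn.LSTM(n_audio_feats, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_joints * 3)   # x, y, z per joint

    def forward(self, audio_feats):            # (batch, time, n_audio_feats)
        h, _ = self.rnn(audio_feats)           # (batch, time, 2 * hidden)
        return self.head(h)                    # (batch, time, n_joints * 3)

model = SpeechToGesture()
poses = model(torch.randn(1, 300, 26))         # 300 frames of MFCC-like features
# Post-hoc temporal filtering to suppress frame-to-frame jitter.
smooth = savgol_filter(poses.detach().numpy(), window_length=9, polyorder=3, axis=1)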
