In this paper, we construct decision diagrams (DDs) that represent the set of all parse trees of a context-free grammar (CFG) for a given sequence, and we analyze the size of these DDs. CFGs are widely used in natural language processing and bioinformatics to estimate the hidden structure of sequence data. A decision diagram is a data structure that represents a Boolean function in a concise form. By using DDs to represent the set of all parse trees, we can efficiently perform many useful operations over the parse trees, such as finding parse trees that satisfy additional constraints and finding the most probable parse tree. Since the time complexity of these operations strongly depends on DD size, selecting an appropriate DD variant is important. Experiments on the parse trees of a simple CFG show that the Zero-suppressed Sentential Decision Diagram (ZSDD) is better than other DDs; we also give theoretical upper bounds on the ZSDD size for a simple CFG. Moreover, we propose an efficient method based on the CYK (Cocke-Younger-Kasami) algorithm to construct ZSDDs that represent the set of all parse trees. Experiments show that this method constructs ZSDDs much faster than the naive method based on compiling a Boolean function.
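To make the "set of all parse trees" concrete, the following minimal sketch fills a CYK chart for a grammar in Chomsky normal form and counts the parse trees of a sequence. It is not the ZSDD construction described above. It only illustrates the CYK chart that such a construction would build on. The function name `count_parse_trees`, the rule encoding, and the toy grammar `S -> S S | 'a'` are assumptions made for illustration.

```python
from collections import defaultdict


def count_parse_trees(tokens, unary_rules, binary_rules, start="S"):
    """Count parse trees of `tokens` under a CNF grammar using a CYK chart.

    unary_rules:  list of (lhs, terminal) pairs, e.g. ("S", "a")
    binary_rules: list of (lhs, (left, right)) pairs, e.g. ("S", ("S", "S"))
    """
    n = len(tokens)
    # chart[(i, j)][A] = number of parse trees deriving tokens[i:j] from A
    chart = defaultdict(lambda: defaultdict(int))

    # Spans of length 1: apply terminal rules.
    for i, tok in enumerate(tokens):
        for lhs, terminal in unary_rules:
            if terminal == tok:
                chart[(i, i + 1)][lhs] += 1

    # Longer spans: combine two adjacent sub-spans at every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, (left, right) in binary_rules:
                    chart[(i, j)][lhs] += chart[(i, k)][left] * chart[(k, j)][right]

    return chart[(0, n)][start]


# Hypothetical toy grammar: S -> S S | 'a'
unary = [("S", "a")]
binary = [("S", ("S", "S"))]
print(count_parse_trees(list("aaaa"), unary, binary))
```

Even for this toy grammar the number of parse trees grows quickly with sequence length, which is why a compact representation of the whole set matters.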
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from the policy or demonstrations of experts. Most existing IRL algorithms assume that the expert policy or demonstrations are given for a single fixed environment, but in some cases they are collected in multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from models of multiple environments and an expert policy for each environment. We assume that the experts in the multiple environments share a reward function, and we estimate reward functions for which each expert policy is optimal in the corresponding environment. To handle policies in multiple environments, we extend linear programming IRL. Our method solves the linear programming problem of maximizing the sum of the original objective functions of each environment while satisfying the conditions of all the given environments. Satisfying the conditions of all the given environments is a necessary condition for matching the expert reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments using Windy grid world environments, we demonstrate that our algorithm is able to recover reward functions for which the expert policies are optimal in the corresponding environments.
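The condition that a shared reward must make every expert policy optimal in its own environment can be checked directly on small examples. The sketch below does not solve the linear program. It only tests, by value iteration, whether a candidate state reward satisfies that condition in each environment. The function names, the two-state environments, the discount factor, and the tolerance are all assumptions made for illustration.

```python
def q_values(P, R, gamma=0.9, iters=500):
    """Run value iteration and return Q[s][a].

    P[a][s][t] is the probability of moving from state s to t under action a.
    R[s] is the reward of state s.
    """
    n_states, n_actions = len(R), len(P)
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        Q = [[R[s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V = [max(row) for row in Q]
    return Q


def policy_is_optimal(P, R, policy, gamma=0.9, tol=1e-6):
    """True if policy[s] attains the maximal Q-value in every state s."""
    Q = q_values(P, R, gamma)
    return all(Q[s][policy[s]] >= max(Q[s]) - tol for s in range(len(R)))


def reward_consistent_with_all(envs, R, gamma=0.9):
    """True if R makes each expert policy optimal in its own environment."""
    return all(policy_is_optimal(P, R, pi, gamma) for P, pi in envs)


# Two hypothetical 2-state environments that differ only in dynamics.
stay = [[1.0, 0.0], [0.0, 1.0]]
move = [[0.0, 1.0], [1.0, 0.0]]
env_a = ([stay, move], [1, 0])  # action 0 stays, action 1 moves
env_b = ([move, stay], [0, 1])  # action effects are swapped
envs = [env_a, env_b]

print(reward_consistent_with_all(envs, [0.0, 1.0]))
print(reward_consistent_with_all(envs, [1.0, 0.0]))
```

Note that an all-zero reward passes this check trivially, since every policy is optimal under it. This illustrates why the condition is only necessary and why an objective function is needed on top of it.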
We present a novel framework for automatic speech-driven generation of natural gesture motion. The proposed method consists of two steps. First, using a bi-directional LSTM network, our deep network learns speech-gesture relationships with both forward and backward consistency over a long period of time. The network regresses a full 3D skeletal pose of a human from perceptual features extracted from the input audio at each time step. Second, we apply combined temporal filters to smooth the generated pose sequences. We train our network on a speech-gesture dataset recorded with a headset and a marker-based motion capture system. We evaluate different acoustic features, network architectures, and temporal filters to validate the effectiveness of the proposed approach. We also conduct a subjective evaluation and compare our approach against real human gestures. The subjective evaluation shows that our generated gestures are comparable to “original” human gestures and are significantly better than “mismatched” human gestures taken from a different utterance on the scale of naturalness.
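The smoothing step can be illustrated with a simple temporal filter. The abstract does not say which filters are combined, so the centered moving average below is only an assumed stand-in. The function name `moving_average`, the window size, and the flat per-frame coordinate layout are likewise hypothetical.

```python
def moving_average(frames, window=5):
    """Smooth a pose sequence with a centered moving average.

    frames[t] is a flat list of joint coordinates at time step t.
    Near the sequence boundaries the window is truncated.
    """
    half = window // 2
    smoothed = []
    for t in range(len(frames)):
        start = max(0, t - half)
        end = min(len(frames), t + half + 1)
        count = end - start
        smoothed.append([
            sum(frames[u][d] for u in range(start, end)) / count
            for d in range(len(frames[t]))
        ])
    return smoothed


# A single jittery coordinate over three frames.
poses = [[0.0], [3.0], [0.0]]
print(moving_average(poses, window=3))
```

A wider window removes more frame-to-frame jitter but also flattens fast gestures, so the window size trades smoothness against responsiveness.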