Current methods for teaching "force and motion" rely on equations and place little emphasis on supporting an understanding based on causality. One possible reason is the lack of a causality-compliant theory that consistently treats problems such as action and reaction and apparent forces like centrifugal force. By adopting a naive view of causality (causality-based understanding) that agrees with human experience, we constructed a causal theory of force and motion. This theory can serve as the foundation of an educational approach that helps junior high and high school students understand and explain various phenomena related to force and motion. With this approach, it will be possible to design and develop educational support methods and systems that are expected to reduce student misunderstandings. It will also enable the creation of a general-purpose motion simulator that provides automated causal explanations of physical phenomena.
Trading dialogs are a kind of negotiation in which the exchange of ownership of items is discussed; such dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, previous research has focused only on negotiation between two participants, ignoring cases where negotiation takes place among more than two interlocutors. In this paper, as a first study on multi-party negotiation, we apply RL to a multi-party trading scenario in which the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions: Q-learning with linear function approximation, least-squares policy iteration, and neural fitted Q iteration. In addition, to make the learning process more efficient, we introduce an incremental reward function. The learner's negotiation strategy is learned through simulated dialogs with trader simulators. In our experiments, we evaluate how the learner's performance varies with the RL algorithm used and the number of traders. Furthermore, we compare the learned dialog policies with two strong hand-crafted baseline dialog policies. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is not straightforward and requires considerable experimentation; and (2) neural fitted Q iteration combined with an incremental reward function produces negotiation policies that are as effective as, or even better than, the policies of the two strong hand-crafted baselines.
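To illustrate the first of the algorithms named above, here is a minimal sketch of Q-learning with linear function approximation, where Q(s, a) = w · φ(s, a) and each step moves w along φ(s, a) scaled by the temporal-difference error. It runs on a hypothetical five-state chain task, not on the paper's trading scenario; the environment, the one-hot feature function, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random

# Hypothetical toy task (not the paper's trading domain):
# states 0..4 on a chain, state 4 is terminal (goal).
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def features(state, action):
    """Assumed one-hot feature vector phi(s, a)."""
    phi = [0.0] * (N_STATES * len(ACTIONS))
    phi[state * len(ACTIONS) + action] = 1.0
    return phi


def q_value(w, state, action):
    """Linear approximation: Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(state, action)))


def step(state, action):
    """Deterministic chain dynamics; reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(w, state):
    """Action with the highest approximate Q-value."""
    return max(ACTIONS, key=lambda a: q_value(w, state, a))


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    w = [0.0] * (N_STATES * len(ACTIONS))
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy exploration with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                qs = [q_value(w, state, a) for a in ACTIONS]
                best = max(qs)
                action = rng.choice([a for a, q in zip(ACTIONS, qs) if q == best])
            nxt, reward, done = step(state, action)
            # TD target: bootstrap from max_a' Q(s', a') unless s' is terminal.
            target = reward
            if not done:
                target += gamma * max(q_value(w, nxt, a) for a in ACTIONS)
            td_error = target - q_value(w, state, action)
            # For a linear Q, the gradient with respect to w is phi(s, a).
            phi = features(state, action)
            w = [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]
            state, steps = nxt, steps + 1
    return w


if __name__ == "__main__":
    w = train()
    print([greedy(w, s) for s in range(N_STATES - 1)])
```

With one-hot features this reduces to tabular Q-learning; in a dialog setting, φ would instead encode features of the dialog state. Least-squares policy iteration and neural fitted Q iteration differ in that they fit the Q-function to a batch of collected transitions (by least squares for a linear w, or by retraining a neural network) rather than updating w after every single step.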