In reinforcement learning, environments with a sparse reward signal are particularly difficult to model. In particular, learning actions in a 3D environment from a first-person view is regarded as a POMDP, which potentially enlarges the state space. Large environments with sparse rewards therefore require an efficient learning process over a large state space. In this paper, we propose a deep reinforcement learning method that uses the memory module proposed in Neural Episodic Control and adds cognitive information to the memory module to improve performance.
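As a rough illustration of the kind of memory module used in Neural Episodic Control, the following minimal sketch stores (key, value) pairs and reads a value estimate as an inverse-distance-weighted average over the nearest stored keys. The class name `EpisodicMemory`, the parameters `num_neighbors` and `delta`, and the toy keys are assumptions made for this example; the cognitive information added by the proposed method is not specified in the abstract and is not modeled here.

```python
class EpisodicMemory:
    """Append-only key/value store with kernel-weighted nearest-neighbour reads.

    Illustrative sketch only; names and defaults are assumptions.
    """

    def __init__(self, num_neighbors=2, delta=1e-3):
        self.keys = []      # state embeddings (lists of floats)
        self.values = []    # value estimates stored with each key
        self.num_neighbors = num_neighbors
        self.delta = delta  # keeps the kernel finite when a key matches exactly

    def write(self, key, value):
        self.keys.append(list(key))
        self.values.append(float(value))

    def lookup(self, query):
        if not self.keys:
            return 0.0  # assumption: an empty memory returns a neutral estimate
        # Squared Euclidean distance from the query to every stored key.
        dists = [sum((q - k) ** 2 for q, k in zip(query, key)) for key in self.keys]
        nearest = sorted(range(len(dists)), key=dists.__getitem__)[: self.num_neighbors]
        # Inverse-distance kernel: closer keys contribute more to the estimate.
        kernels = [1.0 / (dists[i] + self.delta) for i in nearest]
        total = sum(kernels)
        return sum(w * self.values[i] for w, i in zip(kernels, nearest)) / total


memory = EpisodicMemory()
memory.write([0.0, 0.0], 1.0)
memory.write([1.0, 0.0], 3.0)
q_mid = memory.lookup([0.5, 0.0])   # query halfway between the two keys
q_near = memory.lookup([0.0, 0.0])  # query that matches the first key
```

A query that lies equally far from both keys receives the average of their values, while a query that matches a stored key is dominated by that key's value.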
Reinforcement learning experiments were conducted on OpenAI Gym games and robot simulators using Proximal Policy Optimization (PPO), which is considered suitable for motion learning of humanoid robots. The results confirmed that reinforcement learning is possible with the implementation of the algorithm published by OpenAI. Moreover, an experiment with a real robot confirmed that motions learned on the robot simulator can be executed on the real robot.
I propose a new framework for cooperation between AGI research and IoT/AI developers for an IoT ecosystem, combining state-of-the-art technologies such as the Semantic Web of Things, a machine learning platform, and a cognition model that references the Good AI AGI roadmap. I also propose a way to monetize the IoT ecosystem.
This paper proposes a new method of time series prediction using multiple deep learners and a Bayesian network. We first suggest two approaches. In the first, the explanatory variables of the input data are nodes of a Bayesian network and are associated with learners. In the second, the outputs of all the learners are made nodes of the Bayesian network and are integrated. This paper describes the first approach in detail. The training data are divided into clusters by K-means clustering, and multiple deep learners are trained, one for each cluster. A Bayesian network is used to determine which deep learner is in charge of predicting the time series. The proposed method is applied to financial time series data, and prediction results for the return of the Nikkei 225 are presented.
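The cluster-then-route structure described in the abstract above can be sketched as follows. This is a heavily simplified illustration under stated assumptions, not the proposed method: a per-cluster mean predictor stands in for each deep learner, nearest-centroid routing stands in for the Bayesian network, and the windows and targets are made-up toy values. All function names are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def train_learners(windows, targets, centroids):
    """One placeholder 'learner' per cluster: the mean target of that cluster."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x, y in zip(windows, targets):
        j = nearest(x, centroids)
        sums[j] += y
        counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def predict(window, centroids, learners):
    # Stand-in for the Bayesian network: route to the nearest cluster's learner.
    return learners[nearest(window, centroids)]


# Toy data: each window holds the two previous values, the target is the next one.
windows = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.1], [10.0, 10.1], [10.1, 10.2], [10.2, 10.1]]
targets = [0.2, 0.1, 0.0, 10.2, 10.1, 10.0]

centroids = kmeans(windows, k=2)
learners = train_learners(windows, targets, centroids)
low_prediction = predict([0.05, 0.1], centroids, learners)
high_prediction = predict([10.0, 10.2], centroids, learners)
```

In a fuller implementation, each entry of `learners` would be a trained network and the routing step would be a probabilistic inference over the explanatory variables, as the abstract describes.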
Backpropagation is widely used for deep learning; however, it requires a white-box cost function that is explicitly formulated and differentiable. It is difficult for non-experts to build a model for a problem whose effective cost function is not known. In this report, we propose a gradient estimation method based on code-division multiplexing that calculates the gradients of the weights in a neural network using multiple forward propagations. The proposed method enables machine learning for problems with black-box cost functions, which cannot be formulated but whose cost values can be calculated. The proposed method is evaluated on the MNIST problem. The evaluation results show that the proposed method can build a model that recognizes MNIST digits and that the appropriate spreading code length is small in the early phase of training and large in the final phase.
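One possible reading of gradient estimation by code-division multiplexing is sketched below; it is an interpretation for illustration, not the authors' implementation. Each weight is perturbed by its own orthogonal ±1 spreading code (Walsh codes are assumed here), the black-box cost is evaluated once per chip, and correlating the cost sequence with each code recovers that weight's gradient. The function names, the step size `eps`, and the toy cost function are assumptions.

```python
import math


def walsh_codes(length):
    """Rows of a Sylvester-Hadamard matrix: mutually orthogonal +/-1 codes."""
    rows = [[1]]
    while len(rows) < length:
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def estimate_gradient(cost, weights, eps=1e-5):
    """Estimate d(cost)/d(weights) using forward evaluations only."""
    n = len(weights)
    length = 2
    while length <= n:  # n codes are needed besides the all-ones code
        length *= 2
    # Skip the all-ones code so the unperturbed cost cancels when despreading.
    codes = walsh_codes(length)[1 : n + 1]
    grad = [0.0] * n
    for t in range(length):  # one forward evaluation per chip
        perturbed = [w + eps * codes[i][t] for i, w in enumerate(weights)]
        c = cost(perturbed)
        for i in range(n):
            grad[i] += c * codes[i][t]  # despread: correlate cost with code i
    return [g / (length * eps) for g in grad]


def black_box_cost(w):
    # Toy stand-in for a cost that can be evaluated but not differentiated.
    return (w[0] - 1.0) ** 2 + 3.0 * w[0] * w[1] + math.sin(w[2])


grad = estimate_gradient(black_box_cost, [0.5, -2.0, 0.3])
```

Because the codes are orthogonal, the first-order contribution of every other weight cancels in the correlation; the remaining error comes from higher-order terms and shrinks with `eps`. This sketch uses a fixed code length at least as large as the number of weights, whereas the abstract reports that the appropriate code length changes over the course of training.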
A system that transitions from an initial state to a target state is assumed. The process of state transition is represented by time series data. Unlike a computer program, the time series data are not given to the system but are acquired by trial and error. To combine and search time series data, the context structure inherent in the time series data is used. For example, even if the details of the time series data leading to the target state cannot be determined at search time, the time series data immediately before reaching the target state and the time series data indicating the movement away from the initial state are linked at an upper level of the context. In other words, if there is an overlap in the tree structure, it becomes a search candidate. It has been reported that a hierarchical structure is inherent in time series data and that the basic sequences making up the time series data can naturally correspond to activation areas in a neural network.
Artificial intelligence is expected to be the next form of computer. In this paper, a theory of artificial intelligence is discussed. It is based on the foundations of mathematics and thus on the necessary and sufficient conditions of intelligence, ethics, and safety.