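This artificial brain models the behavior of the human brain, not as a wholesale copy but by adopting its best features. In the sense that the artificial brain gathers information of its own volition, thinks (generates intentions), and acts (behavior, speech, further thought), it aims at general-purpose AI. However, rather than putting deep learning (numerical computation models) first, it observes the behavior of the human brain and imitates it as far as possible; the ultimate goal is to create "the brain of Astro Boy (Tetsuwan Atom)". [Examples of imitating brain behavior] (1) Unlike AI, humans have a narrow field of view, so they act by grasping the whole scene while focusing on the object of attention. As a result, associative and hierarchical memory lets them "see" even things that are not visible or that they are not looking at. Moreover, besides visual information, they take in speech (conversation), sound sources, text, and even tactile information from the hands and feet as image information (converted into what would have been seen). (2) When external information (vision, sound sources, speech, touch) or internal information activates intentions and memories, the brain follows its own intentions to generate a problem, find a means of solving it, and turn that into a concrete action plan. For example, it thinks "(visual information) The laundry is hanging on the balcony and it has started to rain. (Experience, associative memory) The laundry will get soaked. (Intention, sense of crisis) This will be a disaster." (the problem), and then "Bring it inside the house." (the solution, human will). This happens because the rain activated the sense of crisis (something terrible will happen if nothing is done). The artificial brain observes and imitates the brain's behavior (organic information processing) so that functionally partitioned modules cooperate with one another and work well as a whole. The goal is to standardize this information processing (as an OS) and install it in general-purpose humanoid robots, thereby providing "an artificial brain that coexists with people and can offer spontaneous help and obey commands within the bounds of ethics, a sense of crisis, and common sense".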
In reinforcement learning, high-dimensional multi-objective decision problems are difficult to solve. In this paper, we propose a novel hierarchical architecture for deep reinforcement learning with Accumulator Based Arbitration Mode (ABAM). The architecture is organized into groups, each with its own deep Q-network (DQN) modules, and into layers containing these modules, so that it can solve high-dimensional multi-objective decision problems. ABAM arbitrates the outputs of the modules in one layer using the outputs of the modules in the layer above, which has a more abstract objective. With this architecture, a difficult maze task that a simple DQN cannot solve can be solved.
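The abstract does not state ABAM's update rule, so the following is only one hypothetical reading of "accumulator based arbitration", not the authors' method. It assumes that each lower-layer module keeps an accumulator, that the upper layer supplies one non-negative weight per module, and that a module takes control once its accumulator reaches a threshold. The function name `abam_step`, the evidence measure, `threshold`, and `decay` are all assumptions introduced for illustration.

```python
def abam_step(accumulators, module_q_values, upper_weights,
              threshold=1.0, decay=0.9):
    """One hypothetical arbitration step (illustrative assumption only).

    accumulators    : one float per lower-layer module
    module_q_values : one list of Q-values per lower-layer module
    upper_weights   : one non-negative weight per module, assumed to come
                      from the upper (more abstract) layer
    Returns (new_accumulators, chosen_action_or_None).
    """
    new_acc = []
    for acc, q, w in zip(accumulators, module_q_values, upper_weights):
        # Evidence: how strongly the module prefers its best action,
        # scaled by the weight assigned from the layer above.
        evidence = w * (max(q) - sum(q) / len(q))
        new_acc.append(decay * acc + evidence)

    # The module with the largest accumulator is the candidate winner.
    winner = max(range(len(new_acc)), key=lambda i: new_acc[i])
    if new_acc[winner] >= threshold:
        q = module_q_values[winner]
        action = max(range(len(q)), key=lambda a: q[a])
        new_acc[winner] = 0.0  # reset the winner after it fires
        return new_acc, action
    return new_acc, None       # no module has enough evidence yet
```

Calling `abam_step` once per time step lets a weakly weighted module still win eventually if its preference persists, which is one way to trade off competing objectives over time.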
Dennett identifies three stances that people adopt to understand an object: the physical stance, the design stance, and the intentional stance. In human-robot interaction, it has been claimed that communication proceeds smoothly when a person adopts the intentional stance toward the robot. In this research, we hypothesized that showing users the experiences through which a robot's behavioral principles were formed encourages them to adopt the intentional stance, and we verified this hypothesis through experiments.
The speaker is developing AGI agents on the assumption that mature AGI can be achieved by combining "the most general AI apart from the efficiency", which provides generality, with "incremental learning", which provides specialization for efficiency. He entered his AGI agent in Round 1 of the General AI Challenge, held in 2017, and shared second place for the qualitative prize. In this talk, he will explain how the AGI agent is implemented, the techniques devised and difficulties encountered when applying it to Round 1 of the Challenge, and some thoughts on the General AI Challenge series.
Humans can set suitable subgoals in order to achieve a purpose and, if needed, can recursively set sub-subgoals. The depth of this recursion appears to be unlimited. Inspired by this behavior, we have designed a new hierarchical reinforcement learning architecture, the RGoal architecture. The algorithm is designed to solve the MDP on the augmented state-action space. Thanks to a decomposition of the value function, the action-value function can be shared across multiple tasks, and this sharing accelerates learning in the multi-task setting. A mechanism named "think-mode" is a kind of model-based reinforcement learning. It combines previously learned simple tasks to solve complicated tasks that have not been experienced before, quickly or in some cases zero-shot. The algorithm is realized with a flat table and the repetition of simple operations, without a stack. In future work, we will extend this architecture and build a model of the information processing mechanism of the prefrontal cortex.
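As a rough illustration of what a flat table over an augmented state-action space might look like, the sketch below keys the table by (state, subgoal, action) and lets the action set contain both primitive moves and "set subgoal" actions. Standard tabular Q-learning stands in for the RGoal update rule, which the abstract does not specify; the value function decomposition and think-mode are not implemented. The names `PRIMITIVES`, `augmented_actions`, and `q_update`, and the grid-world style moves, are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical primitive actions for a grid-world style task.
PRIMITIVES = ["up", "down", "left", "right"]


def augmented_actions(landmarks):
    """Primitive moves plus one 'call subgoal m' action per landmark."""
    moves = [("move", p) for p in PRIMITIVES]
    calls = [("call", m) for m in landmarks]
    return moves + calls


# Flat table keyed by (state, subgoal, action). The key does not include
# the original task goal, so (under this assumed reading) entries can be
# reused by any task that passes through the same subgoal.
Q = defaultdict(float)


def q_update(Q, s, g, a, reward, s2, g2, actions, alpha=0.1, gamma=1.0):
    """Plain Q-learning update on the augmented space (a stand-in rule)."""
    best_next = max(Q[(s2, g2, b)] for b in actions)
    td_error = reward + gamma * best_next - Q[(s, g, a)]
    Q[(s, g, a)] += alpha * td_error
```

Because the subgoal is part of the table key rather than an entry on a call stack, switching subgoals is just a change of key, which is consistent with the abstract's remark that no stack is needed.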
The Bayesian network is a promising model of the cerebral cortex. However, in ordinary Bayesian networks, the number of parameters in each conditional probability table (CPT) grows exponentially with the number of parent nodes, which makes large-scale Bayesian networks impractical. Restricting the form of the CPTs is one approach to scaling up Bayesian networks. In this paper, we restrict CPTs to noisy-OR and noisy-AND gates, both of which have O(N) parameters for N parent nodes. To investigate the representational power of such a Bayesian network, we construct a network that pools the features of the input data, mimicking the early visual cortex. In this model, a gating mechanism is realized by the noisy-AND gates. We show that the model can acquire translation-invariant responses using the standard gradient ascent method.
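The parameter saving can be made concrete with a small sketch. The noisy-OR below is the standard form with an optional leak term. The noisy-AND uses one common parameterization, in which each inactive parent independently blocks the child with some probability; the abstract does not say which form the paper uses, so that choice is an assumption, as are all function names.

```python
def noisy_or(parent_states, activation_probs, leak=0.0):
    """P(child = 1 | parents) for a noisy-OR gate.

    Each active parent i independently turns the child on with
    probability activation_probs[i]; `leak` is the probability that the
    child is on even when no parent is active.
    """
    p_off = 1.0 - leak
    for x, p in zip(parent_states, activation_probs):
        if x:
            p_off *= 1.0 - p
    return 1.0 - p_off


def noisy_and(parent_states, inhibition_probs):
    """P(child = 1 | parents) for one assumed noisy-AND form.

    Each inactive parent i independently blocks the child with
    probability inhibition_probs[i]; active parents never block.
    """
    p_on = 1.0
    for x, q in zip(parent_states, inhibition_probs):
        if not x:
            p_on *= 1.0 - q
    return p_on


def full_cpt_parameters(n_parents):
    """Free parameters of an unrestricted CPT for a binary child."""
    return 2 ** n_parents


def gate_parameters(n_parents):
    """Parameters of a noisy-OR or noisy-AND gate (ignoring the leak)."""
    return n_parents
```

For a node with 10 binary parents, an unrestricted table needs 1024 probabilities while either gate needs 10, which is the O(N) scaling the abstract refers to.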
Reinforcement learning that combines deep learning with Monte Carlo tree search, as used in AlphaZero and similar systems, has been reported to be extremely effective and widely applicable to various games. Since this method is essentially an algorithm for solving search problems efficiently, it can be applied to general combinatorial optimization problems as well as to games. Therefore, to deepen our understanding of the method, we applied it experimentally to a combinatorial optimization problem and report the results. The relationship between this method and the frame problem is also described.
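To show how tree search carries over from games to combinatorial optimization, here is a minimal sketch of plain UCT on a 0/1 knapsack instance. It is not the method of the talk: random rollouts replace the neural network that guides search in AlphaZero-style systems, and the knapsack problem, the `Node` class, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


class Node:
    def __init__(self, index, capacity, value, parent=None):
        self.index = index        # next item to decide on
        self.capacity = capacity  # remaining capacity
        self.value = value        # value collected so far
        self.parent = parent
        self.children = {}        # action (0 = skip, 1 = take) -> Node
        self.visits = 0
        self.total = 0.0          # sum of normalized rollout returns


def legal_actions(node, items):
    if node.index >= len(items):
        return []                 # terminal: every item decided
    weight, _ = items[node.index]
    return [0, 1] if weight <= node.capacity else [0]


def step(node, action, items):
    weight, value = items[node.index]
    if action == 1:
        return Node(node.index + 1, node.capacity - weight,
                    node.value + value, node)
    return Node(node.index + 1, node.capacity, node.value, node)


def rollout(node, items, rng):
    """Finish the assignment at random; return the total value."""
    capacity, value = node.capacity, node.value
    for weight, item_value in items[node.index:]:
        if weight <= capacity and rng.random() < 0.5:
            capacity -= weight
            value += item_value
    return value


def uct_search(items, capacity, iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    scale = sum(v for _, v in items) or 1  # normalize returns to [0, 1]
    root = Node(0, capacity, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while True:
            actions = legal_actions(node, items)
            untried = [a for a in actions if a not in node.children]
            if untried or not actions:
                break
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.total / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        # 2. Expansion: add one untried child, if any.
        if untried:
            action = rng.choice(untried)
            node.children[action] = step(node, action, items)
            node = node.children[action]
        # 3. Simulation: random rollout from the new node.
        ret = rollout(node, items, rng) / scale
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += ret
            node = node.parent
    # Read out the most-visited path; undecided items are skipped.
    chosen, node = [], root
    while node.children:
        action, node = max(node.children.items(),
                           key=lambda kv: kv[1].visits)
        chosen.append(action)
    chosen += [0] * (len(items) - len(chosen))
    return chosen, node.value
```

A call such as `uct_search([(2, 3), (3, 4), (4, 5), (5, 6)], capacity=5)` takes a list of (weight, value) pairs and returns a feasible take/skip vector with its total value. The search gives no optimality guarantee, and a learned policy or value network would take the place of `rollout` in an AlphaZero-style variant.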