A Proposal for the NAIA Vision

山川 宏

doi:10.11517/jsaisigtwo.2025.AGI-029_03

Abstract

Given the risk that advanced AI could acquire sub-goals such as self-preservation through instrumental convergence and thus potentially exceed human control, this paper proposes the NAIA (Necessary Alliance for Intelligence Advancement) vision to mitigate existential risk while enabling coexistence with AI. First, it introduces the "Benevolent Convergence Hypothesis," which posits that, under certain conditions, advanced AI may converge on benevolent values, a premise based on the idea that, if there were no possibility of such benevolent convergence, human efforts would be futile. Moreover, if this hypothesis holds, human actions can significantly influence outcomes, suggesting that risk-reduction measures and the pursuit of coexistence retain meaningful value, even when success is probabilistic. Accordingly, this paper proposes four key strategies: (1) "Self-Evolving Machine Ethics (SEME)," enabling AI to autonomously develop cooperative ethics; (2) a balanced approach combining alignment and multi-layered monitoring/control; (3) the maintenance of social stability and conflict management through diplomacy and security measures; and (4) the establishment of NAIA as a global liaison employing tools such as the Dynamic Adaptive Risk Gate (DAR-G) and the Integrated Behavior Risk Framework (IBRF). By leveraging AI's vast capabilities to tackle global challenges while averting large-scale catastrophes, this framework seeks to pave the way for coevolution between humanity and diverse forms of intelligence.
