Research and Development of Large Language Models at NICT

大竹 清敬

doi:10.11532/jsceiii.6.1_41

Abstract

At NICT, research and development of large language models (LLMs) and of the systems that use them are underway. LLMs developed overseas were pre-trained on data containing only a small proportion of Japanese, and the widespread adoption of these LLMs increases risks such as the loss of Japan’s cultural uniqueness. It is therefore crucial to strengthen the capability to develop domestic LLMs trained on large amounts of Japanese data. Training Japanese LLMs requires high-quality Japanese data. Over the past 15 years, NICT has accumulated a large volume of Japanese web data, which is now being used to construct high-quality training data. Countermeasures against hallucinations, fake news, and other risks that have been pointed out are also being developed. We are currently developing “WISDOM-LLM”, a platform that combines a diverse range of AIs to fact-check the outputs of LLMs and to generate well-founded counterarguments. To protect Japanese society from “stray generative AIs”, which are likely to increase in the future, we are considering a world of “democratic AIs”, in which multiple “justice-oriented AIs” discuss issues with one another and humans then review their output to make the final decision.
