Artificial Intelligence and Data Science
Online ISSN : 2435-9262
Vision & Language and AI robot-driven science
Yoshitaka USHIKU
JOURNAL OPEN ACCESS

2025 Volume 6 Issue 1 Pages 26-40

Abstract

The field of Vision and Language covers three kinds of task: multimodal understanding, which produces recognition results from combined visual and textual inputs; Image2Text, which generates text from visual input; and Text2Image, which generates visuals from text. Research in this field is currently accelerating. One example from the authors’ research is the development of an AI robot that collaborates with humans to create and transcend knowledge. This requires building a scientific foundation model that learns from the scientific literature, conducts experiments autonomously, and becomes smarter through discussions with researchers. Other research examples include automatically turning experimental procedures into manuals, AI-driven discovery of scientific laws and principles from data, and the discovery of new materials. For new materials, two approaches are being explored, one of which involves building generative AI that uses highly accurate decoders to generate crystal structures.

© 2025 Japan Society of Civil Engineers