2025, Vol. 91, No. 3, pp. 411-417
This study employs a Large Vision-Language Model (LVLM), which integrates images and text, to learn an individual's preferences and the rationales behind their judgments from a small dataset. Existing methods estimate preferences and impressions by quantifying them; however, such quantified scores are difficult to interpret and offer little explainability. This research therefore aims to enhance the explainability of preference estimation by generating the rationale behind each preference judgment. We collected per-individual datasets of preference labels ("preferable" or "not preferable") and their rationales in specific domains, and fine-tuned the Large Vision-Language Model on them. Experiments confirmed that, even with a dataset limited to only 200 images, it is feasible to estimate an individual's preferences and generate the rationales behind them.
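The abstract does not include code, but the described setup, fine-tuning an LVLM on (image, preference label, rationale) triples so that it answers a preference question and explains its answer, can be sketched as follows. This is a minimal illustration under assumptions of my own: a LLaVA-style model loaded via Hugging Face transformers, LoRA adaptation via peft, a hypothetical `preferences.jsonl` data file, and an assumed prompt template; none of these are confirmed by the paper.

```python
# Minimal sketch (not the authors' code): fine-tuning an LVLM on
# (image, preference label, rationale) triples. Model ID, prompt
# template, and data layout are illustrative assumptions.
import json
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

MODEL_ID = "llava-hf/llava-1.5-7b-hf"   # assumed base model
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto")

# LoRA keeps the trainable parameter count small, which matters
# when the per-individual dataset has only ~200 examples.
model = get_peft_model(
    model, LoraConfig(r=8, target_modules=["q_proj", "v_proj"]))

def encode(record):
    """record: {"image": path, "label": "preferable"|"not preferable",
    "rationale": free-text explanation written by the individual}."""
    prompt = ("USER: <image>\nIs this image preferable to you? "
              "Answer and explain why.\nASSISTANT: ")
    target = f"{record['label']}. {record['rationale']}"
    inputs = processor(images=Image.open(record["image"]),
                       text=prompt + target, return_tensors="pt")
    # For brevity the prompt tokens are not masked out of the loss.
    inputs["labels"] = inputs["input_ids"].clone()
    return inputs

records = [json.loads(line) for line in open("preferences.jsonl")]
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(3):                    # small data, few epochs
    for record in records:                # batch size 1 for simplicity
        batch = {k: v.to(model.device) for k, v in encode(record).items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

At inference time the same prompt would be supplied without the target text, and `model.generate` would produce both the preference judgment and its rationale in one free-text answer, matching the abstract's goal of explainable preference estimation.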