Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 2E4-GS-6-03

Multimodal Deep Model for POI Category Prediction using Linguistic and Image Information
*Issei SAWADA, Yusuke OKIMOTO, Kenta KANAMORI, Itsuki NODA, Satoshi OYAMA, Junji SAIKAWA
Abstract

The accuracy of POI (Point of Interest) categories is becoming increasingly important, as many users now rely on services built on them. Machine learning models are widely used to infer POI categories from various kinds of information, and multimodal deep models have recently been reported to achieve high performance on many tasks. In this paper, we propose a multimodal deep model for POI category prediction that uses both linguistic and image information. To use image information effectively, the proposed model (1) introduces an auxiliary loss on predictions made from linguistic information alone and (2) introduces pooling so that multiple images can be input for each POI. Using Yahoo! Japan's POI database, we confirmed that the proposed method improves POI category prediction performance over baselines that use only linguistic or only image information.
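The two design points named in the abstract, pooling over a POI's multiple images and an auxiliary loss on a text-only prediction, can be illustrated with a minimal numerical sketch. Everything here is hypothetical: the embedding dimensions, the random weights, the choice of mean pooling, and the loss weight `lam` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_txt, d_img, n_cat = 8, 6, 4          # hypothetical embedding and category sizes
W_multi = rng.normal(size=(d_txt + d_img, n_cat))  # multimodal classifier head
W_text = rng.normal(size=(d_txt, n_cat))           # auxiliary text-only head

def mean_pool(image_embs):
    # Collapse a variable number of image embeddings (n_images, d_img)
    # into a single fixed-size vector (d_img,).
    return image_embs.mean(axis=0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, y):
    return -np.log(p[y] + 1e-12)

# One POI: a text embedding, three image embeddings, and a gold category.
text_emb = rng.normal(size=d_txt)
image_embs = rng.normal(size=(3, d_img))
y = 2
lam = 0.5  # hypothetical weight of the auxiliary text-only loss

fused = np.concatenate([text_emb, mean_pool(image_embs)])
p_multi = softmax(fused @ W_multi)     # prediction from text + pooled images
p_text = softmax(text_emb @ W_text)    # prediction from text alone

# Combined training objective: multimodal loss plus weighted text-only loss.
loss = cross_entropy(p_multi, y) + lam * cross_entropy(p_text, y)
```

The auxiliary term pushes the text branch to remain predictive on its own, so the multimodal head cannot simply ignore the image features; the pooling step is what lets each POI contribute any number of photos.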

© 2023 The Japanese Society for Artificial Intelligence