Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
 
c-SNE: Deep Cross-modal Retrieval based on Subjective Information using Stochastic Neighbor Embedding
Yusuke Yamada, Tadashi Okoshi, Jin Nakazawa
Journal, free access

2023, Volume 31, pp. 246-255

Abstract

Cross-modal information retrieval based on subjective information aims to enable flexible media retrieval services, for example allowing a user to specify an image and retrieve audio clips whose impression is similar to that image. Existing methods focus on building cross-modal relationships between media using objective information (such as a standard caption). However, such relationships can be built only between pieces of media that are already related, which limits the flexibility of cross-modal media retrieval. To achieve more flexibility, this research uses the subjective information in media clips for similarity calculation. We propose a novel cross-modal stochastic neighbor embedding technique called c-SNE. c-SNE extracts features of subjective information from pieces of media and maps them into a common embedding space. It is a learning technique that bridges the heterogeneity gap between the modal distributions using label-weighted SNE, and it allows users to find media that share the same subjective information as a query medium. Our experimental results on benchmark datasets demonstrate that the proposed method performs effectively in cross-modal distribution alignment and retrieval. Furthermore, our user study with ten users and 600 data points confirmed that c-SNE outperforms three related methods in actual usage from the users' perspective.
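
The abstract does not give the c-SNE objective, so the following is only a minimal sketch of the general idea behind SNE-style cross-modal alignment, not the authors' formulation. It assumes that items from two modalities have already been embedded into a common space, that each item carries a subjective-impression label, and that the target neighbor distribution simply places most of its mass on same-label items. The function names, the `eps` smoothing value, the Gaussian bandwidth `sigma`, and the example labels are all hypothetical.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def neighbor_probs(query, candidates, sigma=1.0):
    """SNE-style conditional probabilities q(j | query) over the candidates.

    A Gaussian kernel on squared distance, normalized with a softmax.
    """
    logits = [-sq_dist(query, c) / (2.0 * sigma ** 2) for c in candidates]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_target(query_label, candidate_labels, eps=1e-3):
    """Assumed label-derived target p(j | query).

    Same-label candidates get weight 1, all others get a small eps.
    This weighting is an illustrative assumption, not the paper's definition.
    """
    weights = [1.0 if l == query_label else eps for l in candidate_labels]
    total = sum(weights)
    return [w / total for w in weights]


def cross_modal_kl(queries, query_labels, candidates, candidate_labels, sigma=1.0):
    """Mean KL(p || q) from queries in one modality to candidates in the other."""
    loss = 0.0
    for q_vec, q_label in zip(queries, query_labels):
        p = label_target(q_label, candidate_labels)
        q = neighbor_probs(q_vec, candidates, sigma)
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss / len(queries)


def retrieve(query, candidates, k=1):
    """Indices of the k candidates nearest to the query in the common space."""
    order = sorted(range(len(candidates)), key=lambda j: sq_dist(query, candidates[j]))
    return order[:k]
```

Under these assumptions, minimizing `cross_modal_kl` with respect to the embeddings would pull same-label items from the two modalities together, after which `retrieve` returns, for an image query, the audio clips closest to it in the common space.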

© 2023 by the Information Processing Society of Japan