2018, Volume 22, Issue 5, pp. 611-620
Cross-media retrieval has attracted considerable research interest, and many existing works map heterogeneous data into a common subspace using a pair of projection matrices, one per modality, before comparing similarity. In contrast, we reconstruct one modality (e.g., images) into the other (e.g., texts) using a sparse neural network pre-trained by Restricted Boltzmann Machines (MRCR-RSNN), so that data from one modality can be projected into the space of the other directly. The model takes low-level features of one modality as input and outputs a representation in the other, and cross-media retrieval is then performed based on the similarities of these representations. Our model requires no manual annotation and can therefore be applied more widely. It is simple but effective. We evaluate our method on several benchmark datasets, and the experimental results demonstrate its effectiveness in terms of Mean Average Precision (MAP) and Precision-Recall (PR) curves.
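To make the pipeline concrete, below is a minimal sketch of the approach the abstract describes: layer-wise RBM pre-training of an image-side network, a supervised top layer that projects images into the text feature space, and cosine-similarity ranking for retrieval. All layer sizes, feature dimensions, and data are illustrative assumptions, and the paper's MRCR-RSNN specifics (its sparsity constraints and fine-tuning objective) are not reproduced here.

```python
# Hypothetical sketch of RBM-pretrained cross-modal projection and retrieval.
# Shapes, hyperparameters, and the Ridge top layer are illustrative choices,
# not the paper's actual configuration.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.RandomState(0)

# Toy paired data: 200 image/text pairs with low-level features in [0, 1].
n_pairs, img_dim, txt_dim = 200, 128, 64
X_img = rng.rand(n_pairs, img_dim)   # image features (input modality)
X_txt = rng.rand(n_pairs, txt_dim)   # text features (target modality)

# Layer-wise RBM pre-training of the image-side network (two hidden layers).
rbm1 = BernoulliRBM(n_components=96, learning_rate=0.05, n_iter=20, random_state=0)
h1 = rbm1.fit_transform(X_img)
rbm2 = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)
h2 = rbm2.fit_transform(h1)

# Supervised top layer: regress the pre-trained image representation onto
# the text feature space, so images are projected into that space directly.
top = Ridge(alpha=1.0).fit(h2, X_txt)

# Retrieval: project query images into text space, then rank candidate
# texts by cosine similarity between the projection and each text.
queries = rng.rand(5, img_dim)
proj = top.predict(rbm2.transform(rbm1.transform(queries)))
scores = cosine_similarity(proj, X_txt)   # (5, n_pairs) similarity matrix
ranked = np.argsort(-scores, axis=1)      # best-matching text indices per query
print(ranked[:, :3])
```

In this sketch the unsupervised RBM stack plays the role of pre-training, and the single regression layer stands in for the paper's reconstruction step; ranking by cosine similarity then yields the MAP and PR figures one would compute on a real benchmark.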