Transactions of Japan Society of Kansei Engineering (日本感性工学会論文誌)
Online ISSN : 1884-5258
ISSN-L : 1884-0833
How Does Human Image Recognition Differ from Computer Vision Image Recognition?
大森 宏, 羽生 和紀
Journal, free access, advance online publication

Article ID: TJSKE-D-23-00036

Abstract

Computer vision models (CVMs) such as CNNs, the Vision Transformer (ViT), and CLIP are pre-trained on very large amounts of training data and have very high image cognition performance. In our environmental cognition research using photos, we have manually measured the visual similarity between photos. Our previous study found that CVM-based photo similarity and visual similarity were quite similar when compared using multidimensional scaling (MDS) of the photos. That study also suggested, however, that the difference in image cognition between humans and CVMs is related to human representation. Here we examine the difference between CVM-based photo similarity and visual similarity numerically and in detail, using six types of photo sets. The influence of representation could be evaluated by cluster size in the MDS configuration. The results show that representation influences the cognition of shrines and temples, foods, insects, buildings, greenery, garden styles, perspective views, night views, the symbol tree, and other subjects.
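
As an illustration of what a numerical comparison between CVM-based photo similarity and visual similarity can look like, the following minimal sketch computes a cosine similarity matrix from feature vectors and correlates its off-diagonal entries with a human similarity matrix. It is not the authors' procedure, which compares the two through MDS. The feature vectors, the human similarity values, and all function names are hypothetical toy assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(features):
    """Pairwise cosine similarity between all photos."""
    n = len(features)
    return [[cosine_similarity(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Off-diagonal entries (i < j), so each photo pair is counted once."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: four photos with 3-dimensional "CVM" features.
# Real CVM embeddings would have hundreds of dimensions.
features = [
    [1.0, 0.0, 0.2],
    [0.9, 0.1, 0.3],
    [0.0, 1.0, 0.8],
    [0.1, 0.9, 0.7],
]

# Hypothetical manually measured visual similarity for the same four photos.
human_similarity = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.9, 1.0],
]

cvm_similarity = similarity_matrix(features)
agreement = pearson(upper_triangle(cvm_similarity),
                    upper_triangle(human_similarity))
print(f"Correlation between CVM and human similarity: {agreement:.3f}")
```

A correlation near 1 would indicate that the two similarity structures largely agree. Photo pairs with large residuals are candidates for the kind of human-specific effects the abstract attributes to representation.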

© 2023 Japan Society of Kansei Engineering