Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Monocular depth estimation techniques have improved significantly in accuracy, paralleling the evolution of deep learning. While deep models are often evaluated by how well they align with human perception, depth estimation models have seldom been subjected to such comparisons. In this paper, we compare human and model judgements on monocular depth estimation in terms of accuracy and error consistency. We find that 27 of 34 models achieve higher accuracy (closer to the ground truth) than humans (0.708, 95% CI: [0.702, 0.713]). However, the error consistency between each model and humans was low relative to the error consistency among humans (0.447, 95% CI: [0.427, 0.465]). The results suggest that strategies for improving error consistency with human judgements include training on multiple datasets and avoiding direct training on a dataset that is i.i.d. with the test images.
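The abstract does not define how error consistency is computed. As a minimal sketch, assuming it is measured as Cohen's kappa over per-trial correct/incorrect outcomes (a common choice for this metric, not confirmed by the abstract), the calculation for two observers could look like the following. The function name `error_consistency` and the toy inputs are hypothetical.

```python
def error_consistency(correct_a, correct_b):
    """Cohen's kappa over per-trial correctness of two observers.

    Assumption: each input is a sequence of booleans (or 0/1), one entry per
    trial, marking whether that observer's judgement matched the ground truth.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct_a)
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n

    # Observed consistency: fraction of trials where both are right or both are wrong.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Consistency expected by chance from the two accuracies alone.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if c_exp == 1.0:
        # Kappa is undefined when chance agreement is perfect; returning 0.0
        # is a convention chosen for this sketch.
        return 0.0
    return (c_obs - c_exp) / (1 - c_exp)


# Hypothetical toy data: a human and a model judged on the same four trials.
human = [True, True, False, False]
model = [True, False, True, False]
kappa = error_consistency(human, model)
```

Under this definition, a value near 0 means the two observers err on the same trials no more often than their accuracies alone would predict, while a value near 1 means they make mistakes on the same trials.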