Abstract
Extracting useful deep features is important for many computer vision tasks, and deep features extracted from classification networks have been shown to perform well in these tasks. Meanwhile, end-to-end distance metric learning (DML) has been applied to train the feature extractor directly. However, many studies on DML have not made fair comparisons with features extracted from classification networks, so it remains unclear which training strategy is superior for learning feature representations. In this paper, we present objective comparisons between these two approaches under the same network architecture and show that softmax-based features are markedly better than DML features, especially when the training dataset is large.