Health management applications have become popular and health awareness has increased in recent years. With this trend, identifying food regions in food images has become an important step in calorie estimation. Convolutional neural networks (CNNs) have greatly improved performance on semantic segmentation tasks. However, the pixel-level annotation required to create segmentation training data is costly. In addition, the vast number of food categories leads to insufficient training data per category.
To address this problem, we propose Unseen Food Segmentation (USFoodSeg), which consists of models pre-trained on a large amount of food data. This model can segment a mask for any food class given only its category text. Experiments showed that it achieved 90% accuracy on unseen food classes. In addition, we focus on the pre-trained knowledge of Stable Diffusion and propose StableSeg, which enables zero-shot segmentation for any class without using additional data; experiments showed that it reduces training cost and is especially robust across food categories.
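The abstract does not describe the segmentation mechanism in detail. As a rough, hypothetical illustration of text-conditioned zero-shot segmentation, the sketch below thresholds the cosine similarity between per-pixel image features and a class text embedding; the function name, shapes, and toy data are assumptions for demonstration, not the paper's implementation.

```python
import numpy as np

def zero_shot_mask(pixel_feats: np.ndarray, text_emb: np.ndarray,
                   threshold: float = 0.5) -> np.ndarray:
    """Binary mask from cosine similarity between per-pixel features
    (H, W, D) and a single class text embedding (D,)."""
    # Normalize both sides so the dot product is a cosine similarity.
    pf = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + 1e-8)
    te = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    sim = pf @ te                                      # (H, W) similarity map
    sim = (sim - sim.min()) / (np.ptp(sim) + 1e-8)     # rescale to [0, 1]
    return sim >= threshold

# Toy demo: a 4x4 "image" whose top half aligns with the text embedding.
text = np.array([1.0, 0.0])
feats = np.zeros((4, 4, 2))
feats[:2] = [1.0, 0.1]    # top rows point along the text embedding
feats[2:] = [-1.0, 0.1]   # bottom rows point away from it
mask = zero_shot_mask(feats, text)
print(mask[:2].all(), mask[2:].any())  # → True False
```

In a real system, the per-pixel features would come from a vision backbone (or, for StableSeg-style approaches, from a diffusion model's internal representations) and the text embedding from a text encoder; the thresholding step here is deliberately simplified.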