Cross-entropy loss is often used as the loss function when training a semantic segmentation network for facial parts. With cross-entropy loss, the network learns to segment facial parts that occupy a large number of pixels with high accuracy. As a result, the trained network may not accurately segment facial parts with a small number of pixels. In addition, comprehensively analyzing features at different resolutions may make it possible to deal flexibly with facial parts of various sizes. The purpose of this study was to develop a semantic segmentation method for facial parts using a modified U-Net, which has a multiple-decoder structure for analyzing features at different resolutions, together with Generalized Dice Loss for correcting the bias in the number of pixels in each class. Our database consisted of 30,000 face images from the CelebAMask-HQ dataset. The proposed network, based on U-Net, consists of an encoder, five decoders that independently perform semantic segmentation using feature maps extracted from the encoder at different resolutions, and a recognition module that integrates the analysis information from those decoders. The mean intersection over union for the proposed method was 0.846, which was greater than those for SegNet (0.711), U-Net (0.803), and a modified SegNet with Encoder-Multiple Decoders (0.805).
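
To illustrate how Generalized Dice Loss can counteract the pixel-count bias between classes, the following is a minimal NumPy sketch of the commonly used formulation, in which each class is weighted by the inverse square of its ground-truth pixel count. It is not the authors' implementation: the function name `generalized_dice_loss`, the flattened `(num_pixels, num_classes)` input layout, and the `eps` smoothing term are assumptions made for this example.

```python
import numpy as np


def generalized_dice_loss(probs, onehot, eps=1e-7):
    """Generalized Dice Loss for one image (illustrative sketch).

    probs  : (num_pixels, num_classes) predicted class probabilities.
    onehot : (num_pixels, num_classes) one-hot ground-truth labels.
    eps    : small constant to avoid division by zero (assumed value).
    """
    probs = np.asarray(probs, dtype=np.float64)
    onehot = np.asarray(onehot, dtype=np.float64)

    # Number of ground-truth pixels in each class.
    class_pixels = onehot.sum(axis=0)

    # Inverse-square weighting: classes with few pixels get large weights.
    weights = 1.0 / (class_pixels ** 2 + eps)

    # Per-class overlap and per-class total mass (prediction + ground truth).
    intersect = (probs * onehot).sum(axis=0)
    total = (probs + onehot).sum(axis=0)

    numerator = 2.0 * (weights * intersect).sum()
    denominator = (weights * total).sum() + eps
    return 1.0 - numerator / denominator
```

With this weighting, a prediction that labels every pixel as the majority class and misses a small class entirely receives a loss close to 1, even though most pixels are labeled correctly, whereas a perfect prediction receives a loss close to 0.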