抄録
Combining CNNs for object detection and semantic segmentation is effective for a scale-aware bird detection. However, a
CNN for semantic segmentation requires many images with pixel-level annotation during training. We propose to train the CNN from images
with bounding boxes and compare our method to the strongly-supervised counterpart.