In this paper, we propose a method for estimating the size of an object from a single image taken with a monocular camera. The method combines depth estimation by a convolutional neural network (CNN) with information about a plane and a reference object standing upright on that plane.
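
To illustrate how a reference object of known height could turn a relative depth map into metric sizes, the following minimal sketch assumes a pinhole camera model and objects roughly parallel to the image plane. The function names, the focal length, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the proposed method's implementation.

```python
def metric_scale(ref_height_m, ref_pixel_height, ref_rel_depth, focal_px):
    """Return the factor that converts relative depth to metres.

    Assumes a pinhole model: real_height = pixel_height * depth / focal_length.
    """
    # Metric depth of the reference object, recovered from its known height.
    ref_metric_depth = ref_height_m * focal_px / ref_pixel_height
    # Ratio between metric depth and the (unitless) depth predicted for it.
    return ref_metric_depth / ref_rel_depth


def estimate_height(pixel_height, rel_depth, scale, focal_px):
    """Estimate the real height of a target object in metres."""
    depth_m = rel_depth * scale  # relative depth converted to metres
    return pixel_height * depth_m / focal_px


# Hypothetical values: a 1.0 m reference spans 200 px at relative depth 0.5,
# and the camera has an assumed focal length of 1000 px.
scale = metric_scale(1.0, 200.0, 0.5, 1000.0)

# A target spanning 100 px at relative depth 1.0.
target_height = estimate_height(100.0, 1.0, scale, 1000.0)
print(f"scale = {scale:.2f}, target height = {target_height:.2f} m")
```

In this sketch the reference object only resolves the scale ambiguity of the depth prediction; how the plane information is used is not modeled here.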