Scene understanding is a central problem in the field of computer vision. Depth estimation, in particular, is an important task with applications in scene understanding, robotics, and 3-D reconstruction. Estimating a dense depth map from a single image is receiving increased attention because monocular cameras are widespread, compact, and usable in a wide range of environments. In addition, both multi-task learning and multi-stream architectures, which exploit unlabeled information, effectively improve monocular depth estimation. However, few networks are designed to take advantage of both. Therefore, in this paper, we propose a monocular depth estimation method based on a multi-task, multi-stream network architecture. The integrated network we develop exploits depth gradient information and can be applied to both supervised and unsupervised learning. In our experiments, the supervised version of the architecture improved depth estimation accuracy by 0.13 m on average. In the unsupervised setting, the experimental results showed improved structure-from-motion performance.
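The abstract does not state how depth gradient information is computed or used. As an illustration only, the sketch below assumes forward finite differences on an (H, W) depth map and an L1 penalty between predicted and target gradients; this is an assumption, not necessarily the authors' formulation, and the helpers `depth_gradients` and `gradient_loss` are hypothetical names.

```python
import numpy as np


def depth_gradients(depth: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Forward-difference gradients of an (H, W) depth map (hypothetical helper).

    Returns (gx, gy) with shapes (H, W-1) and (H-1, W).
    """
    gx = depth[:, 1:] - depth[:, :-1]  # change between horizontally adjacent pixels
    gy = depth[1:, :] - depth[:-1, :]  # change between vertically adjacent pixels
    return gx, gy


def gradient_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Assumed L1 gradient-matching term: mean |grad(pred) - grad(target)| per axis."""
    pgx, pgy = depth_gradients(pred)
    tgx, tgy = depth_gradients(target)
    return float(np.abs(pgx - tgx).mean() + np.abs(pgy - tgy).mean())


if __name__ == "__main__":
    # Synthetic depth map: a ramp from 0 m to 3 m, left to right.
    target = np.tile(np.arange(4, dtype=float), (3, 1))
    # A constant 0.5 m offset leaves every gradient unchanged.
    pred = target + 0.5
    print(gradient_loss(pred, target))  # 0.0
```

Because a constant depth offset produces zero gradient loss, a term like this would typically complement, rather than replace, a per-pixel depth loss: it penalizes errors in local structure such as edges and slopes, not absolute scale.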