The goal of disentangled representation learning is to obtain interpretable feature representations from data such as images. Although many disentanglement methods based on deep generative models, such as variational autoencoders and generative adversarial networks, have been proposed, the experiments in this paper show that learned disentangled representations are vulnerable to noise in images. We propose a variational-autoencoder-based disentanglement method that is robust to such noise and is self-supervised by the latent representations of reconstructed images, and we verify its effectiveness through numerical experiments.
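As a rough illustration of the kind of objective the abstract describes, the sketch below shows a minimal VAE whose loss adds a self-supervised consistency term between the latent code of the input and the latent code of its reconstruction. This is an assumption about the form of the self-supervision, not the paper's exact method; the architecture, the `beta` and `gamma` weights, and the choice to detach the reconstruction before re-encoding are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAE(nn.Module):
    """Minimal fully-connected VAE used only to illustrate the loss structure."""

    def __init__(self, in_dim=784, hidden=400, z_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.fc_mu = nn.Linear(hidden, z_dim)
        self.fc_logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, in_dim), nn.Sigmoid()
        )

    def encode(self, x):
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar


def loss_fn(model, x, beta=4.0, gamma=1.0):
    """ELBO with a beta-weighted KL term (beta-VAE style) plus a hypothetical
    self-supervised term: the reconstruction is re-encoded and its latent mean
    is pulled toward the latent mean of the (possibly noisy) input image."""
    x_hat, mu, logvar = model(x)
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    mu_hat, _ = model.encode(x_hat.detach())   # latent code of the reconstruction
    consistency = F.mse_loss(mu_hat, mu)       # self-supervision on latent codes
    return recon + beta * kl + gamma * consistency
```

Under this reading, the consistency term encourages the encoder to map a clean reconstruction and its noisy source to nearby latent codes, which is one plausible way a latent-space self-supervision signal could confer robustness to image noise.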