Image captioning has been actively studied in recent years; however, most existing systems output captions with only factual expressions. In this paper, an automatic affective image captioning system using emotion estimation is proposed. The proposed system consists of four parts: a base caption generation part built on a conventional CNN (VGG16), a scene estimation part, an emotion estimation part, and a figurative expression generation part. When a human is present in an image, the emotion is estimated from his/her facial expression and a simile is used. When no human is present, the metaphor of personification is used. Evaluation experiments were carried out using three evaluation indexes: BLEU, METEOR, and CIDEr. The experimental results indicate the effectiveness of the proposed system in generating affective captions.