In this paper, we propose a method for simultaneously learning multiple modules for joint attention: gaze-driven attention and word-driven attention. Inspired by child language acquisition, the method uses the mutual exclusivity bias to make learning mutually facilitative both within and between modules, by extending a modified Hebbian learning rule. In experiments on human-robot interaction and in computer simulations, we found that the proposed method enabled mutually facilitative learning of a mapping for gaze following and a label-to-object mapping, with which the learner performs multimodal joint attention with its caregiver. Finally, through a computer simulation resembling mother-infant interaction, we argue for the possibility of the proposed learning mechanism as a constructivist model of infants' cognitive development.