Abstract
We developed a system that detects the speech intervals of multiple speakers and estimates their face orientations during those intervals by combining sound-direction information from multiple microphone arrays with human position information. The system was evaluated under three conditions: individual utterances at different positions and orientations, simultaneous dialogues by multiple speakers, and moving sound sources. The results show that the proposed system detects speech intervals with more than 90% accuracy and estimates face orientations with mean absolute errors of around 20 degrees, except in cases where every array lies opposite to the speaker's face orientation.