Abstract
The sensor fusion system described here automatically extracts a target signal from microphone signals corrupted by interfering ambient noise. For this system, a new sensor fusion algorithm that hierarchically integrates audio and visual signals is proposed.
In the first stage of the algorithm, audio and visual signals are fused to generate a cue signal, which is an estimate of the power variation of the target sound. In the second stage, the audio signals and the cue signal are fused to adjust the weights of an adaptive filter that extracts the target signal.
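The two-stage structure can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from this work: the names `fuse_cue`, `cue_gated_nlms`, and `run_demo` are hypothetical, the stage-one fusion rule is a simple weighted average of normalized audio frame power and a per-frame visual activity score, and stage two substitutes a standard NLMS noise canceller that adapts only while the cue indicates the target is inactive. The data are synthetic.

```python
import math
import random


def frame_power(signal, frame_len):
    """Mean power of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def normalize(values):
    """Scale values linearly into [0, 1]; a constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_cue(audio_power, visual_activity, audio_weight=0.5):
    """Stage 1 (assumed rule): weighted average of the normalized
    audio frame power and the normalized visual activity."""
    a = normalize(audio_power)
    v = normalize(visual_activity)
    return [audio_weight * ai + (1.0 - audio_weight) * vi
            for ai, vi in zip(a, v)]


def cue_gated_nlms(primary, reference, cue, frame_len,
                   taps=4, mu=0.5, threshold=0.5):
    """Stage 2 (assumed rule): NLMS noise canceller whose weights are
    updated only in frames where the cue is below the threshold."""
    w = [0.0] * taps
    out = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))  # noise estimate
        e = primary[n] - y                        # target estimate
        out.append(e)
        frame = min(n // frame_len, len(cue) - 1)
        if cue[frame] < threshold:                # target judged inactive
            norm = sum(xk * xk for xk in x) + 1e-8
            w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
    return out, w


def run_demo(seed=0, frame_len=100, n_frames=20):
    """Synthetic scene: the target is silent in the first half and a
    sinusoid in the second half; noise leaks into the microphone
    with gain 0.8."""
    rng = random.Random(seed)
    n = frame_len * n_frames
    half = n // 2
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    target = [math.sin(2 * math.pi * 0.05 * i) if i >= half else 0.0
              for i in range(n)]
    primary = [t + 0.8 * v for t, v in zip(target, noise)]
    # Hypothetical visual feature: 1 while the target source is active.
    visual = [1.0 if f >= n_frames // 2 else 0.0 for f in range(n_frames)]
    cue = fuse_cue(frame_power(primary, frame_len), visual)
    out, w = cue_gated_nlms(primary, noise, cue, frame_len)
    return out, w, target


if __name__ == "__main__":
    out, w, target = run_demo()
    print("learned weights:", [round(wk, 4) for wk in w])
```

In this toy setting the filter learns the leakage gain while the cue is low and keeps its weights frozen while the cue is high, so the output during the active half tracks the synthetic target. The threshold-based gating is only one plausible way to let a cue signal steer adaptation; the abstract does not state which rule the authors use.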
A real-time processing system comprising 53 DSPs was constructed to evaluate the new algorithm. Experimental results obtained with this real-time system demonstrate the effectiveness of the cue signal generated by fusing audio and visual signals.
The advantages of the proposed cue signal are discussed using numerical simulations of the first stage of the algorithm. The simulation results demonstrate that this fusion-type cue signal is resistant to both audio noise and visual noise and can therefore be used in various environments.