Learning 2D Sound Source Localization for Microphone Array Layout Invariance using Explicit Transformation Layer

Phongtharin VINAYAVEKHIN; Guillaume Le MOING; Jayakorn VONGKULBHISAL; Don Joven AGRAVANTE; Tadanobu INOUE; Asim MUNAWAR; Ryuki TACHIBANA

doi:10.11517/pjsai.JSAI2020.0_2G1ES403

34th (2020)

Session ID: 2G1-ES-4-03

DOI https://doi.org/10.11517/pjsai.JSAI2020.0_2G1ES403

Conference information

Organizer: The Japanese Society for Artificial Intelligence

Conference: The 34th Annual Conference (2020)

Edition: 34

Venue: Online

Dates: 2020/06/09 - 2020/06/12


Keywords: 2D Sound Localization, Microphone Arrays, Deep Learning

Conference proceedings and abstracts (free access)


Abstract

We tackle the task of localizing the 2D Cartesian coordinates of sound source(s) in an enclosed environment by using multiple microphone arrays. Recently, deep learning has led to promising results for this task due to its robustness to noise and reverberations in the environment. However, a large amount of labeled data is required and the resulting model only works well for the microphone array layout in the training data. Recording and labeling data in all of the desired layouts becomes very costly and tedious. This paper proposes a solution to this problem by using an explicit transformation layer embedded in the neural network. Our results in simulated acoustic environments show that the method allows the model to be trained with the data from specific microphone array layouts while generalizing well to data in various unseen layouts during inference.
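The abstract does not spell out how the explicit transformation layer works. One plausible reading, offered here only as an illustrative sketch and not as the authors' implementation, is that each microphone array produces an estimate in its own local frame, and a fixed, non-learned step maps that estimate into room coordinates using the array's known pose. The function names, the `(x, y, theta)` pose format, and the averaging step below are all assumptions made for this example.

```python
import math


def local_to_global(local_xy, array_pose):
    """Map a 2D point from an array's local frame to the room frame.

    array_pose is a hypothetical (x, y, theta) tuple: the array's position
    and its heading in radians (counter-clockwise) in the room frame.
    """
    lx, ly = local_xy
    ax, ay, theta = array_pose
    c, s = math.cos(theta), math.sin(theta)
    # Rigid 2D transform: rotate by theta, then translate by the array position.
    return (ax + c * lx - s * ly, ay + s * lx + c * ly)


def fuse_estimates(local_estimates, array_poses):
    """Average per-array estimates after moving them into the room frame.

    Plain averaging is an assumption for illustration only.
    """
    points = [
        local_to_global(p, pose)
        for p, pose in zip(local_estimates, array_poses)
    ]
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Two arrays observing the same source from different poses.
poses = [(0.0, 0.0, 0.0), (4.0, 0.0, math.pi / 2)]
local = [(2.0, 3.0), (3.0, 2.0)]
source_xy = fuse_estimates(local, poses)
```

Because the layout enters only through the pose arguments, the same per-array estimator could in principle be reused when the arrays are moved, which is the kind of layout invariance the abstract describes.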
