2020 Volume 41 Issue 1 Pages 308-317
Hands-free audio services supporting speech communication play an increasingly widespread and foundational role in everyday life as services for the home and workplace become more automated, interactive and robotic. People speak instructions to voice assistants (e.g. Siri) to control and interact with their environment. This makes it an exciting time for acoustics engineering, because the demands on microphone array performance are rapidly increasing. Microphone arrays are expected to work at increasing distances in noisy and reverberant conditions; they are expected to record not just the sound content, but also the sound field; and they are expected to operate in multi-talker situations and even on moving robotic platforms. Audio technology is undergoing rapid change in which it is becoming feasible, from both a cost and a hardware point of view, to incorporate multiple distributed microphone arrays, comprising hundreds or even thousands of microphones, within a built environment. In this review paper, we consider microphone array signal processing from two relatively recent vantage points: sparse recovery and ray space analysis. To a lesser extent, we also consider neural networks. We present the principles underlying each method, discuss the advantages and disadvantages of the approaches, and outline possible ways to integrate these techniques.