An integrated convolutional neural network with a fusion attention mechanism for acoustic scene classification

Pengxu JIANG; Yue XIE; Cairong ZOU; Li ZHAO; Qingyun WANG

doi:10.1587/transfun.2022EAL2091

A final published version of this article exists. Please refer to the final version.
When citing this article, please cite the final version.


Keywords: Acoustic scene classification, ICNN-FA, CNN, attention mechanism, Mel-spectrograms

Advance publication (free access)

Article ID: 2022EAL2091



The final version of this article is now available: Vol. E106.A (2023), No. 8, pp. 1057-1061


Abstract

Acoustic scene classification (ASC) is an important research area in human-computer interaction. Real-world recordings often contain substantial noise and silent clips, which makes it difficult for earlier ASC methods to isolate the scene information that matters. Furthermore, scene information may be scattered across many audio frames, so selecting the scene-related frames is crucial for ASC. To address these issues, this paper proposes an integrated convolutional neural network with a fusion attention mechanism (ICNN-FA) for ASC. First, mel-spectrograms are split into segments and used as the input to the ICNN, which helps the model learn short-term time-frequency correlations. The ICNN then learns segment-level features from these inputs. Next, the proposed global attention layer integrates the segment features to gather global information. Finally, the fusion attention layer fuses all segment-level features, and a classifier assigns each recording to a scene class. Experiments on the ASC datasets from DCASE 2018 and DCASE 2019 indicate the effectiveness of the proposed method.
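The segment-then-fuse pipeline in the abstract can be illustrated with a minimal sketch. It is not the ICNN-FA architecture, whose details the abstract does not give. It assumes a spectrogram stored as a list of time frames by mel bins, replaces the ICNN with a hypothetical mean-pooling encoder, and uses generic softmax attention pooling with a hypothetical weight vector `w` in place of the fusion attention layer. The names `segment_spectrogram`, `mean_pool_segment`, and `attention_fuse`, as well as the `seg_len` and `hop` parameters, are illustrative assumptions.

```python
import math
from typing import List

# A spectrogram is stored as a list of time frames; each frame is a list of
# mel-bin energies (frames x mel_bins).
Matrix = List[List[float]]


def segment_spectrogram(spec: Matrix, seg_len: int, hop: int) -> List[Matrix]:
    """Split a spectrogram into fixed-length, possibly overlapping segments.

    seg_len and hop are hypothetical parameters chosen for illustration.
    Trailing frames that cannot fill a whole segment are dropped.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    return [spec[start:start + seg_len]
            for start in range(0, len(spec) - seg_len + 1, hop)]


def mean_pool_segment(segment: Matrix) -> List[float]:
    """Hypothetical stand-in for the ICNN: average each mel bin over time."""
    n_frames = len(segment)
    n_bins = len(segment[0])
    return [sum(frame[b] for frame in segment) / n_frames for b in range(n_bins)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exp)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features: List[List[float]], w: List[float]) -> List[float]:
    """Fuse segment-level feature vectors with softmax attention pooling.

    Segment i gets the score <w, h_i>; the scores are normalized with softmax
    into weights a_i, and the output is sum_i a_i * h_i. The vector w stands
    in for learned parameters and is an assumption of this sketch.
    """
    if not features:
        raise ValueError("need at least one segment feature")
    scores = [sum(wj * hj for wj, hj in zip(w, h)) for h in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(a * h[d] for a, h in zip(weights, features)) for d in range(dim)]


if __name__ == "__main__":
    # Toy input: 10 frames x 4 mel bins, value = frame index + bin index.
    spec = [[float(t + b) for b in range(4)] for t in range(10)]
    segments = segment_spectrogram(spec, seg_len=4, hop=2)  # 4 segments
    feats = [mean_pool_segment(s) for s in segments]
    # With w = 0 every segment gets weight 0.25, so this is a plain average.
    print(attention_fuse(feats, [0.0] * 4))  # prints [4.5, 5.5, 6.5, 7.5]
```

With an all-zero `w`, every segment receives the same weight and the fused vector is just the average of the segment features. A learned `w` instead lets segments with stronger scene cues dominate the fused representation, which is the intuition behind selecting scene-related frames.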

