Nonlinear Theory and Its Applications, IEICE
Online ISSN : 2185-4106
ISSN-L : 2185-4106
Regular Section
Applying transfer learning and data augmentation to image recognition models for music emotion classification
Yoshihiro Matsubara, Yuya Matsuda, Jousuke Kuroiwa
JOURNAL OPEN ACCESS

2025 Volume 16 Issue 4 Pages 1009-1021

Abstract

In this study, we propose a method for constructing an image recognition model with a single-channel input for music emotion classification, using transfer learning and data augmentation. The data augmentation method generates a variety of spectrogram images by varying the STFT window size in small increments. This yields five times as much data as the original dataset and prevents the degradation of classification performance caused by insufficient data. The model construction method, which applies transfer learning to grayscale images, adapts the EfficientNetV2 model pre-trained on ImageNet. The model constructed through transfer learning, combined with our proposed data augmentation method, achieved a classification accuracy of 94.8% on the 4Q Audio Emotion Dataset. Thus, our construction method using transfer learning for grayscale images, combined with the proposed data augmentation method, is effective for high-accuracy music emotion classification.
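To make the augmentation idea concrete, here is a minimal NumPy sketch of producing several spectrogram images from one audio clip by stepping the STFT window size in small increments. It is not the authors' implementation: the base window length (`base_n_fft=1024`), the increment (`step=64`), the hop length, the Hann window, the log-magnitude scaling, the nearest-neighbour resize to 128×128, and the helper names are all hypothetical choices for illustration. Only the idea of varying the window size and obtaining five variants per clip comes from the abstract.

```python
import numpy as np

def log_spectrogram(signal, n_fft, hop):
    """Log-magnitude STFT using a Hann window of length n_fft (hypothetical helper)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (n_fft // 2 + 1, n_frames)
    return np.log1p(mag)

def resize_nearest(img, shape):
    """Nearest-neighbour resize so every variant yields an image of the same size."""
    rows = np.linspace(0, img.shape[0] - 1, shape[0]).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, shape[1]).round().astype(int)
    return img[np.ix_(rows, cols)]

def augment_by_window_size(signal, base_n_fft=1024, step=64, n_variants=5,
                           hop=256, out_shape=(128, 128)):
    """Return n_variants grayscale spectrogram images, one per STFT window size.

    All default parameter values are assumptions, not values from the paper.
    """
    images = []
    for k in range(n_variants):
        n_fft = base_n_fft + k * step  # window size grows in small increments
        spec = log_spectrogram(signal, n_fft, hop)
        # Min-max scale to [0, 1] so each spectrogram behaves like a grayscale image.
        spec = (spec - spec.min()) / (spec.max() - spec.min() + 1e-12)
        images.append(resize_nearest(spec, out_shape))
    return np.stack(images)  # shape: (n_variants, *out_shape)
```

For example, a 2-second clip sampled at 22,050 Hz passed to `augment_by_window_size` yields an array of shape `(5, 128, 128)`: five single-channel images of the same clip, each with a slightly different time/frequency resolution trade-off. Each image carries the clip's emotion label, which is how a dataset can grow fivefold without new recordings.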
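The abstract does not state how the ImageNet-trained EfficientNetV2, whose first convolution expects three RGB channels, is adapted to a single-channel input. One common approach, shown here only as an assumption and not as the authors' method, is to sum the pre-trained first-layer kernels over the input-channel axis. Because convolution is linear, a grayscale image passed through the summed kernel gives exactly the same response as that image replicated into three identical channels and passed through the original kernel. The sketch below checks this with a naive NumPy convolution. The helper names are hypothetical, and the random array stands in for a real pre-trained stem kernel.

```python
import numpy as np

def rgb_kernel_to_gray(weight):
    """Collapse a conv kernel (out_ch, 3, kh, kw) to (out_ch, 1, kh, kw) by summing over RGB."""
    if weight.ndim != 4 or weight.shape[1] != 3:
        raise ValueError("expected weight of shape (out_ch, 3, kh, kw)")
    return weight.sum(axis=1, keepdims=True)

def conv2d_valid(x, weight):
    """Naive 'valid' cross-correlation: x is (in_ch, H, W), weight is (out_ch, in_ch, kh, kw)."""
    out_ch, in_ch, kh, kw = weight.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((out_ch, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + kh, j:j + kw]  # (in_ch, kh, kw)
            # Contract over (in_ch, kh, kw) to get one value per output channel.
            out[:, i, j] = np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
w_rgb = rng.standard_normal((8, 3, 3, 3))  # stand-in for a pre-trained 3-channel stem kernel
gray = rng.random((1, 16, 16))             # one single-channel spectrogram image

via_rgb = conv2d_valid(np.repeat(gray, 3, axis=0), w_rgb)  # replicate gray into 3 channels
via_gray = conv2d_valid(gray, rgb_kernel_to_gray(w_rgb))   # single-channel adapted kernel
```

`via_rgb` and `via_gray` agree to floating-point precision, so the adapted single-channel stem preserves what the pre-trained filters respond to while feeding only one channel. In a deep-learning framework, the summed kernel would be copied into a new one-input-channel stem convolution and the rest of the network reused unchanged. The alternative, replicating the grayscale image into three channels at input time, gives the same first-layer output but triples the input memory.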

© 2025 The Institute of Electronics, Information and Communication Engineers

This article is licensed under a Creative Commons [Attribution-NonCommercial-NoDerivatives 4.0 International] license.
https://creativecommons.org/licenses/by-nc-nd/4.0/