Learning Disentangled Music Representations for Multi-Viewpoint Music Retrieval

橋爪 優果; 宮下 敦志; 李 莉; 戸田 智基

doi:10.11517/pjsai.JSAI2024.0_1O4OS29a01

Abstract

To achieve a flexible music information retrieval (MIR) system, it is desirable to calculate music similarity by focusing on multiple partial elements of musical pieces and to let users select the element they want to focus on. Our previous study proposed calculating music similarity from each individual instrumental sound signal with a separate instrument-dependent network, but using separated instrument signals as queries in a search system is impractical. In this paper, we propose a method that computes similarities focusing on each instrument with a single network that takes mixed sounds as input. We design a single similarity embedding space whose dimensions are disentangled by instrument. The sub-embeddings are extracted by Conditional Similarity Networks, which are trained with a triplet loss using masks. Experimental results show that (1) each sub-embedding space can hold the characteristics of the corresponding instrument, and (2) the musical pieces selected by the proposed method agree with human judgment under limited conditions.
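The masked triplet loss mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal-sized block layout per instrument, the Euclidean distance, and the margin value are all assumptions made for illustration. In practice the embeddings would come from the network that takes the mixed sound as input, not from hand-written lists.

```python
import math

def instrument_mask(n_instruments, dims_per_instrument, index):
    """Return a 0/1 mask selecting the sub-embedding block of one instrument.

    Assumption: the embedding is split into equal contiguous blocks,
    one block per instrument.
    """
    mask = [0.0] * (n_instruments * dims_per_instrument)
    start = index * dims_per_instrument
    for i in range(start, start + dims_per_instrument):
        mask[i] = 1.0
    return mask

def masked_distance(x, y, mask):
    """Euclidean distance computed only over the dimensions kept by the mask."""
    return math.sqrt(sum(m * (a - b) ** 2 for a, b, m in zip(x, y, mask)))

def masked_triplet_loss(anchor, positive, negative, mask, margin=0.2):
    """Hinge-style triplet loss restricted to one instrument's sub-space.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive, measured inside the masked dimensions.
    """
    d_pos = masked_distance(anchor, positive, mask)
    d_neg = masked_distance(anchor, negative, mask)
    return max(0.0, d_pos - d_neg + margin)
```

Because the mask zeroes out every dimension outside the chosen block, a triplet defined for one instrument only constrains that instrument's sub-embedding, which is what lets a single embedding space carry a separate similarity notion for each instrument.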
