Host: The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
To achieve a flexible music information retrieval (MIR) system, it is desirable to compute music similarity by focusing on individual elements of a musical piece and to let users select the element they want to focus on. Our previous study proposed computing music similarity from each instrumental sound signal with instrument-dependent networks, but using individual instrument signals as queries in a search system is impractical. In this paper, we propose a method that computes similarity focused on each instrument with a single network that takes mixed sounds as input. We design a single similarity embedding space whose dimensions are disentangled by instrument; the embeddings are extracted by Conditional Similarity Networks trained with a triplet loss using masks. Experimental results show that (1) each sub-embedding space captures the characteristics of its corresponding instrument, and (2) the musical pieces selected by the proposed method agree with human judgment under limited conditions.
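The masked triplet loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_masks` and `masked_triplet_loss`, the fixed block-shaped binary masks, the three instruments with two dimensions each, the Euclidean distance, and the margin of 0.2 are all assumptions chosen for illustration. The sketch only shows how a mask restricts the distance computation to one instrument's sub-embedding.

```python
import numpy as np

def make_masks(num_instruments, dims_per_instrument):
    """Build one binary mask per instrument (hypothetical helper).

    Each mask selects a disjoint block of embedding dimensions, so the
    full embedding is the concatenation of per-instrument sub-embeddings.
    """
    total = num_instruments * dims_per_instrument
    masks = np.zeros((num_instruments, total))
    for i in range(num_instruments):
        start = i * dims_per_instrument
        masks[i, start:start + dims_per_instrument] = 1.0
    return masks

def masked_triplet_loss(anchor, positive, negative, mask, margin=0.2):
    """Hinge-style triplet loss computed only on the masked dimensions.

    The margin value is an assumption, not taken from the paper.
    """
    d_pos = np.linalg.norm((anchor - positive) * mask)
    d_neg = np.linalg.norm((anchor - negative) * mask)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (made-up values): 3 instruments, 2 dimensions each.
masks = make_masks(3, 2)
anchor = np.zeros(6)
positive = anchor.copy()
positive[2:] = 5.0   # differs from the anchor only outside instrument 0's block
negative = anchor.copy()
negative[:2] = 1.0   # differs from the anchor only inside instrument 0's block

# Under instrument 0's mask the positive is identical to the anchor,
# so the triplet is already satisfied and the loss is zero.
loss_inst0 = masked_triplet_loss(anchor, positive, negative, masks[0])

# Under instrument 1's mask the same triplet is violated, so the loss is positive.
loss_inst1 = masked_triplet_loss(anchor, positive, negative, masks[1])
```

In this toy setup the same three embeddings yield different losses depending on which mask is applied, which is the property that lets a user choose the instrument to focus on at query time.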