Principal-component and spectral-based feature sets were applied to the recognition of gamelan instrument sounds using support vector machines (SVMs). The principal components were calculated from a segmented scalogram of the first harmonic frequency of the gamelan recordings. The segmented scalogram is treated as a ``facial image'' of the gamelan instrument sound in a frontal pose, with neutral expression and normal lighting. The scalogram was computed from the gamelan sound signal using a continuous wavelet transform (CWT). The performance and contribution of the principal-component and spectral-based features were compared using the F-measure. For the training phase, the feature sets were extracted from isolated tones recorded over the entire frequency range of four gamelan instrument families (demung, saron, peking, and bonang). Using 90%/10% splits between the training and validation data sets, classifier models were constructed using an SVM with a radial basis function (RBF) kernel. The classifiers are composed of 28 separate One-Against-One multiclass classifiers. The experiments showed that the spectral-based feature set achieved an average F-measure of 74.05%, while the appearance-based feature set yielded 71.87%. For saron-only note tracking, the spectral-based feature set achieved an F-measure of 83.79%, higher than for demung-only note tracking, which yielded 63.89%.
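
As a rough illustration of two quantities named above, the following minimal sketch computes a balanced F-measure from raw counts and enumerates the class pairs that a one-against-one scheme would train on. It assumes the standard F1 definition (the harmonic mean of precision and recall) and one binary classifier per unordered pair of classes. The function names are hypothetical, and the choice of 8 placeholder classes is purely an assumption made because 8 classes would give 28 pairs; the abstract does not state the number of classes.

```python
from itertools import combinations


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positives, false positives, false negatives."""
    if tp == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def one_against_one_pairs(labels):
    """List every unordered pair of class labels; one binary classifier per pair."""
    return list(combinations(labels, 2))


# Assumption: 8 placeholder classes, chosen only because 8 * 7 / 2 = 28.
hypothetical_labels = [f"class_{i}" for i in range(8)]
pairs = one_against_one_pairs(hypothetical_labels)

print(len(pairs))                    # number of pairwise classifiers
print(f_measure(tp=3, fp=1, fn=1))   # F-measure for a toy confusion count
```

The counts passed to `f_measure` are toy values for illustration only and are unrelated to the reported results.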