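A Note-Assembly Method Using Multiple Measure-Based Deep-Learning Models to Create MusicXML from Locally Inclined Smartphone Photos of Sheet Music: img2Mxml, an App That Plays Music from Smartphone Photos of Sheet Music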

宍戸 知行; ファティ フェヒミユ; 徳重 大輔; 小野 靖弘; 熊澤 逸夫

doi:10.11517/pjsai.JSAI2024.0_3E1GS1005

Abstract

Deep learning has been applied to optical music recognition (OMR). However, OMR of diverse sheet-music images still lacks the precision needed for wide applicability. We propose a measure-based multimodal deep-learning-driven assembly (MMdA) method that enables end-to-end OMR processing of varied images, including inclined photographs. In this method, measures are extracted with a deep-learning model, aligned, and resized; the standardized measures are then used to infer given musical-symbol components with multiple deep-learning models applied in sequence or in parallel. Standardizing each measure enables efficient training of the deep-learning models and accurate adjustment of the five staff lines within each measure, so that locally inclined sheet-music images can be positioned precisely. As a result, the proposed MMdA method can reproduce a score from an inclined image, which current OMR applications cannot. Multiple deep-learning models, each covering one feature category of musical-symbol components with only a small number of feature types, can together represent a diverse set of notes and other musical symbols, including chords. The proposed MMdA method thus offers a solution for end-to-end OMR processing and enhances the utility of OMR for sheet-music photos taken with mobile phones.
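The abstract states that a few component-category models with small label sets can together represent many notes, including chords, but it does not say which categories are used or how their outputs are combined. The sketch below is a hypothetical illustration of that assembly step only, not the authors' implementation. It assumes a treble clef, assumes each notehead's vertical position has already been quantized to a staff step relative to the five staff lines of its standardized measure, and assumes onset, duration-type, and accidental categories. The names `NoteComponents`, `staff_step_to_pitch`, and `assemble_measure` are invented for this example; only the MusicXML element names (`measure`, `attributes`, `divisions`, `note`, `chord`, `pitch`, `step`, `alter`, `octave`, `duration`, `type`) come from the MusicXML format.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


# Hypothetical per-notehead outputs from separate component-category models.
@dataclass
class NoteComponents:
    staff_step: int      # 0 = bottom staff line, 1 = first space, ... (treble clef assumed)
    duration_type: str   # label from an assumed duration-category model
    alter: int = 0       # assumed accidental label: -1 flat, 0 none, +1 sharp
    onset: int = 0       # assumed onset slot; noteheads sharing a slot form a chord


STEPS = ["C", "D", "E", "F", "G", "A", "B"]
DIVISIONS = 2  # MusicXML divisions per quarter note (chosen for this example)
TYPE_TO_DURATION = {"whole": 8, "half": 4, "quarter": 2, "eighth": 1}


def staff_step_to_pitch(staff_step: int) -> tuple[str, int]:
    """Map a staff step to (step, octave); in treble clef the bottom line is E4."""
    diatonic = 4 * 7 + 2 + staff_step  # diatonic index of E4 is 30
    return STEPS[diatonic % 7], diatonic // 7


def assemble_measure(components: list[NoteComponents], number: int = 1) -> ET.Element:
    """Combine per-component predictions into one MusicXML <measure> element."""
    measure = ET.Element("measure", number=str(number))
    attrs = ET.SubElement(measure, "attributes")
    ET.SubElement(attrs, "divisions").text = str(DIVISIONS)
    previous_onset = None
    # Sort by onset, then bottom-to-top, so chord members follow their first note.
    for comp in sorted(components, key=lambda c: (c.onset, c.staff_step)):
        note = ET.SubElement(measure, "note")
        if comp.onset == previous_onset:
            ET.SubElement(note, "chord")  # <chord/> must precede <pitch>
        pitch = ET.SubElement(note, "pitch")
        step, octave = staff_step_to_pitch(comp.staff_step)
        ET.SubElement(pitch, "step").text = step
        if comp.alter:
            ET.SubElement(pitch, "alter").text = str(comp.alter)
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(TYPE_TO_DURATION[comp.duration_type])
        ET.SubElement(note, "type").text = comp.duration_type
        previous_onset = comp.onset
    return measure


# Usage: a C-major triad (quarter notes) followed by an A-sharp half note.
measure = assemble_measure([
    NoteComponents(staff_step=2, duration_type="quarter", onset=0),           # G4
    NoteComponents(staff_step=-2, duration_type="quarter", onset=0),          # C4
    NoteComponents(staff_step=0, duration_type="quarter", onset=0),           # E4
    NoteComponents(staff_step=3, duration_type="half", alter=1, onset=1),     # A#4
])
print(ET.tostring(measure, encoding="unicode"))
```

In this sketch a chord needs no dedicated chord class: three small categories (staff step, duration type, onset) are enough to emit the `<chord/>` flag, which mirrors the abstract's point that a few low-cardinality feature categories can cover many symbol combinations.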
