2023, Volume 38, Issue 3, Pages A-MA3_1-12
Deep learning has been applied to optical music recognition (OMR) of sheet music. However, OMR processing of diverse sheet-music images still lacks the precision required for wide applicability. We propose a measure-based multimodal deep-learning-driven assembly (MMdA) method that enables end-to-end OMR processing of diverse images, including inclined photo images. With this method, measures are extracted by a deep-learning model, aligned, and resized, and their musical-symbol components are then inferred by multiple deep-learning models applied in sequence or in parallel. Standardizing each measure enables efficient training of the deep-learning models and accurate adjustment of the five staff lines in each measure, so that locally inclined sheet-music images can be precisely positioned. Thus, the proposed MMdA method can reproduce a score from an inclined image, which current OMR applications cannot. Multiple deep-learning models for musical-symbol-component feature categories, each with a small number of feature types, can represent a diverse set of notes and other musical symbols, including chords. The proposed MMdA method provides a solution for end-to-end OMR processing and enhances the utility of OMR for sheet-music photos taken with mobile phones.
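The processing flow described in the abstract (measure extraction, staff-line standardization, then component inference by multiple models) can be outlined in code. The following is a minimal structural sketch in Python; all names (detect_measures, standardize, run_component_models) and the stub predictions are hypothetical placeholders standing in for trained deep-learning models, not the authors' implementation.

```python
# Structural sketch of a measure-based, multi-model OMR pipeline
# following the abstract. All names and outputs are illustrative stubs.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Measure:
    """One detected measure, cropped from the sheet-music image."""
    pixels: List[List[int]]      # grayscale crop (placeholder for an image array)
    staff_line_rows: List[int]   # estimated row positions of the five staff lines


def detect_measures(page_pixels: List[List[int]]) -> List[Measure]:
    """Stand-in for the measure-detection deep-learning model.
    A real detector would return one crop per predicted bounding box."""
    # Placeholder: treat the whole page as a single measure.
    return [Measure(pixels=page_pixels, staff_line_rows=[10, 20, 30, 40, 50])]


def standardize(measure: Measure, height: int = 128, width: int = 256) -> Measure:
    """Align and resize a measure so its five staff lines sit at fixed rows.
    This per-measure standardization is what allows locally inclined photos
    to be handled."""
    target_rows = [int(height * (i + 1) / 6) for i in range(5)]
    # Placeholder resize: a real system would warp the crop so that
    # staff_line_rows map onto the evenly spaced target rows.
    resized = [[0] * width for _ in range(height)]
    return Measure(pixels=resized, staff_line_rows=target_rows)


def run_component_models(measure: Measure) -> Dict[str, List[str]]:
    """Apply several symbol-component models (e.g., pitch, duration,
    accidental) to one standardized measure; they could run in sequence
    or in parallel."""
    component_models = {
        "pitch": lambda m: ["C4", "E4", "G4"],    # stub predictions
        "duration": lambda m: ["quarter"] * 3,
        "accidental": lambda m: ["none"] * 3,
    }
    return {name: model(measure) for name, model in component_models.items()}


def assemble_score(page_pixels: List[List[int]]) -> List[Dict[str, List[str]]]:
    """End-to-end pass: detect measures, standardize each, infer components."""
    return [run_component_models(standardize(m)) for m in detect_measures(page_pixels)]


if __name__ == "__main__":
    blank_page = [[255] * 512 for _ in range(512)]
    for i, measure_symbols in enumerate(assemble_score(blank_page)):
        print(f"measure {i}: {measure_symbols}")
```

The key design point reflected here is that standardization is applied per measure: each measure's five staff lines are mapped to fixed rows independently of the rest of the page, which is why local inclination does not propagate to the component-inference stage.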