Observation of marine soundscapes, which comprise biological, geophysical, and anthropogenic sounds, is a cost-effective tool for monitoring biodiversity and human activity in coastal seas. Real-time, automated classification of complex underwater sound sources is a challenging but essential step in extracting meaningful information from the recorded sound stream. Although numerous classification algorithms have been proposed, many were designed for a specific sound source, since time-frequency characteristics vary widely among soundscape components. To establish a real-time monitoring system, this study aimed to develop a single classification algorithm capable of handling diverse marine soundscape components. Ground-truth sound data were collected in the coral reef area around Ishigaki Island, Japan, and in captivity. After filtering and normalization, two-second log-spectrograms were generated from the collected sounds. An image classifier based on a convolutional neural network was then trained on these spectrograms, using 6511 samples from 52 sound classes. Average F-measures for Level 1 (biophony, geophony, and anthrophony), Level 2 (fish, marine mammals, or vessels), and Level 3 (species) classifications were 90.9%, 94.3%, and 93.0%, respectively. Given that the quality and quantity of training data play a crucial role in developing a reliable AI-based classifier, building a database to accumulate underwater sound recordings is expected.
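The abstract outlines a pipeline of filtering, normalization, two-second log-spectrogram generation, and CNN-based image classification. The sketch below illustrates one plausible realization of that pipeline; the filter band, FFT parameters, and network architecture are illustrative assumptions, not the authors' actual settings, and only the two-second clip length, log-spectrogram input, CNN classifier, and 52-class output come from the abstract.

```python
# Hypothetical sketch of the described pipeline: band-pass filtering,
# amplitude normalization, 2-s log-spectrograms, and a small CNN classifier.
# Filter band, FFT settings, and layer sizes are assumptions for illustration.
import numpy as np
from scipy.signal import butter, sosfiltfilt, spectrogram
import torch
import torch.nn as nn

def log_spectrogram(wave, fs, clip_sec=2.0, lo=100.0, hi=10000.0):
    """Filter, normalize, and convert a 2-s clip to a log-spectrogram image."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, wave[: int(clip_sec * fs)])
    x = x / (np.max(np.abs(x)) + 1e-12)               # peak normalization
    f, t, S = spectrogram(x, fs=fs, nperseg=1024, noverlap=512)
    return np.log10(S + 1e-12).astype(np.float32)     # log power, 2-D image

class SoundscapeCNN(nn.Module):
    """Small CNN image classifier over spectrograms; 52 classes as in the study."""
    def __init__(self, n_classes=52):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((8, 8)),             # fixed-size feature map
        )
        self.classifier = nn.Linear(32 * 8 * 8, n_classes)

    def forward(self, x):                             # x: (batch, 1, freq, time)
        return self.classifier(self.features(x).flatten(1))
```

Under this sketch, each recorded clip becomes a single-channel "image" fed to the CNN, so the hierarchical Level 1-3 labels of the study could be obtained either from separate classifiers per level or by mapping the 52 species-level classes up the hierarchy.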