We are surrounded by various sounds. Through such environmental sounds we can recognize what is going on and notice immediate danger or unusual events, for example from the sound of a crash or a motor. Conversely, when we describe the situation around us to others, we represent environmental sounds with onomatopoeia. Onomatopoeia is especially important for communication in Japan, because Japanese has a rich variety of onomatopoeic words. If computers could recognize environmental sounds and onomatopoeia, interaction between humans and computers would become more natural.
While previous research has pursued environmental sound retrieval using onomatopoeia, little work has tried to transform environmental sounds into onomatopoeia. This paper proposes a method that not only performs this transformation but also reflects sound effects such as loudness or sonority by using various fonts and changing their sizes.
Because environmental sounds are generated by a different mechanism from the human voice, conventional speech recognition techniques cannot be applied as they are. We therefore built a support vector machine (SVM) that uses the spectrum contour as its main feature. In addition to the SVM, our system exploits the reverberation of environmental sounds. Our method is capable of transcribing more than 50 kinds of onomatopoeia. The loudness and pitch level of sounds are expressed with various combinations of normal, italic, and bold fonts.
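As a rough illustration of how loudness and pitch could be turned into typographic attributes, the following minimal Python sketch maps a transcribed word to a font size, a bold flag, and an italic flag. The function name `style_onomatopoeia`, the thresholds, the dBFS range, and the choice of which attribute encodes which property are all assumptions made for this example, not details of our system.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the paper.
LOUD_DB = -20.0          # hypothetical: at or above this level (dBFS), use bold
HIGH_PITCH_HZ = 1000.0   # hypothetical: at or above this pitch, use italics
MIN_PT, MAX_PT = 10, 36  # hypothetical font-size range in points


@dataclass(frozen=True)
class StyledWord:
    text: str
    size_pt: int
    bold: bool
    italic: bool


def style_onomatopoeia(text, loudness_db, pitch_hz):
    """Attach display attributes to an already transcribed onomatopoeia."""
    # Map loudness in [-60, 0] dBFS linearly onto the font-size range, clamped.
    level = min(max((loudness_db + 60.0) / 60.0, 0.0), 1.0)
    size = round(MIN_PT + level * (MAX_PT - MIN_PT))
    return StyledWord(text, size, loudness_db >= LOUD_DB, pitch_hz >= HIGH_PITCH_HZ)
```

For example, under these assumed settings `style_onomatopoeia("gashan", 0.0, 2000.0)` yields a 36 pt bold italic word, while a quiet, low-pitched sound stays at 10 pt in the normal font.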
We evaluated how effectively and appropriately our system converts environmental sounds into onomatopoeia. It attained 89% precision, which demonstrates the effectiveness of our system.