Auditory scene analysis (ASA) groups the sensory evidence derived from a mixture of sound sources and derives separate descriptions of the individual sources. There are two possible ways of doing this: (a) using knowledge of the properties of certain classes of sounds, such as speech, in a top-down fashion; (b) using, in a bottom-up manner, those properties in the input (such as harmonic relations) that are typically found when two or more sound sources (e.g., voice and footsteps) affect the input at the same time. Some properties of the ASA done by humans are discussed and proposed as specifications for a computational model: the adding up of evidence, the coherence of the ASA system, the combining of top-down and bottom-up constraints, as illustrated in the duplex perception of speech, and the propagation of influences across the auditory field.