This article investigates voices that stand out in noisy environments, which we call pop-out voices. Our previous study with non-native Japanese speakers suggested that the attributes contributing to pop-out voices differ from those contributing to intelligibility. In this study, we introduce auditory model-based representations, namely gammatone filterbank outputs, and derive several types of masks based on the glimpse model of speech perception in the presence of interfering sounds. We found that graded masks agree better with the subjective test results than binary masks do. We present the details of the models and the simulation results. This article is an extended version of our presentation at the autumn meeting of the Acoustical Society of Japan in 2024; it reflects the results of further investigations and describes the method in more detail.
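To make the distinction between binary and graded masks concrete, the following is a minimal sketch rather than the implementation used in this study. It assumes that the speech and noise energies are already available per time-frequency cell (for example, from a gammatone filterbank, which is not implemented here). The function names, the 3 dB local SNR threshold, and the logistic form and slope of the graded mask are all illustrative assumptions.

```python
import numpy as np

def local_snr_db(speech_energy, noise_energy, eps=1e-12):
    """Local SNR in dB for each time-frequency cell (channels x frames)."""
    return 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))

def binary_mask(snr_db, threshold_db=3.0):
    """1 where the local SNR exceeds the threshold (a 'glimpse'), else 0.
    The 3 dB default is an assumed value, not taken from this article."""
    return (snr_db > threshold_db).astype(float)

def graded_mask(snr_db, threshold_db=3.0, slope_db=3.0):
    """Soft mask in (0, 1): a logistic function of the local SNR.
    The logistic shape and slope are assumptions for illustration only."""
    return 1.0 / (1.0 + np.exp(-(snr_db - threshold_db) / slope_db))

# Toy energies for 2 channels x 3 frames (hypothetical values).
speech = np.array([[4.0, 1.0, 0.1],
                   [2.0, 2.0, 8.0]])
noise = np.array([[1.0, 1.0, 1.0],
                  [1.0, 4.0, 1.0]])

snr = local_snr_db(speech, noise)
b = binary_mask(snr)   # hard decision per cell
g = graded_mask(snr)   # graded weight per cell
print(b)
print(np.round(g, 3))
```

The binary mask discards how far a cell lies above or below the threshold, whereas the graded mask keeps that information as a continuous weight, which is the property being compared against the subjective results.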
Previous research defines pop-out voices as voices that are conspicuously perceived amid disturbing sounds such as background noise. Factors expected to be associated with pop-out voices include "linguistic characteristics," "spectral shape," "degree of reverberation," and "type and nature of background noise." Among these factors, this study focuses on the type and magnitude of background noise and examines how acoustic features affect pop-out evaluations of recorded daily conversations under different background noise conditions.