Abstract
This paper proposes a computational model of the dynamics between visual attention and segmentation. For a video sequence, the model computes a sequence of image pyramids of early visual features and repeatedly performs selective attention and figure-ground segmentation on it, so that objects are perceived through the successive development of segments and the merging of concurrent segments. Attention is selected stochastically on a multi-level saliency map, called a visual attention pyramid, and segmentation is performed on Markov random fields that are formed dynamically around the foci of attention. A set of segments and their spatial relations are stored in a visual working memory and maintained throughout the repeated attention and segmentation process. The dynamics and performance of the model are evaluated on basic functions of the vision system, such as visual pop-out, figure-ground reversal, and perceptual organization, and on real-world scenes containing objects designed to attract attention.
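As an illustration of stochastic attention selection on a multi-level saliency map, the following minimal sketch samples a focus of attention with probability proportional to saliency across all pyramid levels. It is not the model's actual formulation: the 2x2 average pooling, the proportional sampling rule, and the names `build_pyramid` and `select_focus` are assumptions made only for this example.

```python
import random


def build_pyramid(saliency, levels):
    """Build a multi-level saliency pyramid by 2x2 average pooling.

    The pooling scheme is an assumption for illustration only.
    """
    pyramid = [saliency]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = len(prev) // 2, len(prev[0]) // 2
        if h == 0 or w == 0:
            break  # cannot downsample any further
        pyramid.append([
            [
                (prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
                 + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
                for x in range(w)
            ]
            for y in range(h)
        ])
    return pyramid


def select_focus(pyramid, rng):
    """Sample a focus (level, y, x) with probability proportional to saliency.

    Proportional sampling is an assumed rule, not taken from the paper.
    """
    cells, weights = [], []
    for level, grid in enumerate(pyramid):
        for y, row in enumerate(grid):
            for x, s in enumerate(row):
                cells.append((level, y, x))
                weights.append(max(s, 0.0))  # negative saliency is clipped to zero
    if sum(weights) == 0:
        return rng.choice(cells)  # no salient cell: fall back to a uniform choice
    return rng.choices(cells, weights=weights, k=1)[0]


# Toy 4x4 saliency map with a single salient location at row 1, column 2.
saliency = [[0.0] * 4 for _ in range(4)]
saliency[1][2] = 1.0
pyramid = build_pyramid(saliency, levels=2)
focus = select_focus(pyramid, random.Random(0))
```

In this toy map only two cells carry saliency, the original location at the finest level and the coarse cell that covers it, so every sampled focus falls on one of them. Coarser levels stand in for attention to larger regions.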