Abstract
We propose a novel technique for estimating the number of people in a video sequence; it has the advantages of being stable even in crowded situations and needing no ground-truth data. By analyzing the geometrical relationships between image pixels and their intersection volumes in the real world quantitatively, a foreground image directly indicates the number of people. Because foreground detection is possible even in crowded situations, the proposed method can be applied in such situations. Moreover, it can estimate the number of people in an a priori manner, so it needs no ground-truth data unlike existing feature-based estimation techniques. Experiments show the validity of the proposed method.