2025 Volume 6 Issue 2 Pages 221-229
In computer vision, Multiple Object Tracking (MOT) methods have been studied extensively, and these studies typically evaluate tracker performance on benchmark datasets such as MOT17. In applied settings, MOT is also increasingly used to automate traffic surveys. Benchmark evaluations offer a reference point when selecting a tracker for such applications, but videos captured for traffic surveys often differ in character from those in benchmark datasets. As a result, the performance reported in MOT studies may not generalize to field applications. To address this issue, we construct a dataset of video footage captured exclusively in sidewalk environments for automated traffic surveys, with ground truth annotations in the MOT Challenge format. Using this dataset, we evaluate tracker performance in sidewalk environments. Based on the evaluation results, we then analyze the sources of tracking errors and explore potential ways to improve tracking accuracy.
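For readers unfamiliar with the MOT Challenge annotation format, the sketch below reads ground truth lines in the comma-separated layout used by MOT16/MOT17 `gt.txt` files: frame, track ID, box left, box top, box width, box height, a "consider" flag, a class label, and a visibility ratio. This is an illustrative sketch, not the tooling used to build the dataset described above. The names `GtBox`, `parse_gt_line`, and `group_by_frame` are hypothetical, and the sample values are made up for demonstration.

```python
from dataclasses import dataclass


@dataclass
class GtBox:
    """One ground truth box in the MOT16/MOT17 gt.txt column layout."""
    frame: int         # 1-based frame index
    track_id: int      # identity shared by all boxes of the same object
    left: float        # bounding box left edge (pixels)
    top: float         # bounding box top edge (pixels)
    width: float       # bounding box width (pixels)
    height: float      # bounding box height (pixels)
    consider: int      # 1 = included in evaluation, 0 = ignored
    cls: int           # object class label
    visibility: float  # visible fraction of the object, 0.0 to 1.0


def parse_gt_line(line: str) -> GtBox:
    """Parse one comma-separated ground truth line into a GtBox."""
    f = line.strip().split(",")
    return GtBox(
        frame=int(f[0]), track_id=int(f[1]),
        left=float(f[2]), top=float(f[3]),
        width=float(f[4]), height=float(f[5]),
        consider=int(f[6]), cls=int(f[7]), visibility=float(f[8]),
    )


def group_by_frame(lines):
    """Collect parsed boxes into a dict keyed by frame index, skipping blank lines."""
    frames = {}
    for line in lines:
        if line.strip():
            box = parse_gt_line(line)
            frames.setdefault(box.frame, []).append(box)
    return frames


if __name__ == "__main__":
    # Hypothetical sample: two pedestrians in frame 1, one of them again in frame 2.
    sample = [
        "1,1,100,200,40,90,1,1,1.0",
        "1,2,300,210,38,88,1,1,0.6",
        "2,1,104,201,40,90,1,1,1.0",
    ]
    frames = group_by_frame(sample)
    print(len(frames[1]), len(frames[2]))  # 2 1
```

Grouping by frame mirrors how trackers consume detections frame by frame, so the same structure can be reused when comparing tracker output against ground truth.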