Article ID: 2024EAP1128
Automatic micro-expression (ME) spotting is a fundamental and essential task in ME analysis from videos. Because MEs are brief and subtle, the spotting task is challenging and spotting performance needs to be further improved. Moreover, existing methods generally neglect the correlations between expression proposals within a video. In this work, we propose a two-stage relation-aware graph convolutional network (MES-RANet) to locate the temporal positions of macro- and micro-expressions. First, a temporal evaluation module (TEM) predicts frame-level probabilities from spatial-temporal feature sequences and generates candidate proposals for the subsequent module. Then, in the relation-aware module (RAM), we formulate video proposals as graph nodes and proposal-proposal correlations as edges to construct graphs, and apply a relation-aware network to model the relations among proposals and learn powerful representations for boundary regression. Comprehensive experimental results show that MES-RANet is effective and achieves competitive performance compared with state-of-the-art methods on two public benchmark datasets, CAS(ME)2 and SAMM-LV. Code is available at https://github.com/hahaluluyo/MES-RANet.
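To make the proposal-graph formulation concrete, the following is a minimal illustrative sketch in PyTorch, not the authors' implementation: the feature dimension, the IoU-based edge criterion, the threshold `iou_thresh`, and the single-layer aggregation are all assumptions. It shows how candidate proposals could be treated as graph nodes, with edges derived from proposal-proposal temporal overlap, before being refined by a graph-convolution step.

```python
import torch

def temporal_iou(p, q):
    """Temporal IoU between two proposals given as (start, end) frame indices."""
    inter = max(0.0, min(p[1], q[1]) - max(p[0], q[0]))
    union = max(p[1], q[1]) - min(p[0], q[0])
    return inter / union if union > 0 else 0.0

def build_proposal_graph(proposals, iou_thresh=0.3):
    """Adjacency matrix: nodes are proposals, edges connect overlapping proposals.
    `iou_thresh` is an assumed hyper-parameter, not taken from the paper."""
    n = len(proposals)
    adj = torch.eye(n)  # self-loops
    for i in range(n):
        for j in range(i + 1, n):
            if temporal_iou(proposals[i], proposals[j]) >= iou_thresh:
                adj[i, j] = adj[j, i] = 1.0
    return adj

class GraphConvLayer(torch.nn.Module):
    """Generic graph-convolution step: normalized neighbourhood aggregation plus a
    linear projection (a stand-in for the paper's relation-aware network)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = (adj @ x) / deg  # average features over connected proposals
        return torch.relu(self.linear(h))

# Toy usage: four candidate proposals with 256-dim features (shapes are illustrative).
proposals = [(10, 25), (20, 40), (100, 130), (128, 150)]
feats = torch.randn(len(proposals), 256)
adj = build_proposal_graph(proposals)
refined = GraphConvLayer(256, 256)(feats, adj)  # relation-aware proposal features
print(refined.shape)  # torch.Size([4, 256])
```

In this sketch, refined proposal features would feed a boundary-regression head; the actual edge definition and network depth in MES-RANet may differ.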