2023, Vol. 62, No. 6, pp. 622-632
In this review paper, we report our model for recovering a 3D human mesh from a single monocular 2D image, called Deformable mesh transFormer (DeFormer) 1), which was published at the CVPR 2023 conference. While current state-of-the-art models achieve strong performance by exploiting the transformer architecture to model long-range dependencies among input tokens, they suffer from high computational cost because the standard transformer attention mechanism has complexity quadratic in the input sequence length. We therefore developed DeFormer, a human mesh recovery method equipped with two computationally efficient attention modules: 1) body-sparse self-attention and 2) Deformable Mesh cross-Attention (DMA). Experimental results show that DeFormer efficiently leverages multi-scale feature maps and a dense mesh, which was not possible with previous transformer approaches. As a result, DeFormer achieves state-of-the-art performance on the Human3.6M and 3DPW benchmarks. Code is available at https://github.com/yusukey03012/deformer.
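As a rough illustration of the deformable cross-attention idea, the sketch below shows a generic deformable-attention module in which each mesh/joint query samples only a small, fixed number of points from multi-scale feature maps instead of attending to every spatial location, so the cost grows with the number of sampled points rather than quadratically with the feature map size. All class names, shapes, and hyperparameters here are illustrative assumptions, not the authors' implementation; please refer to the linked repository for the actual DeFormer code.

```python
# Hypothetical sketch of deformable cross-attention over multi-scale feature
# maps, in the spirit of DMA. Names and shapes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableCrossAttentionSketch(nn.Module):
    def __init__(self, dim=256, num_levels=3, num_points=4):
        super().__init__()
        self.num_levels = num_levels
        self.num_points = num_points
        # Each query (mesh-vertex / joint token) predicts 2D sampling offsets
        # and attention weights for a few points per feature-map level.
        self.offset_proj = nn.Linear(dim, num_levels * num_points * 2)
        self.weight_proj = nn.Linear(dim, num_levels * num_points)
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, feature_maps):
        # queries:      (B, Q, C)  mesh/joint tokens
        # ref_points:   (B, Q, 2)  reference locations, normalized to [0, 1]
        # feature_maps: list of num_levels tensors, each (B, C, H_l, W_l)
        B, Q, C = queries.shape
        offsets = self.offset_proj(queries).view(
            B, Q, self.num_levels, self.num_points, 2)
        weights = self.weight_proj(queries).view(
            B, Q, self.num_levels * self.num_points)
        weights = weights.softmax(dim=-1).view(
            B, Q, self.num_levels, self.num_points)

        sampled = []
        for lvl, fmap in enumerate(feature_maps):
            # Project channels, keeping the spatial layout for sampling.
            value = self.value_proj(fmap.flatten(2).transpose(1, 2))  # (B, H*W, C)
            value = value.transpose(1, 2).view(B, C, *fmap.shape[-2:])
            # Sampling locations around each reference point, mapped to
            # the [-1, 1] grid_sample coordinate convention.
            loc = ref_points.unsqueeze(2) + offsets[:, :, lvl]        # (B, Q, P, 2)
            grid = 2.0 * loc - 1.0
            feat = F.grid_sample(value, grid, align_corners=False)    # (B, C, Q, P)
            # Weighted sum over the sampled points of this level.
            sampled.append((feat * weights[:, :, lvl].unsqueeze(1)).sum(-1))

        out = torch.stack(sampled, dim=0).sum(0).transpose(1, 2)      # (B, Q, C)
        return self.out_proj(out)
```

Because each query attends to only `num_levels * num_points` sampled locations, the attention cost is linear in the number of queries, which is what makes it feasible to use dense mesh tokens together with high-resolution, multi-scale feature maps.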