Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

Liu, Hanbing; Xiang, Wangmeng; He, Jun-Yan; Cheng, Zhi-Qi; Luo, Bin; Geng, Yifeng; Xie, Xuansong

arXiv:2309.01365 (cs)

[Submitted on 4 Sep 2023 (v1), last revised 4 Feb 2024 (this version, v3)]

Abstract: Estimating the 3D pose of humans in video sequences requires both accurate prediction and a well-structured architecture. Building on the success of transformers, we introduce the Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends intra-block temporal modeling via its Temporal Pyramidal Compression-and-Amplification (TPCA) structure and refines inter-block feature interaction with a Cross-Layer Refinement (XLR) module. In particular, the TPCA block exploits a temporal pyramid paradigm, reinforcing key and value representation capabilities and seamlessly extracting spatial semantics from motion sequences. We stitch these TPCA blocks together with XLR, which promotes rich semantic representation through continuous interaction of queries, keys, and values. This strategy integrates early-stage information into current flows, addressing the deficits in detail and stability typical of other transformer-based methods. We demonstrate the effectiveness of RTPCA by achieving state-of-the-art results on the Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks with minimal computational overhead. The source code is available at this https URL.
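
To make the idea of temporally compressed keys and values more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes compression is done by average pooling over time and that a "pyramid" can be illustrated with strides 1, 2, and 4; the function names `temporal_pool` and `compressed_kv_attention`, the strides, and the tensor sizes are all hypothetical choices made for illustration. The amplification step and the XLR module described in the abstract are not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_pool(x, stride):
    # Average-pool a (T, C) sequence over time (assumed compression operator).
    # Trailing frames that do not fill a full window are dropped.
    T, C = x.shape
    T_out = T // stride
    return x[:T_out * stride].reshape(T_out, stride, C).mean(axis=1)

def compressed_kv_attention(q, k, v, stride):
    # Scaled dot-product attention in which only keys and values are
    # temporally compressed; queries keep the full sequence length.
    k_c = temporal_pool(k, stride)
    v_c = temporal_pool(v, stride)
    scores = q @ k_c.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v_c

# Hypothetical sizes: 16 frames, 8 feature channels per frame.
rng = np.random.default_rng(0)
T, C = 16, 8
x = rng.standard_normal((T, C))

# One attention pass per assumed pyramid level (strides 1, 2, 4).
outputs = [compressed_kv_attention(x, x, x, s) for s in (1, 2, 4)]
shapes = [o.shape for o in outputs]
```

Because the queries are left uncompressed, every level returns one output per input frame, while the attention matrix shrinks from T x T to T x (T / stride). That is one plausible reading of how compressing keys and values can keep computational overhead low.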

Comments:	11 pages, 5 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2309.01365 [cs.CV]
	(or arXiv:2309.01365v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.01365

Submission history

From: Wangmeng Xiang
[v1] Mon, 4 Sep 2023 05:25:10 UTC (9,621 KB)
[v2] Wed, 6 Sep 2023 02:18:23 UTC (9,805 KB)
[v3] Sun, 4 Feb 2024 07:17:28 UTC (5,506 KB)
