Computer Science Graduate Seminar

Tuesday, December 5, 2023, 1:00pm

Robust and Efficient Methods in Visual 3D Human Pose Estimation

  • Istvan Sarandi, M.Sc.  -  Chair of Computer Science 13
  • Place: Large b-it room (5053.2), Building E2, Informatics Centre, Ahornstr. 55
  • Hybrid: Zoom link (Meeting ID: 636 7289 5840, Password: 964382)

Computer vision algorithms for perceiving humans in the real world are crucial for several impactful emerging technologies, including self-driving cars and mobile service robots.

In this talk, I will present three contributions that advance the state of the art in deep learning-based 3D human pose estimation, i.e., localizing the major anatomical landmarks of the human body in 3D space from RGB images alone. The central themes are robustness and efficiency, which are the main challenges in robotics applications.

We start by addressing robustness to occlusions, i.e., cases where objects block the line of sight between the person and the camera. After presenting the first systematic study of how occlusions degrade 3D pose estimation accuracy, we propose to mitigate the problem with an effective synthetic occlusion data augmentation strategy.

We then turn to the problem of truncation, i.e., cases where only part of the body is within the camera's field of view. We develop a truncation-robust heatmap representation, which also makes it possible to learn to recover the metric scale. Building on this capability, I present MeTRAbs, an end-to-end learned absolute pose estimation method that robustly reconstructs human poses in the camera's reference frame with state-of-the-art accuracy.

In the third part, in pursuit of better in-the-wild generalization beyond the controlled environments of motion capture studios, I present the largest-scale experiment reported in the 3D pose literature to date, which merges a total of 32 datasets. We overcome the challenge that these datasets are annotated differently through a novel affine-combining autoencoder formulation, which captures the information common to the different landmark annotation formats. Importantly, these methods can run in real time on low-power robot hardware.

I conclude with a discussion of possible extensions to the presented works, as well as exciting future challenges for the field as a whole.

The computer science lecturers invite all interested persons to attend.