PA-HOI Dataset:
A Physics-Aware Human and Object Interaction Dataset

Shanghai Jiao Tong University,
Vision Star Information Technology Co., Ltd.

Abstract

The Human-Object Interaction (HOI) task explores the dynamic interactions between humans and objects in physical environments, providing essential biomechanical and cognitive-behavioral foundations for fields such as robotics, virtual reality, and human-computer interaction. However, existing HOI datasets focus on the details of hand grasping or on object grasping preferences, and often neglect how the physical properties of objects influence long-term human motion. To bridge this gap, we introduce the PA-HOI Motion Capture dataset, which highlights the impact of objects’ physical attributes on human motion dynamics, including human posture, moving velocity, and other motion characteristics. The dataset comprises 562 motion sequences of human-object interactions, performed by subjects of different genders interacting with 35 3D objects that vary in size, shape, and weight. It significantly extends the scope of existing datasets for understanding how the physical attributes of different objects influence human posture, speed, motion scale, and interaction strategies. We further demonstrate the applicability of the PA-HOI dataset by integrating it with existing motion generation methods, validating its capacity to transfer realistic physical awareness.

Overview

Dataset pipeline
Overview of our dataset collection and post-processing pipeline. We adopt a unified textual template, combined with a predefined set of annotation keywords, to describe HOI actions, and employ LLMs to augment the prompt for each sequence. During the motion data acquisition phase, subjects first calibrate while wearing motion capture suits, and the inertial sensors attached to the objects undergo bias correction; the subjects then perform the interactive actions described in the prompts. The motion data is then processed to fit SMPL-X sequences.
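The template-based description step can be sketched as follows. This is a minimal illustration, not the authors' annotation tool: the `ObjectAttributes` class, the `build_prompt` function, and the keyword sets are all hypothetical, and the sentence pattern is only inferred from the example prompts shown on this page.

```python
from dataclasses import dataclass

# Hypothetical keyword sets; the dataset's actual annotation vocabulary may differ.
SIZES = {"small", "medium", "large"}
WEIGHTS = {"light", "medium-weight", "heavy"}


@dataclass(frozen=True)
class ObjectAttributes:
    """Physical attributes of an object, as they appear in a text prompt."""
    name: str    # e.g. "large box"
    size: str    # one of SIZES
    weight: str  # one of WEIGHTS
    shape: str   # e.g. "cuboid", "cylinder", "sphere"


def build_prompt(action: str, obj: ObjectAttributes, details: str) -> str:
    """Fill a fixed sentence template with an action and object keywords."""
    if obj.size not in SIZES:
        raise ValueError(f"unknown size keyword: {obj.size!r}")
    if obj.weight not in WEIGHTS:
        raise ValueError(f"unknown weight keyword: {obj.weight!r}")
    return (
        f"A person {action} a {obj.name}, "
        f"which is a {obj.size} and {obj.weight} {obj.shape}, {details}."
    )


box = ObjectAttributes(name="large box", size="large", weight="heavy", shape="cuboid")
prompt = build_prompt("pushes", box, "straight by making contact with its side")
print(prompt)
```

Keeping the object attributes in a separate structure means the same size, weight, and shape keywords can be reused across every action performed with that object. The LLM augmentation step is not shown here.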

Dataset Analysis

Dataset analysis. (a) and (b) show the proportional distributions of motion types and motion paths within the interaction sequences, respectively. (c) shows how the number of textual descriptions changes before and after augmentation for different action prompts. (d) reports the average number of frames in interaction sequences, grouped by action and by object weight.
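A statistic like the one in panel (d) can be computed with a simple grouped average. The sketch below assumes each sequence is summarized as an `(action, weight, n_frames)` tuple. That record layout and the function name `mean_frames_by_group` are hypothetical, and the numbers are toy values, not figures from the dataset.

```python
from collections import defaultdict


def mean_frames_by_group(records):
    """Average frame count per (action, weight) group.

    `records` is an iterable of (action, weight, n_frames) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [frame sum, sequence count]
    for action, weight, n_frames in records:
        bucket = totals[(action, weight)]
        bucket[0] += n_frames
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Toy values for illustration only; these are not taken from PA-HOI.
toy_records = [
    ("push", "heavy", 300),
    ("push", "heavy", 340),
    ("push", "light", 200),
    ("carry", "heavy", 420),
]
averages = mean_frames_by_group(toy_records)
for (action, weight), mean_frames in sorted(averages.items()):
    print(f"{action:>6} / {weight:<6} {mean_frames:.1f} frames")
```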

Behind The Scenes

Rendering Results

Prompt: "A person begins going forward while holding a yoga ball, which is a large and medium-weight sphere, with both hands wrapped around its surface, then places it on the table."

Application

MDM

MDM*

StableMoFusion

StableMoFusion*

"A person picks up a barbell, which is a large and heavy cylinder, from the ground by grabbing it through two hands, palms hugging its barbell bar, then goes forward, and finally places it on the table."

"A person pushes a large box, which is a large and heavy cuboid, straight by making contact with its side."

"A person picks up a small bottle, which is a small and medium weight cylinder, from the ground by grabbing it using the left hand, palm wrapped around its side, then holds position, and finally places it on the table."

BibTeX

@misc{wang2025pahoiphysicsawarehumanobject,
      title={PA-HOI: A Physics-Aware Human and Object Interaction Dataset}, 
      author={Ruiyan Wang and Lin Zuo and Zonghao Lin and Qiang Wang and Zhengxue Cheng and Rong Xie and Jun Ling and Li Song},
      year={2025},
      eprint={2508.06205},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.06205}, 
}