Title: EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
Authors: Modi Shi, Shijia Peng, Jin Chen, Haoran Jiang, Yinghui Li, Di Huang, Ping Luo, Hongyang Li, Li Chen
arXiv: https://arxiv.org/abs/2602.10106
EgoHumanoid explores whether large-scale human egocentric demonstrations can bootstrap humanoid loco-manipulation beyond what expensive teleoperation data can provide. The challenge is severe embodiment mismatch: body morphology, camera viewpoint, and action feasibility all differ between human collectors and humanoid platforms.
The framework combines abundant robot-free egocentric demos with limited robot data in co-training, backed by an alignment pipeline spanning hardware setup, data protocol, view alignment, and action alignment. View alignment addresses perspective/height shift; action alignment maps human motion into a feasible humanoid control manifold.
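The view-alignment step can be illustrated with a minimal sketch. The paper does not publish this exact procedure; the crop-and-shift heuristic, the `crop_ratio` and `vertical_shift` parameters, and the nearest-neighbor resize below are all assumptions chosen to show one plausible way of approximating a humanoid's lower camera height from a human egocentric frame:

```python
import numpy as np

def align_view(frame: np.ndarray, crop_ratio: float = 0.8,
               vertical_shift: float = 0.1) -> np.ndarray:
    """Approximate a humanoid viewpoint from a human egocentric frame.

    Hypothetical sketch: crop toward the lower part of the image to
    mimic a lower camera height, then resize back to the original
    resolution (nearest-neighbor to avoid external dependencies).
    """
    h, w = frame.shape[:2]
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    # Bias the crop window downward by vertical_shift, staying in bounds.
    top = min(h - ch, int(h * vertical_shift) + (h - ch) // 2)
    left = (w - cw) // 2
    crop = frame[top:top + ch, left:left + cw]
    # Nearest-neighbor upscale back to (h, w).
    ys = (np.arange(h) * ch // h).clip(0, ch - 1)
    xs = (np.arange(w) * cw // w).clip(0, cw - 1)
    return crop[ys][:, xs]
```

In practice a real pipeline would use calibrated camera extrinsics rather than a fixed crop, but the shape of the operation (reproject human-height frames toward the robot's viewpoint before co-training) is the same.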
Action alignment can be summarized as a mapping a_robot = φ(a_human), where φ denotes the embodiment mapping constrained by humanoid kinematics. The contribution is not only this mapping function but the end-to-end data-collection-and-alignment recipe that makes it practical at scale.
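A toy version of the embodiment mapping makes the kinematic-feasibility constraint concrete. The joint limits, the 5-DoF arm, and the scale-then-clip rule below are illustrative assumptions, not the paper's actual retargeting:

```python
import numpy as np

# Hypothetical joint limits (radians) for an illustrative 5-DoF arm.
JOINT_LOW = np.array([-2.0, -1.5, -2.5, -1.8, -3.0])
JOINT_HIGH = np.array([2.0, 1.5, 2.5, 1.8, 3.0])

def embodiment_map(human_joints: np.ndarray,
                   scale: float = 0.9) -> np.ndarray:
    """Stand-in for the action-alignment mapping: shrink retargeted
    human joint angles toward the neutral pose, then clip to the
    humanoid's joint limits so every output action is feasible."""
    scaled = human_joints * scale
    return np.clip(scaled, JOINT_LOW, JOINT_HIGH)
```

The point of the sketch is the invariant, not the numbers: whatever the human demonstrator did, the output always lies inside the humanoid's feasible control manifold.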
Real-world results report large gains over robot-only training, especially in unseen environments. For your broader agenda, this reinforces a promising direction: prioritize scalable human-data pipelines plus principled embodiment alignment, instead of assuming robot-native data must dominate.