Scalable, physics-aware 6D pose estimation for robot manipulation

Chaitanya Mitash
Robot Manipulation often depend on some form of pose estimation to represent the state of the world and allow decision making both at the task-level and for motion or grasp planning. Recent progress in deep learning gives hope for a pose estimation solution that could generalize over textured and texture-less objects, objects with or without distinctive shape properties, and under different lighting conditions and clutter scenarios. Nevertheless, it gives rise to a new set of challenges such as
more » ... the painful task of acquiring large-scale labeled training datasets and of dealing with their stochastic output over unforeseen scenarios that are not captured by the training. This restricts the scalability of such pose estimation solutions in robot manipulation tasks that often deal with a variety of objects and changing environments.The thesis first describes an automatic data generation and learning framework to address the scalability challenge. Learning is bootstrapped by generating labeled data via physics simulation and rendering. Then it self-improves over time by acquiring and labeling real-world images via a search-based pose estimation process. The thesis proposes algorithms to generate and validate object poses online based on the objects' geometry and based on the physical consistency of their scene-level interactions. These algorithms provide robustness even when there exists a domain gap between the synthetic training and the real test scenarios. Finally, the thesis proposes a manipulation planning framework that goes beyond model-based pose estimation. By utilizing a dynamic object representation, this integrated perception and manipulation framework can efficiently solve the task of picking unknown objects and placing them in a constrained space.The algorithms are evaluated over real-world robot manipulation experiments and over large-scale public datasets. The results indicate the usefulness of physical constraints in both the training and the online estimation phase. Moreover, the proposed framework, wh [...]
doi:10.7282/t3-s5fk-ht60 fatcat:lndswo56sfhdvkmsmuigsvr4hi