Towards Generalist Robots through Visual World Modeling
Boyuan Chen

Moving from narrow robots that specialize in specific tasks to generalist robots that excel at many tasks under varied environmental conditions is the future of next-generation robotics. The key to generalist robots is the ability to learn world models that are reusable, generalizable, and adaptable. A general understanding of how the physical world works will enable robots to acquire transferable knowledge across tasks, predict the possible outcomes of future actions before execution, and continually update their knowledge through ongoing interaction.

While most robot learning frameworks mix task-related and task-agnostic components together throughout the learning process, these two components need not change together: a task-agnostic component, such as the computational model of the robot body, remains the same across different task settings, while a task-related component, such as the dynamics of a moving object, remains the same across different embodiments. This thesis studies key steps towards building generalist robots by decomposing the world modeling problem into task-agnostic and task-related elements: (1) robot self-modeling; (2) robots modeling other agents; and (3) robots modeling the physical environment.

This framework has produced powerful and efficient learning-based robotic systems for a variety of tasks and physical embodiments: computational models of physical robots that can be reused and adapted to numerous task objectives and changing environments, behavior modeling frameworks for complex multi-robot applications, and dynamical-system understanding algorithms that distill compact physics knowledge from high-dimensional, multi-modal sensory data. The approach in this thesis could help catalyze the understanding, prediction, and control of increasingly complex systems.