Skills transfer across dissimilar robots by learning context-dependent rewards

Milad S. Malekzadeh, Danilo Bruno, Sylvain Calinon, Thrishantha Nanayakkara, Darwin G. Caldwell
2013 IEEE/RSJ International Conference on Intelligent Robots and Systems
Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicking of the demonstrator's actions to the higher-level extraction of the underlying intent. By focusing on the latter, we study the problem of extracting, from a set of candidate reward functions, the reward function that explains the demonstrations, and of using this information for self-refinement of the skill. This formulation is linked to inverse reinforcement learning, in which the robot autonomously extracts an optimal reward function that defines the goal of the task. By relying on Gaussian mixture models, the proposed approach learns how the different candidate reward functions are combined, and in which contexts or phases of the task they are relevant for explaining the user's demonstrations. The extracted reward profile is then exploited to improve the skill through a self-refinement approach based on expectation-maximization, allowing the imitator to reach a skill level beyond that of the demonstrations. The approach can be used to reproduce a skill in different ways or to transfer tasks across robots of different structures. It is tested in simulation with a new type of continuum robot (STIFF-FLOP), using kinesthetic demonstrations recorded with a Barrett WAM manipulator.
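The core idea of combining candidate reward functions with context-dependent weights can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate rewards, the one-dimensional context variable, and all GMM parameters below are assumed purely for illustration. Each mixture component is associated with one candidate reward, and the component's responsibility for the current context serves as that candidate's weight.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def responsibilities(x, priors, mus, sigmas):
    """GMM responsibilities gamma_k(x) ∝ pi_k * N(x | mu_k, sigma_k^2)."""
    w = [p * gaussian_pdf(x, m, s) for p, m, s in zip(priors, mus, sigmas)]
    total = sum(w)
    return [wi / total for wi in w]

def combined_reward(x, candidates, priors, mus, sigmas):
    """Context-dependent reward: each candidate reward function is weighted
    by how responsible its mixture component is for the current context x."""
    gammas = responsibilities(x, priors, mus, sigmas)
    return sum(g * r(x) for g, r in zip(gammas, candidates))

# Two illustrative (hypothetical) candidate rewards: one penalizing distance
# to a target at x = 1, one penalizing effort near the origin.
candidates = [lambda x: -(x - 1.0) ** 2, lambda x: -0.1 * x ** 2]
priors, mus, sigmas = [0.5, 0.5], [0.0, 1.0], [0.3, 0.3]

# Near context x = 0 the first component dominates, so the combined reward
# is driven almost entirely by the first candidate.
print(combined_reward(0.0, candidates, priors, mus, sigmas))
```

In the paper the weights are learned from demonstrations (and the refinement uses expectation-maximization); this sketch only shows how fixed GMM responsibilities would gate the candidate rewards by context.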
doi:10.1109/iros.2013.6696585 dblp:conf/iros/MalekzadehBCNC13