Deterministic Policy Gradient Based Robotic Path Planning with Continuous Action Spaces

Somdyuti Paul, Lovekesh Vig
2017 IEEE International Conference on Computer Vision Workshops (ICCVW)
One of the most important tasks for autonomous robotics is the ability to manipulate objects in real-world unstructured environments. Traditional path planning for robotic manipulators requires the precise location of the target object in the environment, from which inverse kinematics returns the required joint angles for approaching the object. This limits its use in real domains, where the dynamic relative positions of objects are not readily available. Recent work on deep reinforcement learning ... or manipulation appears to be more successful in adapting to different target object positions. In this paper, we present a deterministic policy based actor-critic learning framework to encode the path planning strategy irrespective of the robot pose and target object position. This reinforcement learning (RL) agent uses only two different views of the environment to learn path planning in order to reach a given target from a random pose, instead of relying on a depth sensor. The state space for the RL agent is thus defined as the stereo view of the environment, whereas the action values are torques applied to the robot's joints. The reward function is defined on the relative distance between the end-effector and the target object in pixels. In the episodic learning framework, the actor-critic network learns the optimal actions in the continuous space of real numbers for a given state configuration by trying to increase the expected reward. We validate this approach in a simulated environment, achieving a 100% success rate from 100 different robot poses, with relatively few steps required on average to reach the target. We further show that our learning strategy outperforms deep Q-learning based methods that have been used for similar path planning purposes. This path planning approach does not require conventional feature matching and triangulation for object localization, which are error prone and inaccurate, and solves inverse kinematics and depth estimation using only the scene information.
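As a rough illustration of a reward defined on the pixel distance between the end-effector and the target in two camera views, the sketch below shows one possible form. The abstract does not give the exact formula, so the negative mean distance, the function names, and the `reach_threshold_px` success threshold are all assumptions made for this example, not the authors' definition.

```python
import math


def pixel_distance(effector_px, target_px):
    """Euclidean distance in pixels between two (x, y) image points."""
    return math.hypot(effector_px[0] - target_px[0],
                      effector_px[1] - target_px[1])


def stereo_reward(effector_views, target_views, reach_threshold_px=5.0):
    """Assumed reward: negative mean pixel distance over the camera views.

    effector_views / target_views hold one (x, y) point per view.
    reach_threshold_px is a hypothetical success threshold, not a value
    taken from the paper.
    Returns (reward, done).
    """
    distances = [pixel_distance(e, t)
                 for e, t in zip(effector_views, target_views)]
    mean_distance = sum(distances) / len(distances)
    done = mean_distance <= reach_threshold_px
    return -mean_distance, done


# Example: the end-effector is at the image origin in both views.
reward, done = stereo_reward([(0, 0), (0, 0)], [(3, 4), (6, 8)])
```

With a reward of this shape, an agent that increases the expected reward is pushed to shrink the image-space gap in both views at once, which is one way a stereo pair could stand in for an explicit depth estimate.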
doi:10.1109/iccvw.2017.91 dblp:conf/iccvw/PaulV17 fatcat:rmgorj3ka5annmf4o3gsrfo434