A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="http://i-us.ru/index.php/ius/article/download/13564/14100">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="State University of Aerospace Instrumentation (SUAI)">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/zhxd67kb4jgfvm3yshkqwnqahq" style="color: black;">Information and Control Systems</a>
Due to its advantages, such as high flexibility and the ability to move heavy pieces with high torques and forces, the robotic arm, also called a manipulator robot, is the most widely used industrial robot. Purpose: We improve the control quality of a manipulator robot with seven degrees of freedom in the V-REP simulation environment using a reinforcement learning method based on deep neural networks. Methods: We estimate the policy for the action signal by building a numerical algorithm using deep neural networks. The actor network sends the action signal to the robotic manipulator, and the critic network performs numerical function approximation to calculate the value function (Q-value). Results: We created a model of the robot and its environment using the reinforcement learning library in MATLAB and connected the output signals (the action signals) to a simulated robot in V-REP. We trained the robot to reach an object in its workspace by interacting with the environment and computing the reward of each interaction. The observations were modeled using three vision sensors. Based on the proposed deep learning method, a model of an agent representing the robotic manipulator was built with a four-layer neural network for the actor and a four-layer neural network for the critic. The agent representing the robotic manipulator was trained for several hours until the robot began to reach the object in its workspace acceptably. The main advantage over supervised learning control is that the robot can act and train at the same time, giving it the ability to reach an object in its workspace in a continuous action space. Practical relevance: The results obtained are used to control the movement of the manipulator without the need to construct kinematic models, which reduces the mathematical complexity of the calculation and provides a universal solution.<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.31799/1684-8853-2020-5-24-32">doi:10.31799/1684-8853-2020-5-24-32</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/bzeiicvkjrd2bcp3iwzuyoyxm4">fatcat:bzeiicvkjrd2bcp3iwzuyoyxm4</a> </span>
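The actor-critic pairing the abstract describes (a four-layer actor network that emits the action signal, and a four-layer critic network that approximates the Q-value for a state-action pair) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the observation dimension, hidden-layer widths, and random initialization are assumptions, and the paper's actual agent was built with MATLAB's reinforcement learning library and trained against V-REP.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    # Randomly initialized weight matrices for a small fully connected network.
    return [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # tanh activations on the hidden layers, linear output layer.
    for w in params[:-1]:
        x = np.tanh(x @ w)
    return x @ params[-1]

obs_dim = 10   # assumed observation size (the paper uses three vision sensors)
act_dim = 7    # one command per degree of freedom of the manipulator

# Four-layer actor: observation -> bounded action signal for the joints.
actor = mlp([obs_dim, 64, 64, act_dim])
# Four-layer critic: (observation, action) -> scalar Q-value estimate.
critic = mlp([obs_dim + act_dim, 64, 64, 1])

obs = rng.normal(size=obs_dim)
action = np.tanh(forward(actor, obs))                   # action in [-1, 1]
q_value = forward(critic, np.concatenate([obs, action]))

print(action.shape, q_value.shape)
```

In training, the critic's Q-value estimate would drive gradient updates to both networks from the rewards collected while the simulated robot interacts with its environment, which is what lets the agent act and learn at the same time.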
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201026101953/http://i-us.ru/index.php/ius/article/download/13564/14100" title="fulltext PDF download">Web Archive [PDF]</a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.31799/1684-8853-2020-5-24-32">Publisher / doi.org</a>