Exploration Exploitation Problem in Policy Based Deep Reinforcement Learning for Episodic and Continuous Environments release_hzrumgw5ynegndhdg5ckmlzbbi

by Vedang Naik (Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India); Rohit Sahoo (Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India); Sameer Mahajan (College of Engineering, Penn State University, Paoli, PA, USA); Saurabh Singh (Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India); Shaveta Malik (Associate Professor, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India)

Published in International Journal of Engineering and Advanced Technology by Blue Eyes Intelligence Engineering and Sciences Publication - BEIESP.

2021, Volume 11, pp. 29-34

Abstract

Reinforcement learning (RL) is an artificial intelligence paradigm in which an intelligent agent learns from rewards supplied by its environment in order to achieve better results. It addresses sequential decision-making problems that offer limited feedback. Reinforcement learning has roots in cybernetics and in research in statistics, psychology, neurology, and computer science. It has attracted growing interest from the machine learning and artificial intelligence communities over the last five to ten years. Its promise is that agents can be trained using rewards and penalties without specifying how the task is to be completed. The RL problem can be described as an agent that must make decisions in a given environment to maximize a specified notion of cumulative reward. The learner is not told which actions to perform but must experiment to determine which actions yield the greatest reward. Thus, the learner must actively choose between exploring its environment and exploiting what it already knows. This exploration-exploitation dilemma is one of the most common issues encountered when working with reinforcement learning algorithms. Deep reinforcement learning is the combination of reinforcement learning and deep learning. In this study, we describe how to use several deep RL algorithms to control a Cartpole system, which represents episodic environments, and to perform stock market trading, which represents continuous environments. We explain and demonstrate the effects of different RL ideas, such as Deep Q Networks (DQN), Double DQN, and Dueling DQN, on learning performance. We also examine the fundamental distinctions between episodic and continuous tasks and how the exploration-exploitation problem is addressed in each.

Archived Files and Locations

application/pdf   564.8 kB
file_btjlwlcaqrdzthstweoduzyxxi
www.ijeat.org (publisher)
web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2021-12-30
Journal Metadata
Open Access Publication
Not in DOAJ
In ISSN ROAD
Not in Keepers Registry
ISSN-L:  2249-8958
Catalog Record
Revision: 7f283b57-8a11-4669-b49e-ac1d5a581443