Exploration-Exploitation Problem in Policy-Based Deep Reinforcement Learning for Episodic and Continuous Environments
by
Vedang Naik, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India
Rohit Sahoo, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India
Sameer Mahajan, College of Engineering, Penn State University, Paoli, PA, USA
Saurabh Singh, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India
Shaveta Malik, Associate Professor, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India
2021, Volume 11, pp. 29-34
Abstract
Reinforcement learning is an artificial intelligence paradigm that enables intelligent agents to accumulate rewards from their environment in order to achieve better results. It is concerned with sequential decision-making problems that offer only limited feedback. Reinforcement learning has roots in cybernetics and in research in statistics, psychology, neuroscience, and computer science, and it has attracted strong interest from the machine learning and artificial intelligence communities over the last five to ten years. Its promise is that agents can be trained using rewards and penalties alone, without specifying how the task is to be completed. The RL problem can be framed as an agent that must make decisions in a given environment so as to maximize a defined notion of cumulative reward. The learner is not told which actions to take but must experiment to determine which actions yield the greatest reward. The learner therefore has to actively choose between exploring its environment and exploiting its current knowledge. This exploration-exploitation dilemma is one of the most common issues encountered when working with reinforcement learning algorithms. Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. In this study, we describe how several deep RL algorithms can be used to manage a Cartpole system, which represents episodic environments, and stock market trading, which represents continuous environments. We explain and demonstrate the effects of different RL techniques, such as Deep Q Networks (DQN), Double DQN, and Dueling DQN, on learning performance. We also examine the fundamental distinctions between episodic and continuous tasks and how the exploration-exploitation problem is addressed in each context.
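The abstract centers on two technical ideas: the epsilon-greedy policy that trades off exploration against exploitation, and the target computations that distinguish DQN from Double DQN. The following minimal sketch, not taken from the paper itself, illustrates both; the function names, the two-action CartPole stand-in, and the annealing schedule (1.0 decayed by 0.995 per step toward 0.01) are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code) of epsilon-greedy
# exploration and of the DQN vs. Double DQN bootstrap targets.
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float, rng: np.random.Generator) -> int:
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniformly random action
    return int(np.argmax(q_values))              # exploit: best currently known action

def dqn_target(reward: float, done: bool, q_next_target: np.ndarray, gamma: float = 0.99) -> float:
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    return reward + (1.0 - float(done)) * gamma * float(np.max(q_next_target))

def double_dqn_target(reward: float, done: bool, q_next_online: np.ndarray,
                      q_next_target: np.ndarray, gamma: float = 0.99) -> float:
    """Double DQN: the online network selects the next action and the target
    network evaluates it, reducing vanilla DQN's overestimation bias."""
    a_star = int(np.argmax(q_next_online))
    return reward + (1.0 - float(done)) * gamma * float(q_next_target[a_star])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    epsilon, eps_min, eps_decay = 1.0, 0.01, 0.995  # assumed annealing schedule
    for step in range(3):
        q = rng.normal(size=2)  # stand-in for network output on a CartPole state (2 actions)
        action = epsilon_greedy(q, epsilon, rng)
        epsilon = max(eps_min, epsilon * eps_decay)  # shift from exploration toward exploitation
        print(f"step={step} epsilon={epsilon:.3f} action={action}")
```

Annealing epsilon in this way is one common resolution of the dilemma the abstract describes: early training favors exploration, while later training increasingly exploits the learned Q-values.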
Archived Files and Locations
application/pdf, 564.8 kB: www.ijeat.org (publisher); web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2021-12-30
Open Access Publication
Not in DOAJ
In ISSN ROAD
Not in Keepers Registry
ISSN-L: 2249-8958
Access all versions, variants, and formats of this work (e.g., pre-prints):
Crossref Metadata (via API)
WorldCat
SHERPA/RoMEO (journal policies)
wikidata.org
CORE.ac.uk
Semantic Scholar
Google Scholar