
Reinforcement Learning of POMDPs using Spectral Methods [article]

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar
2017 arXiv pre-print
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods.  ...  While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with  ...  The resulting algorithm, called SM-UCRL (Spectral Method for Upper-Confidence Reinforcement Learning), runs through epochs of variable length, where the agent follows a fixed policy until enough data are  ... 
arXiv:1705.02553v1 fatcat:xfqurbxubjaprouc267yyjhdki

Deep Q-network using reservoir computing with multi-layered readout [article]

Toshitaka Matsuki
2022 arXiv pre-print
The experimental results show that using multi-layered readout improves the learning performance of four classical control tasks that require time-series processing.  ...  Recurrent neural network (RNN) based reinforcement learning (RL) is used for learning context-dependent tasks and has also attracted attention as a method with remarkable learning performance in recent  ...  Katsunari Shibata for useful discussions about this research. This work was supported by JSPS KAKENHI (Grant-in-Aid for Encouragement of Scientists) Number 21H04323.  ... 
arXiv:2203.01465v1 fatcat:hl5s5u2ihbczro7n3agbbu3aay
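The idea of pairing a fixed reservoir with a trained multi-layered readout can be illustrated with a minimal sketch. This is not the paper's actual architecture; all dimensions, scalings, and weights below are illustrative assumptions. A random recurrent reservoir (echo state network style) absorbs the observation sequence, and only a small MLP readout maps the reservoir state to Q-values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
n_in, n_res, n_hidden, n_actions = 4, 100, 32, 2

# Fixed random reservoir: the recurrent weights are never trained.
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))  # spectral radius below 1

# Multi-layered readout (a small MLP) in place of the usual linear readout.
W1 = rng.normal(0, 0.1, (n_hidden, n_res))
W2 = rng.normal(0, 0.1, (n_actions, n_hidden))

def reservoir_step(x, u):
    """Advance the fixed reservoir state given input u."""
    return np.tanh(W_in @ u + W_res @ x)

def q_values(x):
    """Readout maps the reservoir state to one Q-value per action."""
    return W2 @ np.tanh(W1 @ x)

x = np.zeros(n_res)
for u in rng.normal(size=(10, n_in)):   # feed a short observation sequence
    x = reservoir_step(x, u)
q = q_values(x)
print(q.shape)
```

In a DQN-style setup only `W1` and `W2` would be updated from temporal-difference errors; the reservoir supplies the time-series memory for free.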

Learning Causal State Representations of Partially Observable Environments [article]

Amy Zhang, Zachary C. Lipton, Luis Pineda, Kamyar Azizzadenesheli, Anima Anandkumar, Laurent Itti, Joelle Pineau, Tommaso Furlanello
2021 arXiv pre-print
We demonstrate that these learned state representations are useful for learning policies efficiently in reinforcement learning problems with rich observation spaces.  ...  (POMDP).  ...  Itti were supported by the National Science Foundation (grant number CCF-1317433), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation  ... 
arXiv:1906.10437v2 fatcat:g7pz2uk4s5hjbi76tgaqxy3eki

Learning to Coordinate via Multiple Graph Neural Networks [article]

Zhiwei Xu, Bin Zhang, Yunpeng Bai, Dapeng Li, Guoliang Fan
2021 arXiv pre-print
This paper introduces MGAN for collaborative multi-agent reinforcement learning, a new algorithm that combines graph convolutional networks and value-decomposition methods.  ...  MGAN learns the representation of agents from different perspectives through multiple graph networks, and realizes the proper allocation of attention between all agents.  ...  The change of the environment is no longer determined by a single agent but is the result of the joint actions of all agents in MAS, which results in the traditional single-agent reinforcement learning  ... 
arXiv:2104.03503v1 fatcat:zxxgm3je5vf33b36xelnawxgxu

Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning

Finale Doshi-Velez, David Pfau, Frank Wood, Nicholas Roy
2015 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning  ...  We show that the Bayesian aspects of the methods result in achieving state-of-the-art performance in decision making with relatively few samples, while the nonparametric aspects often result in fewer computations  ...  ACKNOWLEDGMENTS The authors thank David Hsu for insightful discussions on action-selection in Bayesian reinforcement learning.  ... 
doi:10.1109/tpami.2013.191 pmid:26353250 fatcat:yjmmzunfo5etfgd24n4iz4fdqm

Bootstrap Learning and Visual Processing Management on Mobile Robots

Mohan Sridharan
2010 Advances in Artificial Intelligence  
The learned models are used to detect and adapt to illumination changes.  ...  This paper summarizes a comprehensive effort towards such bootstrap learning, adaptation, and processing management using visual input.  ...  Acknowledgments The author thanks collaborators from the University of Texas at Austin (Peter Stone) and University of Birmingham (UK) (Jeremy Wyatt, Richard Dearden, and Aaron Sloman).  ... 
doi:10.1155/2010/765876 fatcat:pmrxqj4zrbgk3pohhokgbrrtx4

Online Service Migration in Edge Computing with Incomplete Information: A Deep Recurrent Actor-Critic Method [article]

Jin Wang, Jia Hu, Geyong Min, Qiang Ni, Tarek El-Ghazawi
2022 arXiv pre-print
The extensive experimental results based on real-world mobility traces demonstrate that this new method consistently outperforms both the heuristic and state-of-the-art learning-driven algorithms and can  ...  To address these challenges, we propose a novel learning-driven method, which is user-centric and can make effective online migration decisions by utilizing incomplete system-level information.  ...  Backgrounds of RL and POMDP Reinforcement learning: RL can solve sequential decisionmaking problems by learning from interaction with the environment.  ... 
arXiv:2012.08679v4 fatcat:qtngy5kzwzbbdbfdpchhvog7qy

Optimal Feature Search for Vigilance Estimation Using Deep Reinforcement Learning

Woojoon Seok, Minsoo Yeo, Jiwoo You, Heejun Lee, Taeheum Cho, Bosun Hwang, Cheolsoo Park
2020 Electronics  
The classification was performed with a small number of features, and the results were similar to those from using all of the features.  ...  In this study, a deep Q-network (DQN) algorithm was designed, using conventional feature engineering and deep convolutional neural network (CNN) methods, to extract the optimal features.  ...  Section 3 elaborates upon the classification results of the reinforcement learning and conventional algorithms. Section 4 discusses the experimental results and concludes the paper.  ... 
doi:10.3390/electronics9010142 fatcat:ld55ltytyfa4vjyij6ae2ndldi

Goal-Directed Online Learning of Predictive Models [chapter]

Sylvie C. W. Ong, Yuri Grinberg, Joelle Pineau
2012 Lecture Notes in Computer Science  
Our algorithm interleaves online learning of the models, with estimation of the value function.  ...  The framework is applicable to a variety of important learning problems, including scenarios such as apprenticeship learning, model customization, and decision-making in non-stationary domains.  ...  Funding was provided by the National Institutes of Health (grant R21 DA019800) and the NSERC Discovery Grant program.  ... 
doi:10.1007/978-3-642-29946-9_6 fatcat:bxmawkplczasbbdsmzbofx2xoi

Predictive State Temporal Difference Learning [article]

Byron Boots, Geoffrey J. Gordon
2011 arXiv pre-print
In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation.  ...  Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features.  ...  Experimental Results We designed several experiments to evaluate the properties of the PSTD learning algorithm.  ... 
arXiv:1011.0041v2 fatcat:ij2terp5nraknalv5s3e3cmks4
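The general idea of running TD learning on a compressed version of a large feature set can be sketched as follows. This is a rough illustration, not the PSTD algorithm itself: the data, dimensions, projection, and learning rate are all invented for the sketch. Raw features are compressed with an SVD, then linear TD(0) is run on the compressed features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: many marginally useful features, compressed before TD.
n_raw, n_comp, gamma, alpha = 50, 5, 0.9, 0.05

# Raw feature vectors (rows), e.g. observed along an exploratory trajectory.
Phi = rng.normal(size=(200, n_raw))

# Compress: keep the top singular directions of the feature matrix.
U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
P = Vt[:n_comp].T                      # projection: raw -> compressed features

w = np.zeros(n_comp)                   # linear value weights on compressed features
rewards = rng.normal(size=199)         # placeholder rewards between steps

# TD(0) over consecutive feature pairs.
for t in range(199):
    phi, phi_next = Phi[t] @ P, Phi[t + 1] @ P
    delta = rewards[t] + gamma * (w @ phi_next) - (w @ phi)
    w += alpha * delta * phi

print(w.shape)
```

The point of the compression step is that the value function is fit in a low-dimensional subspace, so each feature need only be marginally useful in aggregate rather than individually.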

Towards a Decentralized, Autonomous Multiagent Framework for Mitigating Crop Loss [article]

Roi Ceren, Shannon Quinn, Glen Raines
2019 arXiv pre-print
Our goal is to limit the use of the more computationally and temporally expensive subsequent layers.  ...  We introduce a reinforcement learning technique based on Perkins' Monte Carlo Exploring Starts for a generalized Markovian model for each layer's decision problem, and label the system the Agricultural  ...  Since the eventual categorization is used to inform the decisions of higher level layers, we adopt the perspective of reinforcement learning.  ... 
arXiv:1901.02035v1 fatcat:hxk6nobj3nczvaoa2hum3lmzvu

Deep Q-Network with Predictive State Models in Partially Observable Domains

Danning Yu, Kun Ni, Yunlong Liu
2020 Mathematical Problems in Engineering  
While deep reinforcement learning (DRL) has achieved great success in some large domains, most of the related algorithms assume that the state of the underlying system is fully observable.  ...  We use a recurrent network to establish the recurrent PSR model, which can fully learn dynamics of the partially continuous observable environment.  ...  Acknowledgments This work was supported by the National Natural Science Foundation of China (nos. 61772438 and 61375077). This work was also supported by the China Scholarship Council (201906315049).  ... 
doi:10.1155/2020/1596385 fatcat:bkjtb72omzaxrlcsl6ktzwtvza

Color learning and illumination invariance on mobile robots: A survey

Mohan Sridharan, Peter Stone
2009 Robotics and Autonomous Systems  
A major challenge to the widespread deployment of mobile robots is the ability to function autonomously, learning useful models of environmental features, recognizing environmental changes, and adapting  ...  This article focuses on such learning and adaptation in the context of color segmentation on mobile robots in the presence of illumination changes.  ...  Acknowledgments The authors thank the members of the UTAustinVilla team who contributed part of the code used to test our algorithms.  ... 
doi:10.1016/j.robot.2009.01.004 fatcat:v67nng6jovhzrafnrqmcn5fqou

Learning Predictive State Representations From Non-Uniform Sampling

Yuri Grinberg, Hossein Aboutalebi, Melanie Lyman-Abramovitch, Borja Balle, Doina Precup
2018 Proceedings of the AAAI Conference on Artificial Intelligence
Then, we address the core shortcoming of existing PSR spectral learning methods for conditional models by incorporating an additional step in the process, which can be seen as a type of matrix denoising  ...  This can have negative consequences on the PSR parameter estimation process, which are not taken into account by the current state-of-the-art PSR spectral learning algorithms.  ...  Note that this approach is of independent interest to spectral learning methods in general. In the experimental section we will highlight the benefits of those penalties.  ... 
doi:10.1609/aaai.v32i1.11744 fatcat:vrcpb4ekarhhjmoiy7jqwfcz5q
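Matrix denoising via truncated SVD is a common building block in spectral learning pipelines and can be sketched in a few lines. This is a generic illustration of the technique, not the paper's specific denoising step; the low-rank matrix, noise level, and rank are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical low-rank "true" matrix plus sampling noise.
true_rank = 3
A = rng.normal(size=(40, true_rank)) @ rng.normal(size=(true_rank, 40))
A_hat = A + 0.1 * rng.normal(size=A.shape)   # noisy empirical estimate

def svd_denoise(M, rank):
    """Keep only the top `rank` singular directions of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

A_den = svd_denoise(A_hat, true_rank)
print(np.linalg.matrix_rank(A_den))
```

Truncation discards the singular directions dominated by estimation noise, which is why non-uniform or sparse sampling (where some entries are far noisier than others) motivates the more careful denoising steps discussed in work like the entry above.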

Dynamic Multichannel Sensing in Cognitive Radio: Hierarchical Reinforcement Learning

Shuai Liu, Jiayun Wu, Jing He
2021 IEEE Access  
Efficient use of spectral resources is critical in wireless networks and has been extensively studied in recent years.  ...  The proposed approach divides the original problem into separate "sub problems", each of which is solved using its own reinforcement learning agent.  ...  Using the typical model of reinforcement learning, POMDP effectively utilizes the fact that cognitive users cannot fully perceive all channels when performing.  ... 
doi:10.1109/access.2021.3056670 fatcat:xnxm7q4ni5ac3l6cpt4eljbkcy
Showing results 1 — 15 out of 236 results