53 Hits in 5.9 sec

Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [article]

Zhe Xu, Bo Wu, Aditya Ojha, Daniel Neider, Ufuk Topcu
2021 arXiv   pre-print
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.  ...  We prove that in episodic RL, a finite reward automaton can express any non-Markovian bounded reward function with finitely many reward values and can approximate any non-Markovian bounded reward function.  ...
arXiv:2006.15714v4 fatcat:6hmukogdlnbernfi36bvl4kwbm
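
The query-and-counterexample loop this abstract describes follows the classical active-learning pattern. Below is a minimal sketch of that loop; the `env.replay`, `env.sample_episode`, and `learner` interfaces are illustrative assumptions, not the authors' actual API, and the equivalence query is approximated by sampling, since exact equivalence queries are unavailable in RL.

```python
# Sketch of an L*-style query loop for inferring a reward automaton.
# All interfaces (env.replay, env.sample_episode, learner.*) are
# hypothetical stand-ins for illustration.

def membership_query(env, trace):
    """Answer a membership query by replaying a label sequence in the
    episodic environment and returning the observed rewards."""
    return env.replay(trace)

def equivalence_query(env, hypothesis, num_samples=1000):
    """Approximate an equivalence query: sample episodes and search for
    a trace on which the hypothesis automaton mispredicts the reward."""
    for _ in range(num_samples):
        trace = env.sample_episode()
        if hypothesis.predict(trace) != env.replay(trace):
            return trace  # counterexample refuting the hypothesis
    return None

def learn_reward_automaton(env, learner):
    """Alternate hypothesis construction and counterexample search."""
    while True:
        hypothesis = learner.build_hypothesis(lambda t: membership_query(env, t))
        counterexample = equivalence_query(env, hypothesis)
        if counterexample is None:
            return hypothesis
        learner.refine(counterexample)
```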

Active Task-Inference-Guided Deep Inverse Reinforcement Learning [article]

Farzan Memarian, Zhe Xu, Bo Wu, Min Wen, Ufuk Topcu
2020 arXiv   pre-print
We consider the problem of reward learning for temporally extended tasks. For reward learning, inverse reinforcement learning (IRL) is a widely used paradigm.  ...  At each iteration, the algorithm alternates between two modules: a task inference module that infers the underlying task structure, and a reward learning module that uses the inferred task structure to  ...  The task inference module utilizes L* learning [33], an active automaton inference algorithm, as the template to iteratively infer a DFA from queries and counterexamples.  ...
arXiv:2001.09227v3 fatcat:qkall4h7d5ej7mbig5itawlitu
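
As a reading aid, here is a minimal sketch of the two-module alternation described above; every callable (`infer_dfa`, `learn_reward`, `find_counterexample`) is a hypothetical placeholder, not the paper's code.

```python
# Sketch of the alternation: infer a DFA for the task structure, fit a
# reward on top of it, and feed failures back as counterexamples.

def active_task_inference_irl(demos, infer_dfa, learn_reward,
                              find_counterexample):
    counterexamples = []
    while True:
        dfa = infer_dfa(demos, counterexamples)        # task inference module
        reward = learn_reward(demos, dfa)              # reward learning module
        cex = find_counterexample(dfa, reward, demos)  # consistency check
        if cex is None:
            return dfa, reward
        counterexamples.append(cex)
```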

Inferring Probabilistic Reward Machines from Non-Markovian Reward Processes for Reinforcement Learning [article]

Taylor Dohmen, Noah Topper, George Atia, Andre Beckus, Ashutosh Trivedi, Alvaro Velasquez
2022 arXiv   pre-print
The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies.  ...  In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning.  ...
arXiv:2107.04633v2 fatcat:k4apdkrl4bcnvn34fj7v4vpef4
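
The state-space augmentation mentioned in the abstract is the standard reward-machine construction: a non-Markovian reward becomes Markovian over pairs of environment state and machine state. A minimal sketch follows, with `env.step` returning a high-level event label as an assumed interface.

```python
# Reward machine (RM) sketch: running the RM alongside the environment
# makes the reward a function of the augmented state (obs, rm_state).

class RewardMachine:
    def __init__(self, delta, reward, initial_state=0):
        self.delta = delta            # (rm_state, label) -> next rm_state
        self.reward = reward          # (rm_state, label) -> reward value
        self.initial_state = initial_state

def augmented_step(env, rm, rm_state, action):
    """One step of the product process; the RM state carries exactly
    the history the reward depends on."""
    obs, label, done = env.step(action)  # label: high-level event (assumed API)
    r = rm.reward[(rm_state, label)]
    next_rm_state = rm.delta[(rm_state, label)]
    return (obs, next_rm_state), r, done
```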

Reinforcement Learning with Non-Markovian Rewards [article]

Maor Gaon, Ronen I. Brafman
2019 arXiv   pre-print
Here, we address the problem of policy learning from experience with such rewards.  ...  We describe and empirically evaluate four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR.  ...
arXiv:1912.02552v1 fatcat:umncgaz3qvdv7dbvnzju54dghu

Reinforcement Learning with Non-Markovian Rewards

Maor Gaon, Ronen Brafman
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
Here, we address the problem of policy learning from experience with such rewards.  ...  We describe and empirically evaluate four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR.  ...
doi:10.1609/aaai.v34i04.5814 fatcat:labzyux7wzcnbje54dp6o7k6x4
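
One way to read "combinations of Q-learning with automata learning algorithms" (in both records of this paper) is the sketch below: tabular Q-learning run over the product of environment states and the states of a learned automaton that tracks reward-relevant history. The environment and automaton interfaces are assumptions for illustration, not the paper's code.

```python
# Q-learning over (env_state, automaton_state) pairs; the automaton
# supplies the memory that makes the non-Markovian reward learnable.

from collections import defaultdict
import random

def q_learning_with_automaton(env, automaton, episodes=5000,
                              alpha=0.1, gamma=0.99, eps=0.1):
    Q = defaultdict(float)  # (env_state, aut_state, action) -> value
    for _ in range(episodes):
        s, u, done = env.reset(), automaton.initial_state, False
        while not done:
            if random.random() < eps:                      # epsilon-greedy
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda b: Q[(s, u, b)])
            s2, label, r, done = env.step(a)               # assumed interface
            u2 = automaton.step(u, label)                  # advance memory
            best_next = 0.0 if done else max(Q[(s2, u2, b)] for b in env.actions)
            Q[(s, u, a)] += alpha * (r + gamma * best_next - Q[(s, u, a)])
            s, u = s2, u2
    return Q
```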

Online Learning of Non-Markovian Reward Models [article]

Gavin Rens, Jean-François Raskin, Raphaël Reynouad, Giuseppe Marra
2020 arXiv   pre-print
One natural and quite general way to represent history-dependent rewards is via a Mealy machine, a finite-state automaton that produces output sequences from input sequences.  ...  Our approach to overcome this challenge is to use Angluin's L* active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM).  ...  Angluin [4] showed that finite automata can be learned using so-called membership and equivalence queries.  ...
arXiv:2009.12600v2 fatcat:yoeimnqmkvan3akl3kxajbuqme
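
For concreteness, here is a self-contained Mealy machine of the kind the abstract describes, with rewards emitted on transitions; the toy alphabet and reward values are invented for illustration.

```python
# Mealy machine as a reward model: the reward emitted for a symbol
# depends on the automaton state, i.e. on the history so far.

class MealyRewardMachine:
    def __init__(self, transitions, outputs, initial_state=0):
        self.transitions = transitions  # (state, symbol) -> next state
        self.outputs = outputs          # (state, symbol) -> reward
        self.initial_state = initial_state

    def rewards_for(self, symbols):
        """Map an input sequence to its reward (output) sequence."""
        state, rewards = self.initial_state, []
        for sym in symbols:
            rewards.append(self.outputs[(state, sym)])
            state = self.transitions[(state, sym)]
        return rewards

# Toy example: reward 1 for each 'a' that immediately follows an 'a'.
m = MealyRewardMachine(
    transitions={(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0},
    outputs={(0, 'a'): 0, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0},
)
assert m.rewards_for(['a', 'a', 'b', 'a']) == [0, 1, 0, 0]
```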

Learning Non-Markovian Reward Models in MDPs [article]

Gavin Rens, Jean-François Raskin
2020 arXiv   pre-print
Our approach to overcome this challenging problem is a careful combination of Angluin's L* active learning algorithm to learn finite automata, testing techniques for establishing conformance of finite  ...  One natural and quite general way to represent history-dependent rewards is via a Mealy machine: a finite-state automaton that produces output sequences (rewards in our case) from input sequences (state  ...  Angluin [3] showed that finite automata can be learned using so-called membership and equivalence queries.  ...
arXiv:2001.09293v1 fatcat:ziawcsiu4jgyrjvf4c3ywf4pcm
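
The conformance-testing ingredient can be approximated by random testing, as in this sketch; `sample_episode` (returning an input sequence together with the rewards the MDP actually produced) is an assumed interface, and `rewards_for` matches the Mealy sketch above.

```python
# Random conformance testing of a hypothesis Mealy machine against
# observed behavior; a mismatch is returned as a counterexample for L*.

def conformance_test(hypothesis, sample_episode, num_tests=500):
    for _ in range(num_tests):
        symbols, observed_rewards = sample_episode()  # assumed interface
        if hypothesis.rewards_for(symbols) != observed_rewards:
            return symbols  # counterexample to refine the hypothesis
    return None  # consistent with everything sampled
```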

Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Steven Carr, Nils Jansen, Ufuk Topcu
2021 The Journal of Artificial Intelligence Research  
We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller  ...  Using such methods, if the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification.  ...
doi:10.1613/jair.1.12963 fatcat:usbrnbs6dvarrbnj2x4bmmmrwa
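
A finite-state controller of the kind the abstract refers to is a small machine whose memory node, together with the current observation, determines the action and the next node. A structural sketch (the extraction procedure itself is not shown; the tables are placeholders):

```python
# Finite-state controller (FSC): bounded memory distilled from an RNN
# policy. A real FSC would be extracted from the trained network.

class FiniteStateController:
    def __init__(self, action_map, memory_update, initial_node=0):
        self.action_map = action_map        # (node, observation) -> action
        self.memory_update = memory_update  # (node, observation) -> node
        self.node = initial_node

    def act(self, observation):
        action = self.action_map[(self.node, observation)]
        self.node = self.memory_update[(self.node, observation)]
        return action
```

Because the controller is finite, composing it with the POMDP yields a finite Markov chain, which is what makes the verification step described in the snippet possible.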

Induction and Exploitation of Subgoal Automata for Reinforcement Learning

Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda, Alessandra Russo
2021 The Journal of Artificial Intelligence Research  
In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks.  ...  We evaluate ISA in several gridworld and continuous state space problems using different RL algorithms that leverage the automaton structures.  ...
doi:10.1613/jair.1.12372 fatcat:yxtefk4vbjam7mtmtbs4bo2lcq
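
As a toy illustration of a subgoal automaton (the example task is invented, not taken from the paper): states record which subgoals have been achieved, and observing a subgoal symbol advances the automaton.

```python
# Hypothetical subgoal automaton for a "pick up the key, then reach
# the door" gridworld task; unmatched observations self-loop.

SUBGOAL_TRANSITIONS = {
    ('start',   'key'):  'has_key',
    ('has_key', 'door'): 'done',
}

def advance(state, symbol):
    return SUBGOAL_TRANSITIONS.get((state, symbol), state)

state = 'start'
for symbol in ['empty', 'key', 'empty', 'door']:
    state = advance(state, symbol)
assert state == 'done'
```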

Reinforcement Learning under Partial Observability Guided by Learned Environment Models [article]

Edi Muskardin, Martin Tappler, Bernhard K. Aichernig, Ingo Pill
2022 arXiv   pre-print
Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments.  ...  In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.  ...
arXiv:2206.11708v1 fatcat:hg5n4fqkuvbihozp76cevgoimm

Learning Graph Structure With A Finite-State Automaton Layer [article]

Daniel D. Johnson, Hugo Larochelle, Daniel Tarlow
2020 arXiv   pre-print
We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation  ...  Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton.  ...
arXiv:2007.04929v2 fatcat:7hdu7fxaujcx5dcvq5cgd6ybau
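
The path relation described above can be made concrete as follows: nodes u and v are related when some labeled path from u to v spells a word the automaton accepts. The sketch below shows only the hard, discrete version of that relation (the paper's contribution is learning a stochastic automaton policy end-to-end via implicit differentiation); the graph and DFA encodings are illustrative.

```python
# BFS over (graph_node, dfa_state) pairs: returns all nodes reachable
# from `start` along a path whose label sequence the DFA accepts.
# graph: node -> list of (edge_label, neighbor); dfa has .initial,
# .accepting (a set), and .delta ((state, label) -> state), all assumed.

from collections import deque

def automaton_reachable(graph, dfa, start):
    frontier = deque([(start, dfa.initial)])
    seen = {(start, dfa.initial)}
    accepted = set()
    while frontier:
        node, q = frontier.popleft()
        if q in dfa.accepting:
            accepted.add(node)  # some accepted word labels a path here
        for label, nxt in graph[node]:
            q2 = dfa.delta.get((q, label))
            if q2 is not None and (nxt, q2) not in seen:
                seen.add((nxt, q2))
                frontier.append((nxt, q2))
    return accepted
```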

Active Learning of Plans for Safety and Reachability Goals With Partial Observability

Wonhong Nam, R. Alur
2010 IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
The planning method first learns a safe plan using the L* algorithm, which is an efficient active learning algorithm for regular languages.  ...  Our technique is based on the active learning of regular languages and symbolic model checking.  ...  the minimal Deterministic Finite Automaton (DFA) that accepts L(P). 2) The number of queries is polynomial in the size of P and in the length of the longest counterexample that was obtained while constructing  ...
doi:10.1109/tsmcb.2009.2025657 pmid:19661004 fatcat:qfurf7st2jbotjhsqahptmmm64

Verification for Machine Learning, Autonomy, and Neural Networks Survey [article]

Weiming Xiang, Patrick Musau, Ayana A. Wild, Diego Manzanas Lopez, Nathaniel Hamilton, Xiaodong Yang, Joel Rosenfeld, Taylor T. Johnson
2018 arXiv   pre-print
Autonomy in CPS is enabled by recent advances in artificial intelligence (AI) and machine learning (ML) through approaches such as deep neural networks (DNNs), embedded in so-called learning-enabled components  ...  Recently, the formal methods and formal verification community has developed methods to characterize behaviors in these LECs with the eventual goal of formally verifying specifications for LECs, and this  ...  Classifier-learning techniques use both positive and negative examples for STL formula learning, and active-learning techniques experiment on the system to extract counterexamples.  ...
arXiv:1810.01989v1 fatcat:a5ax66lsxbho3fuxuh55ypnm6m

Learning Performance Graphs from Demonstrations via Task-Based Evaluations [article]

Aniruddh G. Puranic, Jyotirmoy V. Deshmukh, Stefanos Nikolaidis
2022 arXiv   pre-print
The main contribution of this paper is an algorithm that learns the performance graph directly from the user-provided demonstrations; we show that the reward functions generated using the learned performance  ...  Without this knowledge, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies.  ...  Such cumulative rewards can then be used by a reinforcement learning (RL) procedure to learn an optimal policy.  ...
arXiv:2204.05909v1 fatcat:sv5fu3u5ebaerl2celk5uo2dki

Average-Payoff Reinforcement Learning [chapter]

2017 Encyclopedia of Machine Learning and Data Mining  
Average-Reward Reinforcement Learning  ...  The general approach behind segmentation-based techniques is to segment the normal time series, treat each segment as a state in a finite-state automaton (FSA), and then use the FSA to determine if a  ...
doi:10.1007/978-1-4899-7687-1_100029 fatcat:jub4ulyg45abnf4qgutimczie4
Showing results 1–15 of 53