Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples
[article]
2021
arXiv
pre-print
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm. ...
We prove that in episodic RL, a finite reward automaton can express any non-Markovian bounded reward function with finitely many reward values and approximate any non-Markovian bounded reward function ...
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA. ...
arXiv:2006.15714v4
fatcat:6hmukogdlnbernfi36bvl4kwbm
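The central object in the entry above is a finite reward automaton: a machine whose transitions fire on high-level events and whose transitions emit the rewards. A minimal sketch of such a structure, under the assumption that it behaves like a Mealy-style transducer; the states, events, and rewards below are illustrative, not from the paper:

```python
# Minimal sketch of a finite reward automaton (FRA): transitions fire on
# high-level events and emit scalar rewards. All concrete values here are
# illustrative assumptions.

class FiniteRewardAutomaton:
    def __init__(self, states, initial, delta, rho):
        self.states = states    # finite set of automaton states
        self.initial = initial  # initial state
        self.delta = delta      # (state, event) -> next state
        self.rho = rho          # (state, event) -> reward
        self.current = initial

    def step(self, event):
        """Advance on one high-level event; return the emitted reward."""
        reward = self.rho[(self.current, event)]
        self.current = self.delta[(self.current, event)]
        return reward

# Example: reward 1 only for a 'b' that follows an 'a' -- a reward that no
# memoryless (Markovian) reward function over events can express.
fra = FiniteRewardAutomaton(
    states={0, 1},
    initial=0,
    delta={(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 1, (1, 'b'): 0},
    rho={(0, 'a'): 0.0, (0, 'b'): 0.0, (1, 'a'): 0.0, (1, 'b'): 1.0},
)
print([fra.step(e) for e in ['a', 'b', 'b']])  # [0.0, 1.0, 0.0]
```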
Active Task-Inference-Guided Deep Inverse Reinforcement Learning
[article]
2020
arXiv
pre-print
We consider the problem of reward learning for temporally extended tasks. For reward learning, inverse reinforcement learning (IRL) is a widely used paradigm. ...
At each iteration, the algorithm alternates between two modules, a task inference module that infers the underlying task structure and a reward learning module that uses the inferred task structure to ...
The task inference module utilizes L* learning [33] , an active automaton inference algorithm, as the template to iteratively infer a DFA from queries and counterexamples. ...
arXiv:2001.09227v3
fatcat:qkall4h7d5ej7mbig5itawlitu
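The L* template used by the task inference module above rests on two query types: membership queries ("is this trace in the target language?") and equivalence queries ("is this hypothesis DFA correct; if not, give a counterexample?"). A schematic of that interaction, with a brute-force bounded check standing in for a real equivalence oracle; the target language and all names are assumptions:

```python
from itertools import product

# Schematic of the L* query interface: the learner refines a hypothesis
# from counterexamples returned by an equivalence oracle. The "teacher"
# here is a brute-force stand-in over bounded-length words.

def member(word):
    """Membership query for an assumed target language: even number of 'a's."""
    return word.count('a') % 2 == 0

def equivalent(hypothesis, alphabet, max_len=6):
    """Equivalence query by bounded exhaustive testing; returns a
    counterexample word, or None if no disagreement is found."""
    for n in range(max_len + 1):
        for word in map(''.join, product(alphabet, repeat=n)):
            if hypothesis(word) != member(word):
                return word
    return None

# A deliberately wrong first hypothesis: accept every word.
counterexample = equivalent(lambda w: True, alphabet='ab')
print(counterexample)  # 'a' -- the shortest word on which the hypothesis errs
```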
Inferring Probabilistic Reward Machines from Non-Markovian Reward Processes for Reinforcement Learning
[article]
2022
arXiv
pre-print
The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies. ...
In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. ...
SA2020071003-V0237, and the Air Force Office of Scientific Research through Award 20RICOR012, and the National Science Foundation through CAREER Award CCF-1552497 and Award CCF-2106339. ...
arXiv:2107.04633v2
fatcat:k4apdkrl4bcnvn34fj7v4vpef4
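The state-space augmentation mentioned above works by running the inferred machine in lockstep with the environment: the pair (environment state, machine state) is Markovian even when the reward alone is not. A minimal sketch of that product construction; the environment, labeling function, and machine are illustrative assumptions:

```python
# Sketch of state-space augmentation: run an inferred reward machine in
# lockstep with the environment so (env_state, machine_state) is Markovian.
# The labeling function and transition table below are assumptions.

def label(env_state):
    """Map a low-level state to a high-level event (illustrative)."""
    return 'goal' if env_state == 3 else 'none'

delta = {  # machine transitions: (machine_state, event) -> next state
    (0, 'none'): 0, (0, 'goal'): 1,
    (1, 'none'): 1, (1, 'goal'): 1,
}

def augmented_step(env_state, machine_state, next_env_state):
    """Advance the machine on the event labeling the new env state."""
    return next_env_state, delta[(machine_state, label(next_env_state))]

state = (0, 0)
for next_env in [1, 2, 3, 2]:
    state = augmented_step(state[0], state[1], next_env)
    print(state)  # machine component flips to 1 once 'goal' is observed
```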
Reinforcement Learning with Non-Markovian Rewards
[article]
2019
arXiv
pre-print
Here, we address the problem of policy learning from experience with such rewards. ...
We describe and evaluate empirically four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. ...
Acknowledgements We thank the reviewers for their useful comments. ...
arXiv:1912.02552v1
fatcat:umncgaz3qvdv7dbvnzju54dghu
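Combining Q-learning with a learned automaton, as in the entry above, amounts to indexing the Q-table by the pair (environment state, automaton state). A hedged sketch of that backup; hyperparameters and the sampled transition are placeholders, not the paper's settings:

```python
from collections import defaultdict

# Sketch of tabular Q-learning over automaton-augmented states: the Q-table
# is keyed by (env_state, automaton_state). All values are placeholders.

ALPHA, GAMMA, ACTIONS = 0.1, 0.99, ['left', 'right']
Q = defaultdict(float)  # ((env_state, aut_state), action) -> value

def q_update(s, u, a, r, s2, u2):
    """One Q-learning backup on the augmented state (s, u)."""
    best_next = max(Q[((s2, u2), a2)] for a2 in ACTIONS)
    Q[((s, u), a)] += ALPHA * (r + GAMMA * best_next - Q[((s, u), a)])

# Example backup: reward 1.0 observed as the automaton moves from u=0 to u=1.
q_update(s=2, u=0, a='right', r=1.0, s2=3, u2=1)
print(Q[((2, 0), 'right')])  # 0.1
```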
Reinforcement Learning with Non-Markovian Rewards
2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
Here, we address the problem of policy learning from experience with such rewards. ...
We describe and evaluate empirically four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. ...
Acknowledgements We thank the reviewers for their useful comments. ...
doi:10.1609/aaai.v34i04.5814
fatcat:labzyux7wzcnbje54dp6o7k6x4
Online Learning of Non-Markovian Reward Models
[article]
2020
arXiv
pre-print
One natural and quite general way to represent history-dependent rewards is via a Mealy machine, a finite state automaton that produces output sequences from input sequences. ...
Our approach to overcome this challenge is to use Angluin's L* active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). ...
[4] showed that finite automata can be learned using the so-called membership and equivalence queries. ...
arXiv:2009.12600v2
fatcat:yoeimnqmkvan3akl3kxajbuqme
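A Mealy machine maps input sequences to output sequences, which is what makes it a natural encoding of a history-dependent reward: inputs are observation labels, outputs are rewards. A minimal sketch of that transduction view; the concrete machine below is an assumption:

```python
# Sketch of a Mealy reward model: a transducer from observation labels to
# rewards, so each step's reward depends on the whole history through the
# machine state. The concrete machine below is an illustrative assumption.

def run_mealy(delta, out, q0, inputs):
    """Transduce an input sequence into the corresponding reward sequence."""
    q, rewards = q0, []
    for x in inputs:
        rewards.append(out[(q, x)])
        q = delta[(q, x)]
    return rewards

# Reward 1 only the first time 'coffee' is delivered after 'mail'.
delta = {(0, 'mail'): 1, (0, 'coffee'): 0, (1, 'mail'): 1, (1, 'coffee'): 2,
         (2, 'mail'): 2, (2, 'coffee'): 2}
out = {k: 0 for k in delta}
out[(1, 'coffee')] = 1

print(run_mealy(delta, out, 0, ['coffee', 'mail', 'coffee', 'coffee']))
# [0, 0, 1, 0] -- the same input 'coffee' earns different rewards by history
```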
Learning Non-Markovian Reward Models in MDPs
[article]
2020
arXiv
pre-print
Our approach to overcome this challenging problem is a careful combination of Angluin's L* active learning algorithm to learn finite automata, testing techniques for establishing conformance of finite ...
One natural and quite general way to represent history-dependent rewards is via a Mealy machine: a finite-state automaton that produces output sequences (rewards in our case) from input sequences (state ...
Angluin [3] showed that finite automata can be learned using the so-called membership and equivalence queries. ...
arXiv:2001.09293v1
fatcat:ziawcsiu4jgyrjvf4c3ywf4pcm
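Because no perfect equivalence oracle exists when the reward model is hidden inside an MDP, the conformance-testing techniques mentioned above approximate the equivalence query: sample traces and check that the hypothesis machine reproduces the observed rewards. A hedged sketch of that check; both machines and the trace sampler are assumptions:

```python
import random

# Sketch of conformance testing as an approximate equivalence query:
# sample label sequences and compare hypothesis outputs against the
# ground truth. Both Mealy machines here are illustrative assumptions.

def run(machine, inputs):
    q, outs = 0, []
    delta, out = machine
    for x in inputs:
        outs.append(out[(q, x)])
        q = delta[(q, x)]
    return outs

def conformance_test(hypothesis, ground_truth, alphabet, trials=1000, length=8):
    """Return a trace on which the machines disagree, or None."""
    rng = random.Random(0)
    for _ in range(trials):
        trace = [rng.choice(alphabet) for _ in range(length)]
        if run(hypothesis, trace) != run(ground_truth, trace):
            return trace
    return None

true_m = ({(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1},
          {(0, 'a'): 0, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1})
hyp_m = ({(0, 'a'): 0, (0, 'b'): 0}, {(0, 'a'): 0, (0, 'b'): 0})

print(conformance_test(hyp_m, true_m, alphabet=['a', 'b']))
# a sampled trace exposing the reward the one-state hypothesis misses
```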
Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes
2021
The Journal of Artificial Intelligence Research
We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller ...
Using such methods, if the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. ...
Acknowledgements Steven Carr and Ufuk Topcu were supported by the grants DARPA D19AP00004, ONR N00014-18-1-2829, and ARL ACC-APG-RTP W911NF. Nils Jansen was supported by the grant NWO OCENW. ...
doi:10.1613/jair.1.12963
fatcat:usbrnbs6dvarrbnj2x4bmmmrwa
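The extracted artifact in the entry above is a finite-state controller: a small machine whose memory node is updated by observations and which maps (node, observation) pairs to actions, making the induced Markov chain amenable to model checking. A minimal sketch of such a controller acting in a POMDP; the concrete controller is an illustrative assumption, not the paper's extraction procedure:

```python
# Sketch of a finite-state controller (FSC) for a POMDP: an internal memory
# node is updated on each observation, and actions are chosen from
# (node, observation). The concrete tables below are assumptions.

class FiniteStateController:
    def __init__(self, act, update, node=0):
        self.act = act        # (node, observation) -> action
        self.update = update  # (node, observation) -> next node
        self.node = node

    def step(self, observation):
        action = self.act[(self.node, observation)]
        self.node = self.update[(self.node, observation)]
        return action

# Two memory nodes let the policy react differently to repeated 'wall' sightings.
fsc = FiniteStateController(
    act={(0, 'wall'): 'turn', (0, 'open'): 'forward',
         (1, 'wall'): 'back', (1, 'open'): 'forward'},
    update={(0, 'wall'): 1, (0, 'open'): 0,
            (1, 'wall'): 0, (1, 'open'): 1},
)
print([fsc.step(o) for o in ['wall', 'wall', 'open']])  # ['turn', 'back', 'forward']
```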
Induction and Exploitation of Subgoal Automata for Reinforcement Learning
2021
The Journal of Artificial Intelligence Research
In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ...
We evaluate ISA in several gridworld and continuous state space problems using different RL algorithms that leverage the automaton structures. ...
Anders Jonsson is partially supported by the Spanish grants PCIN-2017-082 and PID2019-108141GB-I00. ...
doi:10.1613/jair.1.12372
fatcat:yxtefk4vbjam7mtmtbs4bo2lcq
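A subgoal automaton advances when the agent's current observation satisfies a propositional condition, and RL algorithms can exploit the structure by learning, for example, a separate value function per automaton state. A hedged sketch of the automaton-advance step; the conditions and states are illustrative assumptions:

```python
# Sketch of a subgoal automaton: a transition fires when the set of
# currently true propositions satisfies its condition. Each automaton
# state can then index its own Q-table. Details below are assumptions.

transitions = {  # (aut_state, condition) -> next automaton state
    (0, frozenset({'key'})): 1,
    (1, frozenset({'door'})): 2,   # state 2 = accepting: key, then door
}

def advance(aut_state, true_props):
    """Move along the subgoal automaton if some condition is satisfied."""
    for (u, cond), v in transitions.items():
        if u == aut_state and cond <= true_props:
            return v
    return aut_state

u = 0
for props in [set(), {'key'}, {'lava'}, {'door'}]:
    u = advance(u, props)
    print(u)  # 0, 1, 1, 2 -- subgoal progress: get the key, then reach the door
```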
Reinforcement Learning under Partial Observability Guided by Learned Environment Models
[article]
2022
arXiv
pre-print
Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. ...
In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory. ...
This work has been supported by the "University SAL Labs" initiative of Silicon Austria Labs (SAL) and its Austrian partner universities for applied fundamental research for electronic based systems. ...
arXiv:2206.11708v1
fatcat:hg5n4fqkuvbihozp76cevgoimm
Learning Graph Structure With A Finite-State Automaton Layer
[article]
2020
arXiv
pre-print
We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation ...
Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. ...
We would also like to thank Dibya Ghosh and Yujia Li for their helpful comments and suggestions during the writing process, and the Brain Program Learning, Understanding, and Synthesis team at Google for ...
arXiv:2007.04929v2
fatcat:7hdu7fxaujcx5dcvq5cgd6ybau
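The primitive underlying the entry above is simple to state: a relation holds between two graph nodes when some path between them spells out a word, over edge labels, that a finite-state automaton accepts. A small sketch of that acceptance check via search over (node, automaton-state) pairs; the graph, labels, and automaton are illustrative:

```python
from collections import deque

# Sketch of the path-acceptance primitive: a relation holds between two
# nodes when some path's edge-label word is accepted by a DFA. The graph
# and automaton below are illustrative assumptions.

edges = {  # node -> list of (label, successor)
    'x': [('assign', 'y')],
    'y': [('deref', 'z'), ('assign', 'x')],
    'z': [],
}
delta = {(0, 'assign'): 1, (1, 'assign'): 1,   # DFA for assign+ deref*
         (1, 'deref'): 2, (2, 'deref'): 2}
ACCEPTING = {1, 2}

def related(source, target):
    """BFS over (node, dfa_state) pairs: is some path's word accepted?"""
    frontier, seen = deque([(source, 0)]), {(source, 0)}
    while frontier:
        node, q = frontier.popleft()
        if node == target and q in ACCEPTING:
            return True
        for lbl, succ in edges[node]:
            q2 = delta.get((q, lbl))
            if q2 is not None and (succ, q2) not in seen:
                seen.add((succ, q2))
                frontier.append((succ, q2))
    return False

print(related('x', 'z'))  # True: x -assign-> y -deref-> z matches the DFA
```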
Active Learning of Plans for Safety and Reachability Goals With Partial Observability
2010
IEEE Transactions on Systems Man and Cybernetics Part B (Cybernetics)
The planning method first learns a safe plan using the L* algorithm, which is an efficient active learning algorithm for regular languages. ...
Our technique is based on the active learning of regular languages and symbolic model checking. ...
the minimal Deterministic Finite Automaton (DFA) that accepts L(P). 2) The number of queries is polynomial in the size of P and in the length of the longest counterexample that was obtained while constructing ...
doi:10.1109/tsmcb.2009.2025657
pmid:19661004
fatcat:qfurf7st2jbotjhsqahptmmm64
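The dependence of the query bound on the longest counterexample, noted in the snippet above, comes from Angluin-style counterexample processing: every prefix of a returned counterexample is added as a row of the observation table. A tiny sketch of that step; the table representation is an assumption:

```python
# Sketch of Angluin-style counterexample processing: all prefixes of a
# counterexample become observation-table rows, which is why the query
# count grows with the longest counterexample's length.

def add_counterexample(rows, counterexample):
    """Add every nonempty prefix of the counterexample word to the rows."""
    for i in range(1, len(counterexample) + 1):
        rows.add(counterexample[:i])
    return rows

rows = {''}  # the table always contains the empty word
print(sorted(add_counterexample(rows, 'abb')))  # ['', 'a', 'ab', 'abb']
```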
Verification for Machine Learning, Autonomy, and Neural Networks Survey
[article]
2018
arXiv
pre-print
Autonomy in CPS is enabled by recent advances in artificial intelligence (AI) and machine learning (ML) through approaches such as deep neural networks (DNNs), embedded in so-called learning-enabled components ...
Recently, the formal methods and formal verification community has developed methods to characterize behaviors in these LECs with eventual goals of formally verifying specifications for LECs, and this ...
Classifier-learning techniques use both positive and negative examples for STL formula learning, and active-learning techniques experiment on the system to extract counterexamples. ...
arXiv:1810.01989v1
fatcat:a5ax66lsxbho3fuxuh55ypnm6m
Learning Performance Graphs from Demonstrations via Task-Based Evaluations
[article]
2022
arXiv
pre-print
The main contribution of this paper is an algorithm to learn the performance graph directly from the user-provided demonstrations, and to show that the reward functions generated using the learned performance ...
Without this knowledge, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies. ...
Such cumulative rewards can then be used by a reinforcement learning (RL) procedure to learn an optimal policy. ...
arXiv:2204.05909v1
fatcat:sv5fu3u5ebaerl2celk5uo2dki
Average-Payoff Reinforcement Learning
[chapter]
2017
Encyclopedia of Machine Learning and Data Mining
See Average-Reward Reinforcement Learning (cross-referenced from Average-Cost Neuro-Dynamic Programming and Average-Cost Optimization). ...
The general approach behind segmentation-based techniques is to segment the normal time series, treat each segment as a state in a finite-state automaton (FSA), and then use the FSA to determine if a ...
doi:10.1007/978-1-4899-7687-1_100029
fatcat:jub4ulyg45abnf4qgutimczie4
Showing results 1 — 15 out of 53 results