
A Meta-Learning Control Algorithm with Provable Finite-Time Guarantees [article]

Deepan Muthirayan, Pramod Khargonekar
2022 arXiv   pre-print
In this work we provide provable regret guarantees for an online meta-learning control algorithm in an iterative control setting, where in each iteration the system to be controlled is a linear deterministic  ...  We prove (i) that the algorithm achieves a regret for the controller cost and constraint violation that are O(T^{3/4}) for an episode of duration T with respect to the best policy that satisfies the control  ...  We emphasize that this is the first work that provides provable guarantees for the regret for the performance of an online meta-learning control algorithm in a suitable control setting.  ... 
arXiv:2008.13265v6 fatcat:3tfzkfqrcnhtfnoapjtzc2yd34

Provably Safe Model-Based Meta Reinforcement Learning: An Abstraction-Based Approach [article]

Xiaowu Sun, Wael Fatnassi, Ulices Santa Cruz, Yasser Shoukry
2021 arXiv   pre-print
Our approach is to learn a set of NN controllers during the training phase.  ...  While conventional reinforcement learning focuses on designing agents that can perform one task, meta-learning aims, instead, to solve the problem of designing agents that can generalize to different tasks  ...  During training, the learning algorithm is augmented with a NN weight projection operator that enforces the resulting NN to be provably safe.  ... 
arXiv:2109.01255v1 fatcat:zj7ffgkrqjawnom2qbqdgijvqi

PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments [article]

Anirudha Majumdar, Alec Farid, Anoopkumar Sonar
2020 arXiv   pre-print
Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments.  ...  We propose policy learning algorithms that explicitly seek to minimize this upper bound.  ...  We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.  ... 
arXiv:1806.04225v5 fatcat:dsqjkqu4ubfobolbofgfk3mv4m

Safely Bridging Offline and Online Reinforcement Learning [article]

Wanqiao Xu, Kan Xu, Hamsa Bastani, Osbert Bastani
2021 arXiv   pre-print
We then design an algorithm that uses a UCB reinforcement learning policy for exploration, but overrides it as needed to ensure safety with high probability.  ...  We experimentally validate our results on a sepsis treatment task, demonstrating that our algorithm can learn while ensuring good performance compared to the baseline policy for every patient.  ...  First, we consider a single-product stochastic inventory control problem based on [15], but with a finite horizon.  ... 
arXiv:2110.13060v1 fatcat:dsi63u4tgbbadlxkfh466zyaaa

Meta-Learning Guarantees for Online Receding Horizon Learning Control [article]

Deepan Muthirayan, Pramod P. Khargonekar
2022 arXiv   pre-print
In this paper we provide provable regret guarantees for an online meta-learning receding horizon control algorithm in an iterative control setting.  ...  Thus, we show that the worst regret for learning within an iteration improves with experience of more iterations, with guarantee on rate of improvement.  ...  Structure of the Meta-Learning Control Algorithm In this section we propose a model-based meta-learning receding horizon control algorithm for the learning setting described above.  ... 
arXiv:2010.11327v14 fatcat:asbudhtn2bex7bqkbetkb4z3ki

Provable Guarantees for Gradient-Based Meta-Learning [article]

Mikhail Khodak, Maria-Florina Balcan, Ameet Talwalkar
2019 arXiv   pre-print
We study the problem of meta-learning through the lens of online convex optimization, developing a meta-algorithm bridging the gap between popular gradient-based meta-learning and classical regularization-based  ...  Our method is the first to simultaneously satisfy good sample efficiency guarantees in the convex setting, with generalization bounds that improve with task-similarity, while also being computationally  ...  In contrast to both results, we show finite-sample learning-theoretic guarantees for convex functions under a natural task-similarity assumption.  ... 
arXiv:1902.10644v2 fatcat:f6xxnrpehrhm3i75askmaxhkgm

Online Robust Control of Nonlinear Systems with Large Uncertainty [article]

Dimitar Ho, Hoang M. Le, John C. Doyle, Yisong Yue
2021 arXiv   pre-print
We provide a learning convergence analysis that yields a finite mistake bound on the number of times performance requirements are not met and can provide strong safety guarantees, by bounding the worst-case  ...  Robust control is a core approach for controlling systems with performance guarantees that are robust to modeling error, and is widely used in real-world systems.  ...  Combined with a suitable oracle π, the resulting online control algorithm A_π(SEL_p) provides finite mistake guarantees for objectives G according to Theorem 2.10.  ... 
arXiv:2103.11055v2 fatcat:ofakkysmtrdydg44nwqfm6g4ya

On learning with imperfect representations

Shivaram Kalyanakrishnan, Peter Stone
2011 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)  
Coping with these representational aspects thus becomes an important direction for furthering the advent of reinforcement learning in practice.  ...  We specify an evaluation criterion for learning methods in practice, and propose a framework for their synthesis.  ...  This work has taken place in the Learning Agents Research Group (LARG) at the University of Texas at Austin.  ... 
doi:10.1109/adprl.2011.5967379 dblp:conf/adprl/KalyanakrishnanS11 fatcat:xzjsuobg4ndr5lb5luyl7a73ja

Gödel Machines: Towards a Technical Justification of Consciousness [chapter]

Jürgen Schmidhuber
2005 Lecture Notes in Computer Science  
Their initial algorithm is not hardwired; it can completely rewrite itself, but only if a proof searcher embedded within the initial algorithm can first prove that the rewrite is useful, given a formalized  ...  utility function reflecting computation time and expected future success (e.g., rewards).  ...  Q (related to the previous item): The Gödel machine implements a meta-learning behavior: what about a meta-meta, and a meta-meta-meta level?  ... 
doi:10.1007/978-3-540-32274-0_1 fatcat:twyyq4cqmzeednza23ilvqu2tm

Bayesian sparse sampling for on-line reward optimization

Tao Wang, Daniel Lizotte, Michael Bowling, Dale Schuurmans
2005 Proceedings of the 22nd international conference on Machine learning - ICML '05  
Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost.  ...  The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios.  ...  Acknowledgments Research supported by the Alberta Ingenuity Centre for Machine Learning, CRC, NSERC, MITACS and CFI.  ... 
doi:10.1145/1102351.1102472 dblp:conf/icml/WangLBS05 fatcat:pwpcdttqufbttklck3qpfxbsua

Uniform Convergence Bounds for Codec Selection [article]

Clayton Sanford, Cyrus Cousins, Eli Upfal
2018 arXiv   pre-print
We frame the problem of selecting an optimal audio encoding scheme as a supervised learning task.  ...  Through uniform convergence theory, we guarantee approximately optimal codec selection while controlling for selection bias.  ...  We also introduce the Progressive Sampling with Pruning (PSP) algorithm, which adaptively prunes provably suboptimal codecs over time, and terminates when a desired approximation threshold is met, thus  ... 
arXiv:1812.07568v1 fatcat:lh4f7rbh2fd2jn4rvvttsj3rdi

Actively Learning to Verify Safety for FIFO Automata [chapter]

Abhay Vardhan, Koushik Sen, Mahesh Viswanathan, Gul Agha
2004 Lecture Notes in Computer Science  
We apply machine learning techniques to verify safety properties of finite state machines which communicate over unbounded FIFO channels.  ...  We define a new encoding scheme for representing reachable states and their witness execution; this enables the learning algorithm to analyze a larger class of FIFO systems automatically than a naive encoding  ...  Angluin's algorithm is guaranteed to terminate in polynomial time with the minimal DFA representing the target set.  ... 
doi:10.1007/978-3-540-30538-5_41 fatcat:3slmxwath5hodikxjpwpn367ry

Bayesian Learning-Based Adaptive Control for Safety Critical Systems [article]

David D. Fan, Jennifer Nguyen, Rohan Thakker, Nikhilesh Alatur, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou
2020 arXiv   pre-print
Under reasonable assumptions, we guarantee stability and safety while adapting to unknown dynamics with probability 1.  ...  However, there is a strong reluctance to use these methods on safety-critical systems, which have constraints on safety, stability, and real-time performance.  ...  Evangelos A. Theodorou was supported by the C-STAR Faculty Fellowship at Georgia Institute of Technology. Copyright © 2019. All rights reserved.  ... 
arXiv:1910.02325v2 fatcat:4lwdqymn5vhwjn3qajioc6tdlm

Fairness-Aware Online Meta-learning [article]

Chen Zhao, Feng Chen, Bhavani Thuraisingham
2021 arXiv   pre-print
To overcome such issues and bridge the gap, in this paper for the first time we proposed a novel online meta-learning algorithm, namely FFML, which is under the setting of unfairness prevention.  ...  In contrast to offline working fashions, two research paradigms are devised for online learning: (1) Online Meta Learning (OML) learns good priors over model parameters (or learning to learn) in a sequential  ...  We claim that for the first time a fairness-aware online meta-learning framework is proposed.  ... 
arXiv:2108.09435v1 fatcat:iuwg7ihb7ngv5dleluisquagqa

Life, The Mind, and Everything [article]

Gary R. Prok
2016 arXiv   pre-print
This is an effort to convey these thoughts and results in a somewhat entertaining manner.  ...  Incompleteness theorems of Godel, Turing, Chaitin, and Algorithmic Information Theory have profound epistemological implications.  ...  This section of the genetic algorithm is guaranteed to complete and halt, and allow the algorithm to proceed in a short amount of time to the next step.  ... 
arXiv:1602.07646v2 fatcat:bkdqgbgrejdaxbbnn7wx7jtzhm