39 Hits in 4.5 sec

Balancing Exploration for Online Receding Horizon Learning Control with Provable Regret Guarantees [article]

Deepan Muthirayan, Jianjun Yuan, Pramod P. Khargonekar
2021 arXiv   pre-print
We address the problem of simultaneous learning and control in an online receding horizon control setting.  ...  We propose a novel approach to exploration in an online receding horizon setting. The key challenge is to ensure that the control generated by the receding horizon controller is persistently exciting.  ... 
arXiv:2010.07269v14 fatcat:4xi7n4y5dnbnjmqafnarx6o3xy

Online Learning for Predictive Control with Provable Regret Guarantees [article]

Deepan Muthirayan, Jianjun Yuan, Dileep Kalathil, Pramod P. Khargonekar
2022 arXiv   pre-print
Specifically, we study the online learning problem where the control algorithm does not know the true system model and has access only to a fixed-length preview (one that does not grow with the control horizon)  ...  We study the problem of online learning in predictive control of an unknown linear dynamical system with time-varying cost functions that are unknown a priori.  ...  In sharp contrast to these existing works, our goal is to develop an online learning MPC algorithm with provable finite-time performance guarantees.  ... 
arXiv:2111.15041v2 fatcat:he43ygi3bjegzj2we7si2btdg4

Meta-Learning Guarantees for Online Receding Horizon Learning Control [article]

Deepan Muthirayan, Pramod P. Khargonekar
2022 arXiv   pre-print
In this paper we provide provable regret guarantees for an online meta-learning receding horizon control algorithm in an iterative control setting.  ...  By analysing conditions under which sub-linear regret is achievable, we prove that the meta-learning online receding horizon controller achieves an average of the dynamic regret for the controller cost  ...  For this setting we propose and study an online model-based meta-learning Receding Horizon Control (RHC) algorithm.  ... 
arXiv:2010.11327v14 fatcat:asbudhtn2bex7bqkbetkb4z3ki

Online Learning Robust Control of Nonlinear Dynamical Systems [article]

Deepan Muthirayan, Pramod P. Khargonekar
2021 arXiv   pre-print
We propose an online controller and present guarantees for the metric R^p_t when the maximum possible attenuation is given by γ, which is a system constant.  ...  We also characterize the lower bound on the required prediction horizon for these guarantees to hold in terms of the system constants.  ...  We use a receding horizon control approach that minimizes the cost-to-go for a horizon M with the previewed disturbances and cost functions.  ... 
arXiv:2106.04092v1 fatcat:eswglw6apvbvvdf3zswoden7bq
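
The snippet above describes a receding horizon approach that minimizes the cost-to-go over a horizon M. As a point of reference only (not the paper's robust nonlinear controller), here is a minimal receding-horizon sketch for a scalar linear system with quadratic costs, solved by a finite-horizon Riccati recursion; the system parameters, horizon, and initial state are made up for illustration:

```python
def finite_horizon_gain(a, b, q, r, M):
    # Backward Riccati recursion for scalar dynamics x' = a*x + b*u with
    # stage cost q*x^2 + r*u^2 over horizon M; returns the first-step gain.
    p, k = q, 0.0
    for _ in range(M):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return k

def receding_horizon_control(a, b, q, r, M, x0, T):
    # At each step, re-solve the M-step problem and apply only the first
    # input (the gain is time-invariant here, so it is computed once).
    k = finite_horizon_gain(a, b, q, r, M)
    x, xs = x0, [x0]
    for _ in range(T):
        u = -k * x          # first input of the M-step plan
        x = a * x + b * u   # advance the true system
        xs.append(x)
    return xs
```

With a = 1.2 (open-loop unstable), b = q = r = 1, and M = 10, the closed loop is stable and the state decays toward zero.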

Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis [article]

Yingying Li, Xin Chen, Na Li
2019 arXiv   pre-print
We design online algorithms, Receding Horizon Gradient-based Control (RHGC), that utilize the predictions through finite steps of gradient computations.  ...  This paper studies the online optimal control problem with time-varying convex stage costs for a time-invariant linear dynamical system, where a finite lookahead window of accurate predictions of the stage  ...  In this paper, we propose novel gradient-based online control algorithms, receding horizon gradient-based control (RHGC), and provide non-asymptotic optimality guarantees via dynamic regret.  ... 
arXiv:1906.11378v3 fatcat:3cjsuax45zcxzarr7n7neykfaa
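
The idea summarized above, replacing an exact solve of the lookahead problem with a finite number of gradient steps, can be sketched in a toy setting. This is an illustration under assumed simplifications (scalar integrator dynamics, quadratic tracking costs; the function name and parameters are invented here), not the RHGC algorithm as analyzed in the paper:

```python
import numpy as np

def rhgc_integrator(theta, W, n_grad=5, lr=0.2):
    # Receding-horizon gradient sketch on a scalar integrator x_{t+1} = x_t + u_t
    # with previewed quadratic tracking costs (x - theta_t)^2: refine the planned
    # window by a few gradient steps, then commit only to the first move.
    T = len(theta)
    x, traj = 0.0, []
    for t in range(T):
        end = min(t + W, T)
        plan = np.full(end - t, x)                 # warm start: stay in place
        for _ in range(n_grad):
            plan -= lr * 2.0 * (plan - theta[t:end])  # gradient of stage costs
        x = plan[0]                                # integrator: move to planned point
        traj.append(x)
    return np.array(traj)
```

For a constant target, a handful of gradient steps per stage already tracks it closely.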

Value Directed Exploration in Multi-Armed Bandits with Structured Priors [article]

Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml
2017 arXiv   pre-print
In this paper, we propose an algorithm for Bayesian multi-armed bandits that utilizes value-function-driven online planning techniques.  ...  The algorithm enjoys a sub-linear performance guarantee and we present simulation results that confirm its strength in problems with structured priors.  ...  This is an instance of receding horizon control, a common approach to solving online planning and reinforcement learning problems [Sutton and Barto, 2016] .  ... 
arXiv:1704.03926v2 fatcat:6ltuelyqyvdtjofwzx7vmtjeem
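
The entry above concerns value-directed Bayesian exploration; its planning component is more involved, but the Beta-Bernoulli posterior machinery it builds on is the same one used by standard Thompson sampling, sketched here purely as a baseline for context (arm means, horizon, and seed are illustrative):

```python
import numpy as np

def thompson_bernoulli(true_means, T, seed=0):
    # Beta-Bernoulli Thompson sampling: sample a mean from each arm's
    # posterior, pull the argmax, update the pulled arm's Beta posterior.
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    alpha = np.ones(n_arms)   # prior Beta(1, 1) per arm
    beta = np.ones(n_arms)
    total = 0.0
    for _ in range(T):
        arm = int(np.argmax(rng.beta(alpha, beta)))
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total += reward
    return total
```

Over a long horizon the better arm dominates the pulls, so the total reward approaches that of always playing the best arm.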

Online Convex Optimization Using Predictions

Niangjun Chen, Anish Agarwal, Adam Wierman, Siddharth Barman, Lachlan L.H. Andrew
2015 Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems - SIGMETRICS '15  
...prediction error models it is possible to use Averaging Fixed Horizon Control (AFHC) to simultaneously achieve sublinear regret and constant competitive ratio in expectation using only a constant-sized  ...  We prove that achieving sublinear regret and constant competitive ratio for online algorithms requires the use of an unbounded prediction window in adversarial settings, but that under more realistic stochastic  ...  Given the focus of this paper on predictions, the most natural choice of an algorithm to consider is Receding Horizon Control (RHC), a.k.a., Model Predictive Control (MPC).  ... 
doi:10.1145/2745844.2745854 dblp:conf/sigmetrics/ChenAWBA15 fatcat:shjdpuchnncbfdvwz2vogzox5e
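
AFHC, referenced in the snippets for this paper, averages the actions of several fixed horizon control copies whose commitment windows are staggered in time. The following toy tracking version conveys only that averaging structure; the signal, prediction-noise model, and window size are assumptions made for illustration, not the paper's setting:

```python
import numpy as np

def afhc(y, w, pred_noise, rng):
    # Toy AFHC for tracking a signal y: w fixed-horizon copies, copy i replans
    # every w steps (offset by i), committing to a whole w-step window built
    # from noisy predictions whose error grows with lookahead distance.
    # AFHC plays the average of the copies' committed actions at each step.
    T = len(y)
    plans = np.zeros((w, T))
    for i in range(w):
        for s in range(i - w, T, w):               # staggered window starts
            for t in range(max(s, 0), min(s + w, T)):
                plans[i, t] = y[t] + pred_noise * (t - s) * rng.standard_normal()
    return plans.mean(axis=0)
```

Because the copies' prediction errors are made at different times, averaging them cancels much of the noise, which is the intuition behind AFHC's expectation guarantees.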

Online Convex Optimization Using Predictions

Niangjun Chen, Anish Agarwal, Adam Wierman, Siddharth Barman, Lachlan L.H. Andrew
2015 Performance Evaluation Review  
...prediction error models it is possible to use Averaging Fixed Horizon Control (AFHC) to simultaneously achieve sublinear regret and constant competitive ratio in expectation using only a constant-sized  ...  We prove that achieving sublinear regret and constant competitive ratio for online algorithms requires the use of an unbounded prediction window in adversarial settings, but that under more realistic stochastic  ...  Given the focus of this paper on predictions, the most natural choice of an algorithm to consider is Receding Horizon Control (RHC), a.k.a., Model Predictive Control (MPC).  ... 
doi:10.1145/2796314.2745854 fatcat:zd5fxkzcdvbcbfc66aa5twv534

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [article]

Sebastian Curi, Felix Berkenkamp, Andreas Krause
2020 arXiv   pre-print
Furthermore, we analyze H-UCRL and construct a general regret bound for well-calibrated models, which is provably sublinear in the case of Gaussian Process models.  ...  Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods.  ...  Acknowledgments and Disclosure of Funding This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program grant agreement  ... 
arXiv:2006.08684v3 fatcat:5uyz3uxicbf7fkmesv6rdkg6im

An Online-Learning Approach to Inverse Optimization [article]

Andreas Bärmann and Alexander Martin and Sebastian Pokutta and Oskar Schneider
2020 arXiv   pre-print
Our approach is based on online learning and works for linear objectives over arbitrary feasible sets for which we have a linear optimization oracle.  ...  When applied to the stochastic offline case, our algorithms are able to give guarantees on the quality of the learned objectives in expectation.  ...  Acknowledgements This research was partially supported by NSF CAREER award CMMI-1452463 and by the Bavarian Ministry of Economic Affairs, Regional Development and Energy through the Center for Analytics  ... 
arXiv:1810.12997v2 fatcat:ksq3hk5eqzaz3pinf3ac4y7yea

FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis [article]

Aman Sinha, Matthew O'Kelly, Hongrui Zheng, Rahul Mangharam, John Duchi, Russ Tedrake
2020 arXiv   pre-print
Current approaches either make simplifying assumptions about other agents or lack robust mechanisms for online adaptation. This work makes algorithmic contributions to both challenges.  ...  In particular, autonomous racing is a domain that penalizes safe but conservative policies, highlighting the need for robust, adaptive strategies.  ...  [2] establish regret bounds for adaptive control methods applied to LTI systems, tightening the relationship to online learning.  ... 
arXiv:2003.03900v2 fatcat:xaxnxeensnhyliht7yrjti6amy

Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator [article]

Maryam Fazel, Rong Ge, Sham M. Kakade, Mehran Mesbahi
2019 arXiv   pre-print
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the  ...  In contrast, system identification and model based planning in optimal control theory have a much more solid theoretical footing, where much is known with regards to their computational and statistical  ...  K. thanks Emo Todorov, Aravind Rajeswaran, Kendall Lowrey, Sanjeev Arora, and Elad Hazan for helpful discussions. S. K. and M. F. also thank Ben Recht for helpful discussions. R.  ... 
arXiv:1801.05039v3 fatcat:hf7gpybbxnfkrbuhzladrgvkby
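
To make "direct policy gradient on LQR" concrete, here is a toy scalar instance; the problem data are invented, and the finite-difference gradient is a stand-in for the zeroth-order estimators analyzed in the paper, not the paper's method itself:

```python
def lqr_cost(k, a=1.1, b=1.0, q=1.0, r=1.0, x0=1.0):
    # Infinite-horizon cost of the linear policy u = -k*x for the scalar
    # system x' = a*x + b*u with stage cost q*x^2 + r*u^2.
    cl = a - b * k
    assert abs(cl) < 1.0, "policy must be stabilizing"
    return (q + r * k * k) * x0 * x0 / (1.0 - cl * cl)

def policy_gradient(k0, steps=500, lr=0.01, eps=1e-4):
    # Gradient descent directly on the policy parameter k, with a
    # finite-difference gradient estimate (no explicit model solve).
    k = k0
    for _ in range(steps):
        g = (lqr_cost(k + eps) - lqr_cost(k - eps)) / (2.0 * eps)
        k -= lr * g
    return k
```

Despite the cost being non-convex in k in general, gradient descent from a stabilizing start converges to the optimal gain (here approximately 0.7034, matching the scalar Riccati solution), which is the global-convergence phenomenon the paper establishes.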

Reinforcement Learning: A Survey

L. P. Kaelbling, M. L. Littman, A. W. Moore
1996 The Journal of Artificial Intelligence Research  
It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.  ...  Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment.  ...  Acknowledgements Thanks to Marco Dorigo and three anonymous reviewers for comments that have helped to improve this paper.  ... 
doi:10.1613/jair.301 fatcat:nbo23vmu6rfz3ctpjbk7sdcnt4

Reinforcement Learning: A Survey [article]

L. P. Kaelbling, M. L. Littman, A. W. Moore
1996 arXiv   pre-print
It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.  ...  Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment.  ...  Acknowledgements Thanks to Marco Dorigo and three anonymous reviewers for comments that have helped to improve this paper.  ... 
arXiv:cs/9605103v1 fatcat:ze737h6wnfdhjf52hiz4gpogxq

Online Optimization with Predictions and Non-convex Losses [article]

Yiheng Lin, Gautam Goel, Adam Wierman
2020 arXiv   pre-print
In this work, we give two general sufficient conditions that specify a relationship between the hitting and movement costs which guarantees that a new algorithm, Synchronized Fixed Horizon Control (SFHC)  ...  Our results provide the first constant, dimension-free competitive ratio for online non-convex optimization with movement costs.  ...  CONCLUDING REMARKS In this paper we have studied the problem of online optimization with movement costs and, for the first time, provided algorithms with provable guarantees for the case when the hitting  ... 
arXiv:1911.03827v2 fatcat:m5dmpievhvfkfiqkuwwk3zoxoe