Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies
[article]
2021
arXiv
pre-print
We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy. ...
The baseline policy can arise from demonstration data or a teacher agent and may provide useful cues for learning, but it might also be sub-optimal for the task at hand, and is not guaranteed to satisfy ...
Acknowledgements The authors would like to thank members of the Princeton NLP Group, the anonymous reviewers, and the area chair for their comments. ...
arXiv:2006.11645v3
fatcat:ngboxu47t5fm3gojgnym6yqala
On the Effectiveness of Iterative Learning Control
[article]
2021
arXiv
pre-print
Iterative learning control (ILC) is a powerful technique for high performance tracking in the presence of modeling errors for optimal control applications. ...
However, there is little prior theoretical work that explains the effectiveness of ILC even in the presence of large modeling errors, where optimal control methods using the misspecified model (MM) often ...
Hoffmann, editors, Machine Learning, Proceedings of the Nineteenth International Conference (ICML 2002), University of New South Wales, Sydney, Australia, July 8-12, 2002, pages 267–274. ...
arXiv:2111.09434v3
fatcat:lwnqcrx4wneiddqrgwq2dpke4q
Cost-to-Go Function Approximation
[chapter]
2017
Encyclopedia of Machine Learning and Data Mining
It maintains a set, S, of most specific hypotheses that are consistent with the training data and a set, G, of most general hypotheses consistent with the training data. ...
These two sets form two boundaries on the version space. ...
Some algorithms, most notably CN2 (Clark and Niblett 1989; Clark and Boswell 1991), learn multi-class rules directly by optimizing over all possible classes in the head of the rule. ...
doi:10.1007/978-1-4899-7687-1_100093
fatcat:vse7ncdqs5atlosjhz7fhlj3im
Gradient play in stochastic games: stationary points, convergence, and sample complexity
[article]
2021
arXiv
pre-print
Our result shows that the number of iterations to reach an ϵ-NE scales linearly, instead of exponentially, with the number of agents. ...
learning algorithm and give a non-asymptotic global convergence rate analysis for both exact gradient play and our sample-based learning algorithm. ...
.), Machine Learning, Proceedings of the Nineteenth International Conference (ICML 2002), University of New South Wales, Sydney, Australia, July 8-12, 2002, pp. 267-274. Morgan Kaufmann, 2002. ...
arXiv:2106.00198v4
fatcat:odcv6fhgkjdbdcastxlbc5m6bu
Primary Encoder: Laura Weakly Designer: Karin Dalziel Encoders and Proofreaders: International Program Committee Local Organizing Committee Conference Volunteers Welcome to Digital Humanities 2013
2013
unpublished
Welcome to the University of Nebraska-Lincoln and to Digital Humanities 2013. The theme we have chosen for this year's conference is "Freedom to Explore." ...
Indian tribes in Nebraska today are the Omaha, Ponca, Dakota Sioux and the Winnebago, with other tribes having been relocated to reservations in Oklahoma or South Dakota during the nineteenth century. ...
International Program Committee
Acknowledgements The University of Queensland is proud to be in partnership with the National eResearch Collaboration Tools and Resources (NeCTAR) project to create a ...
fatcat:bibtuxjcgzdtpgda6kjat2m5hu