Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies [article]

Tsung-Yen Yang and Justinian Rosca and Karthik Narasimhan and Peter J. Ramadge
2021 arXiv   pre-print
We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy.  ...  The baseline policy can arise from demonstration data or a teacher agent and may provide useful cues for learning, but it might also be sub-optimal for the task at hand, and is not guaranteed to satisfy  ...  Acknowledgements The authors would like to thank members of the Princeton NLP Group, the anonymous reviewers, and the area chair for their comments.  ... 
arXiv:2006.11645v3 fatcat:ngboxu47t5fm3gojgnym6yqala

On the Effectiveness of Iterative Learning Control [article]

Anirudh Vemula, Wen Sun, Maxim Likhachev, J. Andrew Bagnell
2021 arXiv   pre-print
Iterative learning control (ILC) is a powerful technique for high performance tracking in the presence of modeling errors for optimal control applications.  ...  However, there is little prior theoretical work that explains the effectiveness of ILC even in the presence of large modeling errors, where optimal control methods using the misspecified model (MM) often  ...  Hoffmann, editors, Machine Learning, Proceedings of the Nineteenth International Conference (ICML 2002), University of New South Wales, Sydney, Australia, July 8-12, 2002, pages 267–274.  ... 
arXiv:2111.09434v3 fatcat:lwnqcrx4wneiddqrgwq2dpke4q

Cost-to-Go Function Approximation [chapter]

2017 Encyclopedia of Machine Learning and Data Mining  
It maintains a set, S, of most specific hypotheses that are consistent with the training data and a set, G, of most general hypotheses consistent with the training data.  ...  These two sets form two boundaries on the version space.  ...  Some algorithms, most notably CN2 (Clark and Niblett 1989; Clark and Boswell 1991), learn multi-class rules directly by optimizing over all possible classes in the head of the rule.  ... 
doi:10.1007/978-1-4899-7687-1_100093 fatcat:vse7ncdqs5atlosjhz7fhlj3im

Gradient play in stochastic games: stationary points, convergence, and sample complexity [article]

Runyu Zhang, Zhaolin Ren, Na Li
2021 arXiv   pre-print
Our result shows that the number of iterations to reach an ϵ-NE scales linearly, instead of exponentially, with the number of agents.  ...  learning algorithm and give a non-asymptotic global convergence rate analysis for both exact gradient play and our sample-based learning algorithm.  ...  .), Machine Learning, Proceedings of the Nineteenth International Conference (ICML 2002), University of New South Wales, Sydney, Australia, July 8-12, 2002, pp. 267-274. Morgan Kaufmann, 2002.  ... 
arXiv:2106.00198v4 fatcat:odcv6fhgkjdbdcastxlbc5m6bu

Primary Encoder: Laura Weakly Designer: Karin Dalziel Encoders and Proofreaders: International Program Committee Local Organizing Committee Conference Volunteers Welcome to Digital Humanities 2013

Matt Bosley, Matthew Lavin, Elizabeth Lorang, Keith Nickum, Erin Pedigo, Hannah Vahle, Bethany Nowviskie, Craig Bellamy, John Bradley, Paul Caton, Carolyn Guertin, Ian Johnson (+61 others)
2013 unpublished
Welcome to the University of Nebraska-Lincoln and to Digital Humanities 2013. The theme we have chosen for this year's conference is "Freedom to Explore."  ...  Indian tribes in Nebraska today are the Omaha, Ponca, Dakota Sioux and the Winnebago, with other tribes having been relocated to reservations in Oklahoma or South Dakota during the nineteenth-century.  ...  International Program Committee Acknowledgements The University of Queensland is proud to be in partnership with the National eResearch Collaboration Tools and Resources (NeCTAR) project to create a  ... 
fatcat:bibtuxjcgzdtpgda6kjat2m5hu