Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
[article]
2020
arXiv
pre-print
Different from reinforcement learning, GAIL learns both policy and reward function from expert (human) demonstration. ...
Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks. ...
To address such issues of IRL, Ho and Ermon (2016) propose generative adversarial imitation learning (GAIL), which searches for the optimal policy without fully solving an RL subproblem given a reward ...
arXiv:2003.03709v2
fatcat:7tk7xyjy6fbv5plxl5jttbqifq
On Instrumental Variable Regression for Deep Offline Policy Evaluation
[article]
2021
arXiv
pre-print
We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding ...
We find empirically that state-of-the-art OPE methods are closely matched in performance by some IV methods such as AGMM, which were not developed for OPE. ...
Background
Reinforcement learning and offline policy evaluation. Reinforcement learning considers a Markov decision process (S, A, P, R, µ_0, γ), where S is the state space, A is the action space, and ...
arXiv:2105.10148v1
fatcat:ssz5p76qj5hdhjlwkjsp6fnove
Deep Learning Techniques for Music Generation – A Survey
[article]
2019
arXiv
pre-print
This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature. ...
Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges? ...
A recent combination of reinforcement learning (more specifically Q-learning) and deep learning, named deep reinforcement learning, has been proposed [122] in order to make learning more efficient. ...
arXiv:1709.01620v4
fatcat:hma4znleorfpvh62cpupxu4fq4
Learning Decisions: Robustness, Uncertainty, and Approximation
2018
Decision making under uncertainty is a central problem in robotics and machine learning. This thesis explores three fundamental and intertwined aspects of the problem of learning to make decisions. ...
Finally, we provide case studies that serve as both motivation for the techniques as well as illustrate their applicability. ...
I can hope for the latter, but I am in debt to too many and too deeply for the former. ...
doi:10.1184/r1/6555335
fatcat:d66do42kaffhbm3u3s425w7tuy
Local planning for continuous Markov decision processes
2014
A general formulation of this problem is in terms of reinforcement learning (RL), which has traditionally been restricted to small discrete domains. ...
By developing planners that function natively in continuous domains, difficult decisions related to how coarsely to discretize the problem are avoided, which allows for more flexible and efficient algorithms ...
Reinforcement learning in MDPs is concerned with finding a good policy π(s) → a for M. ...
doi:10.7282/t3br8q83
fatcat:q276d2krmzhpjgwcniss7t7sx4
Report from Dagstuhl Seminar Artificial and Computational Intelligence in Games: AI-Driven Game Design
Computational Intelligence in
unpublished
To this end, the seminar included a wide range of researchers and developers, including specialists in AI/CI for abstract games, commercial video games, and serious games. ...
Such techniques include procedural content generation, automated narration, player modelling and adaptation, and automated game design. ...
From a Tech/programmer for a major games service provider: "How reinforcement learning agents can be applied for testing across multiple games?" ...
fatcat:auglacwl4vfttb65kjrmljsgbi
The Vessel Schedule Recovery Problem (VSRP) – A MIP model for handling disruptions in liner shipping
2013
European Journal of Operational Research
, and the travel paths for users between each pair of origins and destinations. ...
for the optimization of plants and entire supply chains that are involved in EWO problems. ...
for the search. ...
doi:10.1016/j.ejor.2012.08.016
fatcat:c27kagfnxnhjfbil2rydhjhomm
Approximate Solutions to Markov Decision Processes
2018
One of the basic problems of machine learning is deciding how to act in an uncertain world. ...
One representation for a learner's environment and goals is a Markov decision process or MDP. ...
Thanks in particular to my advisor Tom Mitchell and to Andrew Moore for helping me to see both the forest and the trees, and to Tom Mitchell for finding the funding to let me work on the interesting problems ...
doi:10.1184/r1/6551972.v1
fatcat:taciy3ayvnbehml532tdhhyqtu
Dagstuhl Reports, Volume 9, Issue 12, December 2019, Complete Issue
2020
AI for Accessibility in Games Tommy Thompson ...
A Tour of Reinforcement Learning: The View from Continuous Control. ...
In contrast, there has been very little progress on this kind of problem in the machine learning and reinforcement learning community. ...
doi:10.4230/dagrep.9.12
fatcat:hebigxkvinhjdb6qlg3j5hw25u
Dagstuhl Reports, Volume 7, Issue 11, November 2017, Complete Issue
[article]
2018
A General Language for Matching Tile Games Julian Togelius (New York University, US), Cameron Browne (RIKEN - Tokyo, JP), Simon Colton (Falmouth University, GB), Mark J. ...
From a Tech/programmer for a major games service provider: "How reinforcement learning agents can be applied for testing across multiple games?" ...
Popular examples include Bejeweled, Tetris, and Candy Crush Saga. ...
doi:10.4230/dagrep.7.11
fatcat:pk2gs776vzftffmrue3j2xdgoy