
Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate [article]

Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang
2020 arXiv   pre-print
Different from reinforcement learning, GAIL learns both policy and reward function from expert (human) demonstration.  ...  Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.  ...  To address such issues of IRL, Ho and Ermon (2016) propose generative adversarial imitation learning (GAIL), which searches for the optimal policy without fully solving an RL subproblem given a reward  ... 
arXiv:2003.03709v2 fatcat:7tk7xyjy6fbv5plxl5jttbqifq

On Instrumental Variable Regression for Deep Offline Policy Evaluation [article]

Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet
2021 arXiv   pre-print
We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding  ...  We find empirically that state-of-the-art OPE methods are closely matched in performance by some IV methods such as AGMM, which were not developed for OPE.  ...  Background Reinforcement learning and offline policy evaluation Reinforcement learning considers a Markov decision process $\langle S, A, P, R, \mu_0, \gamma \rangle$, where S is the state space, A is the action space, and  ... 
arXiv:2105.10148v1 fatcat:ssz5p76qj5hdhjlwkjsp6fnove

Deep Learning Techniques for Music Generation – A Survey [article]

Jean-Pierre Briot, Gaëtan Hadjeres, François-David Pachet
2019 arXiv   pre-print
This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature.  ...  Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges?  ...  A recent combination of reinforcement learning (more specifically Q-learning) and deep learning, named deep reinforcement learning, has been proposed [122] in order to make learning more efficient.  ... 
arXiv:1709.01620v4 fatcat:hma4znleorfpvh62cpupxu4fq4

Learning Decisions: Robustness, Uncertainty, and Approximation

J. Andrew Bagnell
Decision making under uncertainty is a central problem in robotics and machine learning. This thesis explores three fundamental and intertwined aspects of the problem of learning to make decisions.  ...  Finally, we provide case studies that serve as both motivation for the techniques as well as illustrate their applicability.  ...  I can hope for the latter, but I am in debt to too many and too deeply for the former.  ... 
doi:10.1184/r1/6555335 fatcat:d66do42kaffhbm3u3s425w7tuy

Local planning for continuous Markov decision processes

Ariel Weinstein
A general formulation of this problem is in terms of reinforcement learning (RL), which has traditionally been restricted to small discrete domains.  ...  By developing planners that function natively in continuous domains, difficult decisions related to how coarsely to discretize the problem are avoided, which allows for more flexible and efficient algorithms  ...  Reinforcement learning in MDPs is concerned with finding a good policy π(s) → a for M.  ... 
doi:10.7282/t3br8q83 fatcat:q276d2krmzhpjgwcniss7t7sx4

Report from Dagstuhl Seminar Artificial and Computational Intelligence in Games: AI-Driven Game Design 1 Executive Summary

Pieter Spronck, Elisabeth André, Michael Cook, Mike Preuß (+1 other)
Computational Intelligence in   unpublished
To this end, the seminar included a wide range of researchers and developers, including specialists in AI/CI for abstract games, commercial video games, and serious games.  ...  Such techniques include procedural content generation, automated narration, player modelling and adaptation, and automated game design.  ...  From a Tech/programmer for a major games service provider: "How reinforcement learning agents can be applied for testing across multiple games?"  ... 

The Vessel Schedule Recovery Problem (VSRP) – A MIP model for handling disruptions in liner shipping

Berit D. Brouer, Jakob Dirksen, David Pisinger, Christian E.M. Plum, Bo Vaaben
2013 European Journal of Operational Research  
, and the travel paths for users between each pair of origins and destinations.  ...  for the optimization of plants and entire supply chains that are involved in EWO problems.  ...  for the search.  ... 
doi:10.1016/j.ejor.2012.08.016 fatcat:c27kagfnxnhjfbil2rydhjhomm

Approximate Solutions to Markov Decision Processes

Geoffrey J. Gordon
One of the basic problems of machine learning is deciding how to act in an uncertain world.  ...  One representation for a learner's environment and goals is a Markov decision process or MDP.  ...  Thanks in particular to my advisor Tom Mitchell and to Andrew Moore for helping me to see both the forest and the trees, and to Tom Mitchell for finding the funding to let me work on the interesting problems  ... 
doi:10.1184/r1/6551972.v1 fatcat:taciy3ayvnbehml532tdhhyqtu

Dagstuhl Reports, Volume 9, Issue 12, December 2019, Complete Issue

AI for Accessibility in Games Tommy Thompson  ...  A Tour of Reinforcement Learning: The View from Continuous Control.  ...  In contrast, there has been very little progress on this kind of problem in the machine learning and reinforcement learning community.  ... 
doi:10.4230/dagrep.9.12 fatcat:hebigxkvinhjdb6qlg3j5hw25u

Dagstuhl Reports, Volume 7, Issue 11, November 2017, Complete Issue [article]

A General Language for Matching Tile Games Julian Togelius (New York University, US), Cameron Browne (RIKEN -Tokyo, JP), Simon Colton (Falmouth University, GB), Mark J.  ...  From a Tech/programmer for a major games service provider: "How reinforcement learning agents can be applied for testing across multiple games?"  ...  Popular examples include Bejeweled, Tetris, and Candy Crush Saga.  ... 
doi:10.4230/dagrep.7.11 fatcat:pk2gs776vzftffmrue3j2xdgoy