61,085 Hits in 5.8 sec

On the Value of Interaction and Function Approximation in Imitation Learning

Nived Rajaraman, Yanjun Han, Lin Yang, Jingbo Liu, Jiantao Jiao, Kannan Ramchandran
2021 Neural Information Processing Systems  
This establishes a clear and provable separation of the minimax rates between the active setting and the no-interaction setting. We also study IL with linear function approximation.  ...  We study imitation learning under the µ-recoverability assumption of [27], which assumes that the difference in the Q-value under the expert policy across different actions in a state does not deviate beyond  ...  Linear function approximation in the no-interaction setting In this section, we go beyond the tabular setting and study IL in the presence of function approximation.  ... 
dblp:conf/nips/RajaramanHYLJR21 fatcat:o53cccrg4nhc5m5grfog2d6zui
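The "no-interaction setting" in the entry above is the regime where the learner only sees a fixed batch of expert trajectories, i.e., plain behavioral cloning. A minimal tabular sketch of that baseline, with illustrative state/action names not taken from the paper:

```python
from collections import Counter, defaultdict

def behavioral_cloning(demos):
    """Tabular behavioral cloning: at every state observed in the
    expert demonstrations, replay the expert's most frequent action."""
    counts = defaultdict(Counter)
    for trajectory in demos:
        for state, action in trajectory:
            counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Toy demonstrations: three trajectories of (state, action) pairs.
demos = [[("s0", "left"), ("s1", "right")],
         [("s0", "left"), ("s1", "left")],
         [("s0", "left"), ("s1", "right")]]
policy = behavioral_cloning(demos)  # {"s0": "left", "s1": "right"}
```

States never visited by the expert have no entry in the learned policy, which is exactly the compounding-error weakness that separates this setting from the active (interactive) one.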

SS-MAIL: Self-Supervised Multi-Agent Imitation Learning [article]

Akshay Dharmavaram, Tejus Gupta, Jiachen Li, Katia P. Sycara
2021 arXiv   pre-print
The current landscape of multi-agent expert imitation is broadly dominated by two families of algorithms - Behavioral Cloning (BC) and Adversarial Imitation Learning (AIL).  ...  In this work, we address this issue by introducing a novel self-supervised loss that encourages the discriminator to approximate a richer reward function.  ...  shifts in the value-function.  ... 
arXiv:2110.08963v1 fatcat:3ydm4toxmnb73kjis5azijymae
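Several of the adversarial imitation entries above (SS-MAIL, Dyna-AIL, PS-GAIL, CoDAIL) build on the standard GAIL discriminator objective that work like SS-MAIL then modifies. A hedged sketch of that baseline objective, with hand-picked discriminator outputs standing in for a trained network:

```python
import numpy as np

def gail_discriminator_loss(d_expert, d_policy):
    """Binary cross-entropy the GAIL discriminator minimizes: it should
    score expert state-action pairs near 1 and policy pairs near 0.
    d_expert / d_policy are discriminator outputs in (0, 1)."""
    d_expert = np.asarray(d_expert, dtype=float)
    d_policy = np.asarray(d_policy, dtype=float)
    return -(np.log(d_expert).mean() + np.log(1.0 - d_policy).mean())

# A discriminator that separates expert from policy data incurs a
# lower loss than one that cannot tell the two apart.
good = gail_discriminator_loss([0.9, 0.95], [0.1, 0.05])
confused = gail_discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

The policy is then rewarded for fooling this discriminator; richer surrogate rewards, as in SS-MAIL, replace or augment this single scalar signal.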

Embodiment Adaptation from Interactive Trajectory Preferences

Michael Walton, Benjamin Migliori, John Reeder
2018 European Conference on Principles of Data Mining and Knowledge Discovery  
for value function approximation [6] and policy learning [5].  ...  After each interaction, a pairwise preference is assigned between the two trajectories and a reward function approximation r is estimated using the method specified in [1].  ... 
dblp:conf/pkdd/WaltonMR18 fatcat:ngpcphyb7rdbvcnchqt5ii4k7a

Dyna-AIL : Adversarial Imitation Learning by Planning [article]

Vaibhav Saxena, Srinivasan Sivanandan, Pulkit Mathur
2019 arXiv   pre-print
interactions in comparison to the state-of-the-art learning methods.  ...  Adversarial methods for imitation learning have been shown to perform well on various control tasks. However, they require a large number of environment interactions for convergence.  ...  function approximator for D and π, and use a set of expert trajectories to calculate the expectation w.r.t. π E .  ... 
arXiv:1903.03234v1 fatcat:oddi5xsxqzdabgnv5fbcfzf4r4

Improving imitated grasping motions through interactive expected deviation learning

Kathrin Gräve, Jörg Stückler, Sven Behnke
2010 2010 10th IEEE-RAS International Conference on Humanoid Robots  
Our method combines the advantages of reinforcement and imitation learning in a single coherent framework.  ...  One of the major obstacles that hinders the application of robots to human day-to-day tasks is the current lack of flexible learning methods to endow the robots with the necessary skills and to allow them  ...  The seamless integration of both learning types in our framework is in contrast to existing approaches that non-interactively chain imitation and reinforcement learning.  ... 
doi:10.1109/ichr.2010.5686846 dblp:conf/humanoids/GraveSB10 fatcat:leaztpyyf5dddd6vb7c4vb2cyu

Affordances, development and imitation

Luis Montesano, Manuel Lopes, Alexandre Bernardino, José Santos-Victor
2007 2007 IEEE 6th International Conference on Development and Learning  
The key concept is a general model for affordances able to learn the statistical relations between actions, object properties and the effects of actions on objects.  ...  To evaluate the approach, we provide results of affordance learning with a real robot and simple imitation games with people.  ...  ACKNOWLEDGMENTS This work was (partially) supported by the FCT Programa Operacional Sociedade de Informação (POSI) in the frame of QCA III, and by the EU Projects (IST-004370) RobotCub and (EU-FP6-NEST  ... 
doi:10.1109/devlrn.2007.4354054 fatcat:ttvc6kufffbyhcmskkmhfhwpgu

Multi-Agent Imitation Learning for Driving Simulation [article]

Raunak P. Bhattacharyya, Derek J. Phillips, Blake Wulfe, Jeremy Morton, Alex Kuefler, Mykel J. Kochenderfer
2018 arXiv   pre-print
Compared with single-agent GAIL policies, policies generated by our PS-GAIL method prove superior at interacting stably in a multi-agent setting and capturing the emergent behavior of human drivers.  ...  Generative Adversarial Imitation Learning (GAIL) has recently been shown to learn representative human driver models.  ...  ACKNOWLEDGMENTS Toyota Research Institute (TRI) provided funds to assist the authors with their research, but this article solely reflects the opinions and conclusions of its authors and not TRI or any  ... 
arXiv:1803.01044v1 fatcat:c7x7bbxcejh7ddtxdfockkojcq

Learning Self-Imitating Diverse Policies [article]

Tanmay Gangwani, Qiang Liu, Jian Peng
2019 arXiv   pre-print
The success of popular algorithms for deep reinforcement learning, such as policy-gradients and Q-learning, relies heavily on the availability of an informative reward signal at each timestep of the sequential  ...  In this work, we introduce a self-imitation learning algorithm that exploits and explores well in the sparse and episodic reward settings.  ...  The derivation of the approximation and the underlying assumptions are in Appendix 5.1.  ... 
arXiv:1805.10309v2 fatcat:3wjitdpiffdxjmiibxvanafdqe

Experience, Imitation and Reflection; Confucius' Conjecture and Machine Learning [article]

Amir Ramezani Dooraki
2018 arXiv   pre-print
Regarding the learning methods of humans, Confucius' point of view is that they are by experience, imitation and reflection.  ...  Having that in mind, and considering the several existing machine learning methods, this question arises: 'What are some of the best ways for a machine to learn?'  ...  In order to tackle these problems a function approximator can be used in order to find the optimal values of each action or state.  ... 
arXiv:1808.00222v1 fatcat:pcugrxd5cfd53o4a4jjo5val2i

Multi-Agent Interactions Modeling with Correlated Policies [article]

Minghuan Liu, Ming Zhou, Weinan Zhang, Yuzheng Zhuang, Jun Wang, Wulong Liu, Yong Yu
2020 arXiv   pre-print
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies,  ...  Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods.  ...  The corresponding author Weinan Zhang is supported by NSFC (61702327, 61772333, 61632017) .  ... 
arXiv:2001.03415v3 fatcat:nu63toybuvhmhkrvbqd2edlikq

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [article]

Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
2021 arXiv   pre-print
We provide empirical evidence for the effectiveness of ΨΦ-learning as a method for improving RL, IRL, imitation, and few-shot transfer, and derive worst-case bounds for its performance in zero-shot transfer  ...  We study reinforcement learning (RL) with no-reward demonstrations, a setting in which an RL agent has access to additional data from the interaction of other agents with the same environment.  ...  We thank Pablo Samuel Castro, Anna Harutyunyan, RAIL, OATML and IRIS lab members for their helpful feedback. We also thank the anonymous reviewers for useful comments during the review process.  ... 
arXiv:2102.12560v2 fatcat:iihuwyxiyvduvgc7em5hahxway

Memes in Artificial Life Simulations of Life History Evolution

John A. Bullinaria
2010 Workshop on the Synthesis and Simulation of Living Systems  
This paper extends the previous study by incorporating imitation and memes to provide a more complete account of learning as a factor in Life History Evolution.  ...  The effect that learning has on Life History Evolution has recently been studied using a series of Artificial Life simulations in which populations of competing individuals evolve to learn to perform well  ...  In many ways, the relevant trade-offs are clear from a theoretical point of view, but the interactions are complex and highly dependent on the associated parameters.  ... 
dblp:conf/alife/Bullinaria10 fatcat:hbvua4hlffcyrmmtnydq7kzagi

The Limits of Optimal Pricing in the Dark [article]

Quinlan Dawkins, Minbiao Han, Haifeng Xu
2021 arXiv   pre-print
A ubiquitous learning problem in today's digital market is, during repeated interactions between a seller and a buyer, how a seller can gradually learn optimal pricing decisions based on the buyer's past  ...  That is, before the pricing game starts, the buyer simply commits to "imitate" a different value function by pretending to always react optimally according to this imitative value function.  ...  In fact, the buyer could even just report his imitative value function to the seller directly at the beginning of any interaction.  ... 
arXiv:2110.01707v1 fatcat:wkhe5o7iuvbhdkyeh3kaljtenm

Direct on-line imitation of human faces with hierarchical ART networks

Patrick Holthaus, Sven Wachsmuth
2013 2013 IEEE RO-MAN  
The marker-less method depends solely on the interactant's face as input; it does not use a set of basic emotions and is thus capable of displaying a large variety of facial expressions.  ...  This work-in-progress paper presents an on-line system for robotic heads capable of mimicking humans.  ...  We also greatly acknowledge the support of student assistant Marian Pohling in the technical realization of this work.  ... 
doi:10.1109/roman.2013.6628502 dblp:conf/ro-man/HolthausW13 fatcat:r2yth6hlura27o3b6prmo7qdki

Behavioral Cloning from Observation

Faraz Torabi, Garrett Warnell, Peter Stone
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
We experimentally compare BCO to imitation learning methods, including the state-of-the-art generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several  ...  Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations.  ...  The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.  ... 
doi:10.24963/ijcai.2018/687 dblp:conf/ijcai/TorabiWS18 fatcat:ykal6qlt2jgehfsbpryh26zebe
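The BCO recipe in the entry above (demonstrations contain states only, so a learned inverse-dynamics model infers the missing actions before ordinary behavioral cloning is applied) can be caricatured in a few lines. The hand-written `infer` below is an illustrative stand-in for the inverse-dynamics model the paper actually trains:

```python
def behavioral_cloning_from_observation(state_demos, inverse_dynamics):
    """Sketch of the BCO idea: label each observed state transition with
    the action the inverse-dynamics model believes caused it, producing
    (state, action) pairs suitable for standard behavioral cloning.
    inverse_dynamics is assumed to map (state, next_state) to an action."""
    labeled = []
    for trajectory in state_demos:
        for s, s_next in zip(trajectory, trajectory[1:]):
            labeled.append((s, inverse_dynamics(s, s_next)))
    return labeled

# Toy 1-D world: moving right is action +1, moving left is -1.
infer = lambda s, s_next: 1 if s_next > s else -1
pairs = behavioral_cloning_from_observation([[0, 1, 2, 1]], infer)
# pairs == [(0, 1), (1, 1), (2, -1)]
```

In the actual method, the inverse-dynamics model is itself learned from the agent's own environment interaction, which is what lets BCO work without action labels in the demonstrations.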