Cross-Domain Imitation Learning via Optimal Transport [article]

Arnaud Fickinger, Samuel Cohen, Stuart Russell, Brandon Amos
2022 arXiv   pre-print
Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology.  ...  We propose Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states between the different spaces of the agents  ...  Comparing Policies from Arbitrarily Different MDPs: For a stationary policy π acting on a metric MDP (S, A, R, P, γ, d), the occupancy measure is:  ... 
arXiv:2110.03684v3 fatcat:4v6rhqijbrha3iay2ahb3bou4u

Hitting time for Markov decision process [article]

Ruichao Jiang, Javad Tavakoli, Yiqiang Zhao
2022 arXiv   pre-print
Conclusion: We defined the hitting time for an MDP via a relationship between the MDP and the PageRank. The hitting time is a quasi-distance.  ... 
arXiv:2205.03476v2 fatcat:znhwmdbk6fdznpxg6cwcbyillu

Constraint-Aware Deep Reinforcement Learning for End-to-End Resource Orchestration in Mobile Networks [article]

Qiang Liu and Nakjung Choi and Tao Han
2021 arXiv   pre-print
via a policy network.  ...  To solve this problem, we propose SafeSlicing that introduces a new constraint-aware deep reinforcement learning (CaDRL) algorithm to learn the optimal resource orchestration policy within two steps, i.e  ...  Experimental Results 1) Optimizing Network Slicing via learning: Fig. 7 (a) shows the entire learning-based optimization process of SafeSlicing.  ... 
arXiv:2110.04320v1 fatcat:rv4ueaij5jed5nn6wzzuvqqyxa

A Survey of Deep Reinforcement Learning Algorithms for Motion Planning and Control of Autonomous Vehicles [article]

Fei Ye, Shen Zhang, Pin Wang, Ching-Yao Chan
2021 arXiv   pre-print
However, this approach does not automatically guarantee maximal performance due to the lack of a system-level optimization.  ...  In this survey, we systematically summarize the current literature on studies that apply reinforcement learning (RL) to the motion planning and control of autonomous vehicles.  ...  On the other hand, some of the studies choose to learn from human demonstrations via imitation learning and introduce perturbation to discourage undesirable behavior [14].  ... 
arXiv:2105.14218v2 fatcat:27glt4i4lfhg3j4ozjrlsq6i3e

Learning Calibratable Policies using Programmatic Style-Consistency [article]

Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht
2020 arXiv   pre-print
We leverage programmatic labeling functions to specify controllable styles, and derive a formal notion of style-consistency as a learning objective, which can then be solved using conventional policy learning  ...  professional basketball players and agents in the MuJoCo physics environment, and show that existing approaches that do not explicitly enforce style-consistency fail to generate diverse behaviors whereas our learned  ...  We demonstrate style-calibrated policy learning in Basketball and MuJoCo domains.  ... 
arXiv:1910.01179v3 fatcat:g5xvihjb4vbanaltemwarq7mqy

Learning Algorithms for Regenerative Stopping Problems with Applications to Shipping Consolidation in Logistics [article]

Kishor Jothimurugan, Matthew Andrews, Jeongran Lee, Lorenzo Maggi
2021 arXiv   pre-print
In this paper, we compare such solutions to deep reinforcement learning and imitation learning which involve learning a neural network policy from simulations.  ...  We evaluate the different approaches on a real-world problem of shipping consolidation in logistics and demonstrate that deep learning can be effectively used to solve such problems.  ...  We then apply the Imitation Learning algorithm from Ross et al. (2011) to imitate this hindsight optimal solution in real-time.  ... 
arXiv:2105.02318v1 fatcat:5lrufbecw5albiizjphkjxyvpm

Imitation Learning for Vision-based Lane Keeping Assistance [article]

Christopher Innocenti, Henrik Lindén, Ghazaleh Panahandeh, Lennart Svensson, Nasser Mohammadiha
2017 arXiv   pre-print
The policy is successfully learned via imitation learning using real-world data collected from human drivers and is evaluated in closed-loop simulated environments, demonstrating good driving behaviour  ...  and a robustness for domain changes.  ...  Mohammadiha, "Imitation Learning for Vision-based Lane Keeping Assistance", in Proc. of the International Conference on Intelligent Transportation Systems (ITSC), 2017.  ... 
arXiv:1709.03853v1 fatcat:bbmn3jdpnne4jory4aqaqbt64m

Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation [article]

Matthew O'Kelly, Aman Sinha, Hongseok Namkoong, John Duchi, Russ Tedrake
2019 arXiv   pre-print
We implement a simulation framework that can test an entire modern autonomous driving system, including, in particular, systems that employ deep-learning perception and control algorithms.  ...  Using the highway traffic dataset NGSim [36], we train policies of human drivers via imitation learning [45, 41, 42, 22, 6].  ...  In our risk-based framework, we replace the complex system specifications required for formal verification methods with a model P_0 that we learn via imitation-learning techniques.  ... 
arXiv:1811.00145v3 fatcat:gjepbhwn2zdvne4nxsvjut5nhq

LORM: Learning to Optimize for Resource Management in Wireless Networks with Few Training Samples [article]

Yifei Shen, Yuanming Shi, Jun Zhang, Khaled B. Letaief
2019 arXiv   pre-print
Instead of the end-to-end learning approach adopted in previous studies, LORM learns the optimal pruning policy in the branch-and-bound algorithm for MINLPs via a sample-efficient method, namely, imitation  ...  To further address the task mismatch problem, we develop a transfer learning method via self-imitation in LORM, named LORM-TL, which can quickly adapt a pre-trained machine learning model to the new task  ...  LORM: Learning to Optimize for Resource Management: In this section, we first introduce the idea of learning the policies in the branch-and-bound algorithm via imitation learning.  ... 
arXiv:1812.07998v2 fatcat:54v7y5f7lrablkgazuhejneuym

Cross-Subject Transfer Learning in Human Activity Recognition Systems using Generative Adversarial Networks [article]

Elnaz Soleimani, Ehsan Nazerfard
2019 arXiv   pre-print
Transfer learning techniques aim to transfer the knowledge which has been learned from the source domain (subject) to the target domain in order to decrease the models' performance loss in the target domain  ...  learning in the domain of wearable sensor-based Human Activity Recognition.  ...  This model which is working in a semi-supervised manner obtains labels of target domain via the second annotation.  ... 
arXiv:1903.12489v1 fatcat:i3flguywqzdmxa6kzre36pxq7u

Transfer Learning in Deep Reinforcement Learning: A Survey [article]

Zhuangdi Zhu, Kaixiang Lin, Anil K. Jain, Jiayu Zhou
2022 arXiv   pre-print
Along with the promising prospects of reinforcement learning in numerous domains such as robotics and game-playing, transfer learning has arisen to tackle various challenges faced by reinforcement learning  ...  Reinforcement learning is a learning paradigm for solving sequential decision-making problems.  ...  There are currently two main paradigms for imitation learning.  ... 
arXiv:2009.07888v5 fatcat:2rfeugb27ffv7jxh7siqn56s6e

Technical Sessions

2021 2021 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)  
Network Interoperability Assessment Target 5G visible light positioning signal subcarrier extraction method using particle swarm optimization algorithm A Machine Learning Solution for Automatic Selection  ...  imitation-to-innovation training scheme Yi Feng, University of Ottawa 5G Multicast Broadcast Services Performance Evaluation Álvaro Ibanez Latorre, Universidad Politécnica de Valencia Few Pains, Many  ... 
doi:10.1109/bmsb53066.2021.9547160 fatcat:3npwqozpznfa7npul4jitqgbq4

TANGO: Commonsense Generalization in Predicting Tool Interactions for Mobile Manipulators [article]

Shreshth Tuli, Rajas Bansal, Rohan Paul, Mausam
2021 arXiv   pre-print
Robots assisting us in factories or homes must learn to make use of objects as tools to perform tasks, e.g., a tray for carrying objects.  ...  The model learns to attend over the scene using knowledge of the goal and the action history, finally decoding the symbolic action to execute.  ...  We take an imitation learning approach and aim at learning this function from demonstrations by human teachers.  ... 
arXiv:2105.04556v2 fatcat:2z5x475aebd7tklvms76oh5ot4

When Autonomous Systems Meet Accuracy and Transferability through AI: A Survey [article]

Chongzhen Zhang, Jianrui Wang, Gary G. Yen, Chaoqiang Zhao, Qiyu Sun, Yang Tang, Feng Qian, Jürgen Kurths
2020 arXiv   pre-print
Transferability means that when a well-trained model is transferred to other testing domains, the accuracy is still good.  ...  Firstly, we introduce some basic concepts of transfer learning and then present some preliminaries of adversarial learning, RL and meta-learning.  ...  Depth estimation via joint tasks learning.  ... 
arXiv:2003.12948v3 fatcat:qtmjs74p2vh6thdotbhgebdvoi

Deep Reinforcement Learning: An Overview [article]

Yuxi Li
2018 arXiv   pre-print
We start with background of machine learning, deep learning and reinforcement learning.  ...  processing, including dialogue systems, machine translation, and text generation, computer vision, neural architecture design, business management, finance, healthcare, Industry 4.0, smart grid, intelligent transportation  ...  We discuss imitation learning with GANs in Section 3.3, including generative adversarial imitation learning, and third person imitation learning.  ... 
arXiv:1701.07274v6 fatcat:x2es3yf3crhqblbbskhxelxf2q