Efficient and Effective Similar Subtrajectory Search with Deep Reinforcement Learning [article]

Zheng Wang, Cheng Long, Gao Cong, Yiding Liu
2020 arXiv   pre-print
Among those approximate algorithms, two that are based on deep reinforcement learning stand out and outperform those non-learning based algorithms in terms of effectiveness and efficiency.  ...  However, the similar subtrajectory search (SimSub) problem, aiming to return a portion of a trajectory (i.e., a subtrajectory) which is the most similar to a query trajectory, has been mostly disregarded  ...  Gao Cong acknowledges the support by Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU), which is a collaboration between Singapore Telecommunications Limited (Singtel) and Nanyang  ... 
arXiv:2003.02542v2 fatcat:wupyxy3odremho5okuvg7ymolq

Policy Guided Monte Carlo: Reinforcement Learning Markov Chain Dynamics [article]

Troels Arnfred Bojesen
2018 arXiv   pre-print
The methodology is generally applicable, unbiased and opens up a new path to automated discovery of efficient MCMC samplers.  ...  We introduce Policy Guided Monte Carlo (PGMC), a computational framework using reinforcement learning to improve Markov chain Monte Carlo (MCMC) sampling.  ...  ACKNOWLEDGMENTS The author would like to thank Shigeki Onoda, Yukitoshi Motome, and Yasuyuki Kato for fruitful discussions and feedback related to this work.  ... 
arXiv:1808.09095v1 fatcat:3cqi27zn5zhspevnmimklg2ffa

An improved high-density sub trajectory clustering algorithm

Xiaoming Liu, Luxi Dong, Chunlin Shang, Xiangda Wei
2020 IEEE Access  
Initially, sub trajectories are divided based on the spatio-temporal characteristic similarity of trajectories.  ...  Finally, a new sub-trajectory clustering algorithm is robust to input parameters based on subtrajectory entropy.  ...  , combined with two-way matching and deep reinforcement learning methods, jointly optimized task scheduling and network resource allocation strategies to maximize the quality of user experience (QoE).  ... 
doi:10.1109/access.2020.2974059 fatcat:uxn2setnvbbkbg4qg36vfpzsbq

Adaptive Tensegrity Locomotion on Rough Terrain via Reinforcement Learning [article]

David Surovik, Kun Wang, Kostas E. Bekris
2018 arXiv   pre-print
Guided Policy Search (GPS), a sample-efficient and model-free hybrid framework for optimization and reinforcement learning, has recently been used to produce periodic locomotion for a spherical 6-bar tensegrity  ...  The dynamical properties of tensegrity robots give them appealing ruggedness and adaptability, but present major challenges with respect to locomotion control.  ...  Reinforcement learning is often also associated with excessive data requirements.  ... 
arXiv:1809.10710v1 fatcat:d2c5wafmrvb25ho6makdmy5x7i

P-KDGAN: Progressive Knowledge Distillation with GANs for One-class Novelty Detection

Zhiwei Zhang, Shifeng Chen, Lei Sun
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Therefore, Progressive Knowledge Distillation with GANs (P-KDGAN) is proposed to learn compact and fast novelty detection networks.  ...  In the first step, the student GAN learns the basic knowledge totally from the teacher via guiding of the pre-trained teacher GAN with fixed weights.  ...  Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.  ... 
doi:10.24963/ijcai.2020/444 dblp:conf/ijcai/ZhangZJZSSW20 fatcat:cqt2grsflvfy3ak7iube4ufhqi

Constrained-Space Optimization and Reinforcement Learning for Complex Tasks

Ya-Yen Tsai, Bo Xiao, Edward Johns, Guang-Zhong Yang
2020 IEEE Robotics and Automation Letters  
This paper presents a constrained-space optimization and reinforcement learning scheme for managing complex tasks.  ...  The effectiveness of the proposed method is verified with a robotic suturing task, demonstrating that the learned policy outperformed the experts' demonstrations in terms of the smoothness of the joint  ...  By constraining the search area within the variance and applying a teacher's understanding of the subtrajectory to the learning policy, it can avoid the exploration of sub-optimal region and action and  ... 
doi:10.1109/lra.2020.2965392 fatcat:4upr4a6gdreytgdiqbirnjy4fi

CLAMGen: Closed-Loop Arm Motion Generation via Multi-view Vision-Based RL [article]

Iretiayo Akinola, Zizhao Wang, Peter Allen
2021 arXiv   pre-print
Furthermore, we introduce novel learning objectives and techniques to improve 3D understanding from multiple image views and sample efficiency of our algorithm.  ...  However, learning a collision-avoidance policy using RL remains a challenge for various reasons, including, but not limited to, partial observability, poor exploration, low sample efficiency, and learning  ...  We show that a single view is insufficient for 3D obstacle avoidance and demonstrate how to effectively overcome this limitation with more camera views and in a sample-efficient manner.  ... 
arXiv:2103.13267v1 fatcat:zsacjshkzvanbk64sqsiwpn5lu

Unsupervised Curricula for Visual Meta-Reinforcement Learning [article]

Allan Jabri, Kyle Hsu, Ben Eysenbach, Abhishek Gupta, Sergey Levine, Chelsea Finn
2019 arXiv   pre-print
In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast reinforcement learning (RL) strategies that transfer to similar tasks.  ...  functions and serves as pre-training for more efficient supervised meta-learning of test task distributions.  ...  EX2 : exploration with exemplar models for deep reinforcement learning.  ... 
arXiv:1912.04226v1 fatcat:cer4sn5jm5ezrfmcvsvfs3t6ii

Policy Optimization via Importance Sampling [article]

Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
2018 arXiv   pre-print
Policy optimization is an effective reinforcement learning approach to solve continuous control tasks.  ...  Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with state-of-the-art policy optimization methods.  ...  Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015. [23] Luca Martino, Víctor Elvira, and Francisco Louzada.  ... 
arXiv:1809.06098v1 fatcat:vflzox4du5aylog4k7afw3q444

Composite Q-learning: Multi-scale Q-function Decomposition and Separable Optimization [article]

Gabriel Kalweit, Maria Huegle, Joschka Boedecker
2020 arXiv   pre-print
Deep Q-learning, however, still suffers from poor data-efficiency and is susceptible to stochasticity in the environment or reward functions, which is limiting with regard to real-world applications.  ...  We show the efficacy of Composite Q-learning in the tabular case and compare Deep Composite Q-learning with TD3 and TD3(Delta), which we introduce as an off-policy variant of TD(Delta).  ...  A representative of continuous model-free reinforcement learning with function approximation is the Deep Deterministic Policy Gradient (DDPG) actor-critic method (Lillicrap et al., 2016).  ... 
arXiv:1909.13518v2 fatcat:7g4qschtdjdsbkkejeqkisos2e

When Waiting is not an Option: Learning Options with a Deliberation Cost [article]

Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup
2017 arXiv   pre-print
We then derive practical gradient-based learning algorithms to implement this objective. Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability.  ...  Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance.  ...  In reinforcement learning, options (Sutton et al. 1999b) provide a framework to represent, learn and plan with temporally extended actions.  ... 
arXiv:1709.04571v1 fatcat:vra55wxyszgunon4mwxbs4ig3e

Compositional Reinforcement Learning from Logical Specifications [article]

Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur
2021 arXiv   pre-print
In this work, we develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.  ...  Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward.  ...  Acknowledgements and Funding. We thank the anonymous reviewers for their helpful comments.  ... 
arXiv:2106.13906v3 fatcat:32xsfvj7lzdkpk3mhnxg66cvpa

Personalized Route Recommendation with Neural Network Enhanced A* Search Algorithm

Jingyuan Wang, Ning Wu, Xin Zhao
2021 IEEE Transactions on Knowledge and Data Engineering  
Extensive experiment results on three real-world datasets have shown the effectiveness and robustness of the proposed model.  ...  A classic approach is to adapt search algorithms to construct pathfinding-like solutions. These methods typically focus on reducing search space with suitable heuristic strategies.  ...  It is able to effectively utilize context information and characterize complex trajectory characteristics, which elegantly combines the merits of A* search algorithms and deep learning.  ... 
doi:10.1109/tkde.2021.3068479 fatcat:t7pf5eyi2jbyvmezdr6c5h5l34

Continuous Doubly Constrained Batch Reinforcement Learning [article]

Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola
2021 arXiv   pre-print
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.  ...  Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration.  ...  Guez, and D. Silver. Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, page 2094–2100.  ... 
arXiv:2102.09225v4 fatcat:5li6fajsuzdzlf23x6vbzzkftq

Gradient-Aware Model-based Policy Search [article]

Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli
2019 arXiv   pre-print
Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with  ...  Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent.  ...  Acknowledgments This work has been partially supported by the Italian MIUR PRIN 2017 Project ALGADIMAR "Algorithms, Games, and Digital Market".  ... 
arXiv:1909.04115v2 fatcat:zat4dxo2zzdwthjv6om3bwpgje