
Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Steven Carr, Nils Jansen, Ufuk Topcu
2021 The Journal of Artificial Intelligence Research  
Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information.  ...  However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, constraints given by temporal logic specifications.  ... 
doi:10.1613/jair.1.12963 fatcat:usbrnbs6dvarrbnj2x4bmmmrwa

Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness [article]

Zheng Zhang, Lizi Liao, Xiaoyan Zhu, Tat-Seng Chua, Zitao Liu, Yan Huang, Minlie Huang
2020 arXiv   pre-print
We therefore propose an opposite-behavior-aware framework for policy learning in goal-oriented dialogues.  ...  decision making.  ...  Early methods used probabilistic graphical models, such as the partially observable Markov decision process (POMDP), to learn dialogue policies by modeling the conditional dependencies between observation, belief  ... 
arXiv:2004.09731v1 fatcat:ifbbbuwftfer3i7zk2se54idee

Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments [article]

Maxime Bouton, Alireza Nakhaei, Kikuo Fujimura, Mykel J. Kochenderfer
2019 arXiv   pre-print
Navigating urban environments represents a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants.  ...  To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning-based approach.  ...  Hence, the autonomous driving problem is inherently a partially observable Markov decision process (POMDP).  ... 
arXiv:1904.11483v1 fatcat:niiykwcshfbbpeahvfflqmttpy

Spectral Attention-Driven Intelligent Target Signal Identification on a Wideband Spectrum [article]

Gihan Mendis, Jin Wei, Arjuna Madanayake, Soumyajit Mandal
2019 arXiv   pre-print
This paper presents a spectral attention-driven, reinforcement-learning-based intelligent method for effective and efficient detection of important signals in a wideband spectrum.  ...  observed.  ...  Considering the problem as a partially observable Markov decision process (POMDP), the gradient of the total expected reward can be approximated as follows: $\nabla_\theta J(\theta) \approx \frac{1}{M}\sum_{i=1}^{M}\sum_{t=1}^{T}\nabla_\theta \log \pi_\theta(a_t$  ... 
arXiv:1901.11368v2 fatcat:inn72szbjrhffhowcjk52sjewa

A Collision Relationship-Based Driving Behavior Decision-Making Method for an Intelligent Land Vehicle at a Disorderly Intersection via DRQN

Lingli Yu, Shuxin Huo, Keyi Li, Yadong Wei
2022 Sensors  
This easily leads to decision failures. A collision-relationship-based driving behavior decision-making method via a deep recurrent Q-network (CR-DRQN) is proposed for intelligent land vehicles.  ...  CR-DRQN maintains a high decision success rate at a disorderly intersection with partially observable states.  ...  The partially observable Markov decision process (POMDP) is a suitable model for the environmental states under sensor noise.  ... 
doi:10.3390/s22020636 pmid:35062596 pmcid:PMC8780178 fatcat:276ga375xbhkjklrtoed7m7yci

Predicting Vehicle Behaviors Over An Extended Horizon Using Behavior Interaction Network [article]

Wenchao Ding and Jing Chen and Shaojie Shen
2019 arXiv   pre-print
We adopt a recurrent neural network (RNN) for observation encoding and, based on that, propose a novel vehicle behavior interaction network (VBIN) to capture the vehicle interaction from the hidden  ...  To avoid unsatisfactory reactive decisions, it is essential to account for long-term future rewards in planning, which requires extending the prediction horizon.  ...  by solving a partially observable Markov decision process (POMDP) with the following probability transition: $p(x_{t+1}) = \prod_{v \in V} \iiint_{X^v Z^v A^v} p^v(x^v, \ldots$ (2), where $\pi^v_t$ belongs to the discrete set of behaviors  ... 
arXiv:1903.00848v2 fatcat:ownit44im5ek7euz4y5x3f3lru

Context-Specific Representation Abstraction for Deep Option Learning [article]

Marwa Abdulhai, Dong-Ki Kim, Matthew Riemer, Miao Liu, Gerald Tesauro, Jonathan P. How
2022 arXiv   pre-print
We test our method against hierarchical, non-hierarchical, and modular recurrent neural network baselines, demonstrating significant sample-efficiency improvements in challenging partially observable environments  ...  abstraction to effectively reduce the size of the search over policy space.  ...  Problem Setting and Notation: Partially Observable Markov Decision Process.  ... 
arXiv:2109.09876v2 fatcat:amncpvpt25bahfs6sd57wftkje

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem [article]

Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang
2022 arXiv   pre-print
Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision-making process  ...  Unlike prior work such as Decision Transformer, which fits only pre-collected offline data, MAT is trained by online trial and error in the environment in an on-policy fashion.  ...  Problem Formulation: Cooperative MARL problems are often modeled by decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) ⟨N, O, A, R, P, γ⟩ [19].  ... 
arXiv:2205.14953v2 fatcat:dahfhqmejbfcnnxo3yyy6edd5y

Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach

Hao-Hsuan Chang, Hao Song, Yang Yi, Jianzhong Zhang, Haibo He, Lingjia Liu
2018 IEEE Internet of Things Journal  
Using the introduced machine-learning-based strategy, SUs could make spectrum access decisions in a distributed manner, relying only on their own current and past spectrum sensing outcomes.  ...  These two problems become even more challenging for a distributed DSA network where there is no centralized controller for SUs.  ...  from the partial observation of the system by improving performance on a specific task.  ... 
doi:10.1109/jiot.2018.2872441 fatcat:sovqldihkvfktcfiqvp2wvepjy

Explainable artificial intelligence for autonomous driving: An overview and guide for future research directions [article]

Shahin Atakishiyev, Mohammad Salameh, Hengshuai Yao, Randy Goebel
2022 arXiv   pre-print
First, we provide a thorough overview of the state-of-the-art studies on XAI for autonomous driving.  ...  Hence, aside from making safe real-time decisions, the AI systems of autonomous vehicles also need to explain how their decisions are constructed in order to be regulatory-compliant across many jurisdictions  ...  In such a situation, an agent's interaction with its surroundings is modeled as a partially observable Markov decision process (POMDP).  ... 
arXiv:2112.11561v2 fatcat:zluqlvmtznh25eihtouubib3ba

A Survey on Reinforcement Learning for Recommender Systems [article]

Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huanhuan Chen, Chunyan Miao
2022 arXiv   pre-print
To understand the challenges and relevant solutions, researchers and practitioners working on RL-based recommender systems need a reference.  ...  Finally, after discussing open issues of RL and its limitations for recommender systems, we highlight some potential research directions in this field.  ...  In this case, we can formalize the observed interactions as a Partially Observable Markov Decision Process (POMDP).  ... 
arXiv:2109.10665v2 fatcat:wx5ghn66hzg7faxee54jf7gspq

3D attention-driven depth acquisition for object identification

Kai Xu, Yifei Shi, Lintao Zheng, Junyu Zhang, Min Liu, Hui Huang, Hao Su, Daniel Cohen-Or, Baoquan Chen
2016 ACM Transactions on Graphics  
This facilitates order-aware view planning accounting for robot movement cost.  ...  Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view  ...  Acknowledgements We thank the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1145/2980179.2980224 fatcat:wrvfjdz54bdnfmh5mui3itjnt4

A Survey on Session-based Recommender Systems [article]

Shoujin Wang, Longbing Cao, Yan Wang, Quan Z. Sheng, Mehmet Orgun, Defu Lian
2021 arXiv   pre-print
Recommender systems (RSs) have been playing an increasingly important role in informed consumption, services, and decision-making in the era of information overload and the digitized economy.  ...  Different from other RSs such as content-based RSs and collaborative filtering-based RSs, which usually model long-term yet static user preferences, SBRSs aim to capture short-term but dynamic user preferences  ...  Yan Zhao for their constructive suggestions on this work. This work was supported by Australian Research Council Discovery Grants (DP180102378, DP190101079 and FT190100734).  ... 
arXiv:1902.04864v3 fatcat:oka5bvibzzbk5oreltrupehaey

DAEMON Deliverable 4.1: Initial design of intelligent orchestration and management mechanisms

Georgios Iosifidis, Danny De Vleeschauwer, Chia-Yu Chang, Marco Fiore, Sergi Alcalá, Andres Garcia-Saavedra, Gines Garcia, Ivan Paez, Gabriele Baldoni, Andra Lutu, Miguel Camelo, Nina Slamnik-Krijestorac (+10 others)
2021 Zenodo  
WP4's main objective is to design NI-based solutions for the functionality related to service and resource orchestration and management, particularly covering both DAEMON objective 2 (Developing specialized  ...  To this aim, NI-based solutions are investigated on top of the DAEMON framework, providing NI-assisted functionality for the management and orchestration in the B5G system to attain a variety of goals:  ...  Therefore, the problem at hand is a Partially Observable Markov Decision Process (POMDP), so that, rather than the state space, we need to describe the observation space.  ... 
doi:10.5281/zenodo.5745456 fatcat:isg5vbmabnecblzd7d3536cjya

Survey on the Application of Deep Reinforcement Learning in Image Processing

Wei Fang, Lin Pang, Weinan Yi
2020 Journal on Artificial Intelligence  
amount of data, and use reinforcement learning to learn the best strategy to complete the task.  ...  In this paper we have summarized the main techniques of deep reinforcement learning and its applications in image processing.  ...  [Rao, Lu and Zhou (2017)] proposed an attention-aware deep reinforcement learning (ADRL) method for video face recognition, which is based on a Markov decision process to eliminate misleading and confusing  ... 
doi:10.32604/jai.2020.09789 fatcat:qtmxvu7bqvcm7mxplsyw4n63km
Showing results 1–15 of 448