Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes
2021
The Journal of Artificial Intelligence Research
Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. ...
However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, given by temporal logic specifications. ...
Introduction: Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. ...
doi:10.1613/jair.1.12963
fatcat:usbrnbs6dvarrbnj2x4bmmmrwa
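As background for the entry above: an RNN-based POMDP policy carries a hidden state that summarizes the observation history and maps it to actions, and it is this closed loop that verification then has to reason about. The sketch below is a minimal illustration in PyTorch; the GRU cell, the dimensions, and all names (RNNPolicy, OBS_DIM, ...) are assumptions for illustration, not the paper's verified construction.

```python
import torch
import torch.nn as nn

OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 8, 32, 4  # illustrative sizes (assumptions)

class RNNPolicy(nn.Module):
    """Maps an observation history to actions via a recurrent hidden state."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(OBS_DIM, HIDDEN_DIM)    # memory over observations
        self.head = nn.Linear(HIDDEN_DIM, NUM_ACTIONS)

    def act(self, obs, hidden):
        # One decision step: fold the new observation into memory,
        # then sample an action from the induced categorical distribution.
        hidden = self.rnn(obs, hidden)
        dist = torch.distributions.Categorical(logits=self.head(hidden))
        return dist.sample(), hidden

policy = RNNPolicy()
h = torch.zeros(1, HIDDEN_DIM)                     # initial memory
action, h = policy.act(torch.zeros(1, OBS_DIM), h)
```

A verification approach like the paper's would then ask whether every trajectory of this policy-environment loop satisfies a given temporal logic specification.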
Learning Goal-oriented Dialogue Policy with Opposite Agent Awareness
[article]
2020
arXiv
pre-print
We therefore propose an opposite behavior aware framework for policy learning in goal-oriented dialogues. ...
decision making. ...
Early methods used probabilistic graphical models, such as the partially observable Markov decision process (POMDP), to learn dialogue policy by modeling the conditional dependencies between observation, belief ...
arXiv:2004.09731v1
fatcat:ifbbbuwftfer3i7zk2se54idee
Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments
[article]
2019
arXiv
pre-print
Navigating urban environments represents a complex task for automated vehicles. They must reach their goal safely and efficiently while considering a multitude of traffic participants. ...
To make the decision strategy robust to perception errors and occlusions, we introduce a belief update technique using a learning based approach. ...
Hence, the autonomous driving problem is inherently a partially observable Markov decision process (POMDP). ...
arXiv:1904.11483v1
fatcat:niiykwcshfbbpeahvfflqmttpy
Spectral Attention-Driven Intelligent Target Signal Identification on a Wideband Spectrum
[article]
2019
arXiv
pre-print
This paper presents a spectral attention-driven reinforcement learning based intelligent method for effective and efficient detection of important signals in a wideband spectrum. ...
observed. ...
Considering the problem as a partially observable Markov decision process (POMDP), the gradient of the total expected reward can be approximated as follows: $\nabla_\theta J(\theta) \approx \frac{1}{M}\sum_{i=1}^{M}\sum_{t=1}^{T}\nabla_\theta \log \pi_\theta(a_t \ldots$
arXiv:1901.11368v2
fatcat:inn72szbjrhffhowcjk52sjewa
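The truncated formula in the snippet above is recognizably the score-function (REINFORCE) policy-gradient estimate: average over M sampled trajectories the per-step terms $\nabla_\theta \log \pi_\theta(a_t \mid \cdot)$, weighted by a reward term that the snippet cuts off. Below is a minimal sketch of the return-weighted form, which is one standard completion and an assumption here, not the paper's exact estimator.

```python
import torch

def reinforce_gradient_loss(log_probs, returns):
    """Surrogate loss whose gradient matches the estimate above.

    log_probs: list of M 1-D tensors; element i holds log pi_theta(a_t | .)
               for t = 1..T of trajectory i (entries require grad).
    returns:   list of M floats; total reward R_i of trajectory i
               (assumed weighting -- the snippet truncates before this term).
    """
    M = len(log_probs)
    loss = torch.zeros(())
    for lp, R in zip(log_probs, returns):
        loss = loss - lp.sum() * R   # -(1/M) * sum_i R_i * sum_t log pi
    return loss / M
```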
A Collision Relationship-Based Driving Behavior Decision-Making Method for an Intelligent Land Vehicle at a Disorderly Intersection via DRQN
2022
Sensors
This easily leads to decision failures. A collision relationship-based driving behavior decision-making method via a deep recurrent Q-network (CR-DRQN) is proposed for intelligent land vehicles. ...
CR-DRQN maintains a high decision success rate at a disorderly intersection with partially observable states. ...
The partially observable Markov decision process (POMDP) is a suitable model for the environmental states under sensor noise. ...
doi:10.3390/s22020636
pmid:35062596
pmcid:PMC8780178
fatcat:276ga375xbhkjklrtoed7m7yci
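For context on the DRQN family this entry builds on: a deep recurrent Q-network replaces the feedforward core of DQN with a recurrent one, so Q-values can condition on the observation history rather than on a single noisy observation. The sketch below uses an LSTM core and illustrative sizes; it is not the CR-DRQN architecture itself.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Recurrent Q-network: Q-values conditioned on the observation history."""
    def __init__(self, obs_dim=10, hidden_dim=64, num_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim) -> Q-values: (batch, time, actions)
        out, state = self.lstm(obs_seq, state)
        return self.q_head(out), state

q_net = DRQN()
q_values, _ = q_net(torch.zeros(1, 5, 10))
greedy_actions = q_values.argmax(dim=-1)  # one behavior choice per time step
```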
Predicting Vehicle Behaviors Over An Extended Horizon Using Behavior Interaction Network
[article]
2019
arXiv
pre-print
We adopt a recurrent neural network (RNN) for observation encoding, and based on that, we propose a novel vehicle behavior interaction network (VBIN) to capture the vehicle interaction from the hidden ...
To avoid unsatisfactory reactive decisions, it is essential to count long-term future rewards in planning, which requires extending the prediction horizon. ...
by solving a partially observable Markov decision process (POMDP) with the following probability transition: $p(x_{t+1}) = \prod_{v \in V} X_v\, Z_v\, A_v\, p_v(x_v, \ldots$ (2), where $\pi_t^v$ belongs to the discrete set of behaviors ...
arXiv:1903.00848v2
fatcat:ownit44im5ek7euz4y5x3f3lru
Context-Specific Representation Abstraction for Deep Option Learning
[article]
2022
arXiv
pre-print
We test our method against hierarchical, non-hierarchical, and modular recurrent neural network baselines, demonstrating significant sample efficiency improvements in challenging partially observable environments ...
abstraction to effectively reduce the size of the search over policy space. ...
Problem Setting and Notation: Partially Observable Markov Decision Process. ...
arXiv:2109.09876v2
fatcat:amncpvpt25bahfs6sd57wftkje
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
[article]
2022
arXiv
pre-print
Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision making process ...
Unlike prior work such as the Decision Transformer, which fits only pre-collected offline data, MAT is trained by online trial and error in the environment in an on-policy fashion. ...
Problem Formulation: Cooperative MARL problems are often modeled by decentralized partially observable Markov decision processes (Dec-POMDPs) $\langle N, O, A, R, P, \gamma \rangle$ [19]. ...
arXiv:2205.14953v2
fatcat:dahfhqmejbfcnnxo3yyy6edd5y
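The Dec-POMDP tuple $\langle N, O, A, R, P, \gamma \rangle$ cited in the snippet can be written down as a plain container. The callable signatures below follow one common convention and are assumptions for illustration, not the paper's definitions; the shared team reward R is what the advantage decomposition theorem mentioned above operates on.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class DecPOMDP:
    n_agents: int                                  # N: number of agents
    observe: Callable[[object, int], object]       # O: observation of agent i in state s
    actions: Sequence[Sequence[object]]            # A: per-agent action sets
    reward: Callable[[object, tuple], float]       # R: shared team reward for joint action
    transition: Callable[[object, tuple], object]  # P: samples next state
    gamma: float                                   # discount factor
```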
Distributive Dynamic Spectrum Access through Deep Reinforcement Learning: A Reservoir Computing Based Approach
2018
IEEE Internet of Things Journal
Using the introduced machine learning-based strategy, SUs can make spectrum access decisions in a distributed manner, relying only on their own current and past spectrum sensing outcomes. ...
These two problems become even more challenging for a distributed DSA network where there is no centralized controller for the SUs. ...
from the partial observation of the system by improving performance on a specific task. ...
doi:10.1109/jiot.2018.2872441
fatcat:sovqldihkvfktcfiqvp2wvepjy
Explainable artificial intelligence for autonomous driving: An overview and guide for future research directions
[article]
2022
arXiv
pre-print
First, we provide a thorough overview of the state-of-the-art studies on XAI for autonomous driving. ...
Hence, aside from making safe real-time decisions, the AI systems of autonomous vehicles also need to explain how their decisions are constructed in order to be compliant with regulations across many jurisdictions ...
In such a situation, an agent's interaction with the surrounding is constructed as a partially observable Markov decision process (POMDP). ...
arXiv:2112.11561v2
fatcat:zluqlvmtznh25eihtouubib3ba
A Survey on Reinforcement Learning for Recommender Systems
[article]
2022
arXiv
pre-print
To help researchers and practitioners working on RL-based recommender systems understand the challenges and relevant solutions, a reference is needed. ...
Finally, in discussing the open issues of RL and its limitations in recommender systems, we highlight some potential research directions in this field. ...
In this case, we can formalize the observed interactions as a Partially Observable Markov Decision Process (POMDP). ...
arXiv:2109.10665v2
fatcat:wx5ghn66hzg7faxee54jf7gspq
3D attention-driven depth acquisition for object identification
2016
ACM Transactions on Graphics
This facilitates order-aware view planning accounting for robot movement cost. ...
Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view ...
Acknowledgements We thank the anonymous reviewers for their valuable comments and suggestions. ...
doi:10.1145/2980179.2980224
fatcat:wrvfjdz54bdnfmh5mui3itjnt4
A Survey on Session-based Recommender Systems
[article]
2021
arXiv
pre-print
Recommender systems (RSs) have been playing an increasingly important role for informed consumption, services, and decision-making in the overloaded information era and digitized economy. ...
Different from other RSs, such as content-based RSs and collaborative filtering-based RSs, which usually model long-term yet static user preferences, SBRSs aim to capture short-term but dynamic user preferences ...
Yan Zhao for their constructive suggestions on this work. This work was supported by Australian Research Council Discovery Grants (DP180102378, DP190101079 and FT190100734). ...
arXiv:1902.04864v3
fatcat:oka5bvibzzbk5oreltrupehaey
DAEMON Deliverable 4.1: Initial design of intelligent orchestration and management mechanisms
2021
Zenodo
WP4's main objective is to design NI-based solutions for the functionality related to service and resource orchestration and management, particularly covering both DAEMON objective 2 (Developing specialized ...
To this aim, NI-based solutions are investigated on top of the DAEMON framework, providing NI-assisted functionality for the management and orchestration in the B5G system to attain a variety of goals: ...
Therefore, the problem at hand is a Partially Observable Markov Decision Process (POMDP), so that, rather than the state space, we need to describe the observation space. ...
doi:10.5281/zenodo.5745456
fatcat:isg5vbmabnecblzd7d3536cjya
Survey on the Application of Deep Reinforcement Learning in Image Processing
2020
Journal on Artificial Intelligence
amount of data, and use reinforcement learning to learn the best strategy to complete the task. ...
In this paper we have summarized the main techniques of deep reinforcement learning and its applications in image processing. ...
[Rao, Lu and Zhou (2017)] proposed an attention-aware deep reinforcement learning (ADRL) method for video face recognition, which is based on a Markov decision process to eliminate misleading and confusing ...
doi:10.32604/jai.2020.09789
fatcat:qtmxvu7bqvcm7mxplsyw4n63km
Showing results 1 — 15 out of 448 results