
Solving reward-collecting problems with UAVs: a comparison of online optimization and Q-learning [article]

Yixuan Liu and Chrysafis Vogiatzis and Ruriko Yoshida and Erich Morman
2021 arXiv   pre-print
We present a comparison of three methods to solve this problem: namely we implement a Deep Q-Learning model, an ε-greedy tabular Q-Learning model, and an online optimization framework.  ...  In this work, we specifically study the problem of identifying a short path from a designated start to a goal, while collecting all rewards and avoiding adversaries that move randomly on the grid.  ...  Yixuan Liu: Implemented the Deep Q-Learning and online optimization framework and conducted all computational experiments. 2.  ... 
arXiv:2112.00141v1 fatcat:motshvd4qrfvphe2hnoppyib4q
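
The snippet above names an ε-greedy tabular Q-Learning model but gives no details. As a rough illustration only, here is a minimal sketch of what ε-greedy action selection and a single tabular Q-learning update usually look like. The function names, the dictionary-based Q-table, and the default values of `alpha` and `gamma` are assumptions made for this example and are not taken from the paper.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, done=False):
    """One tabular Q-learning step (hypothetical helper, not the paper's code).

    Moves Q(s, a) toward reward + gamma * max_a' Q(s', a').
    """
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Usage: an illustrative two-action problem with a Q-table that defaults to 0.
q_table = defaultdict(float)
moves = ["left", "right"]
q_update(q_table, 0, "right", 1.0, 1, moves, alpha=0.5)
choice = epsilon_greedy(q_table, 0, moves, epsilon=0.0, rng=random.Random(0))
```

With `epsilon=0.0` the selection is purely greedy, so the agent picks whichever action currently has the larger table entry.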

Deep Reinforcement Learning for UAV Intelligent Mission Planning

Longfei Yue, Rennong Yang, Ying Zhang, Lixin Yu, Zhuangzhuang Wang, Wen-Long Shang
2022 Complexity  
In this paper, an end-to-end UAV intelligent mission planning method based on deep reinforcement learning (DRL) is proposed to solve the shortcomings of the traditional intelligent optimization algorithm  ...  Specifically, the suppression of enemy air defense (SEAD) mission planning is described as a sequential decision-making problem and formalized as a Markov decision process (MDP).  ...  Acknowledgments The work described in this paper is partially supported by the Nature Science Foundation of Shaanxi Province of China under Grant no. 2021JQ-370 and the National Natural Science Foundation  ... 
doi:10.1155/2022/3551508 fatcat:3adqhbzq4rgi5m2gpiwpywgmqa

UAV Trajectory, User Association and Power Control for Multi-UAV Enabled Energy Harvesting Communications: Offline Design and Online Reinforcement Learning [article]

Chien-Wei Fu, Meng-Lin Ku, Yu-Jia Chen, Tony Q. S. Quek
2022 arXiv   pre-print
The problem is solved by three convex subproblems with successive convex approximation (SCA) and alternative optimization.  ...  An idea of multi-UAV regulated flight corridors, based on the optimal offline UAV trajectories, is proposed to avoid unnecessary flight exploration by UAVs and enables us to improve the learning efficiency  ...  In [2], a maximum-minimum data collection rate problem is solved for UAV wireless sensor networks in urban areas, where the three-dimensional UAV trajectory and transmission scheduling of sensors are  ... 
arXiv:2207.10371v1 fatcat:naojpbb5zbei7f4tyhang5hxei

Jamming-Resilient Path Planning for Multiple UAVs via Deep Reinforcement Learning [article]

Xueyuan Wang, M. Cenk Gursoy, Tugba Erpek, Yalin E. Sagduyu
2021 arXiv   pre-print
We, then, propose an offline temporal difference (TD) learning algorithm with online signal-to-interference-plus-noise ratio (SINR) mapping to solve the problem.  ...  More specifically, a value network is constructed and trained offline by TD method to encode the interactions among the UAVs and between the UAVs and the environment; and an online SINR mapping deep neural  ...  Then, we proposed an offline TD learning algorithm for the RL agent with online SINR mapping to solve the problem.  ... 
arXiv:2104.04477v2 fatcat:mz2eldy7xzcj3o5grlng5eytqy

UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach [article]

Harald Bayerlein, Mirco Theile, Marco Caccamo, David Gesbert
2020 arXiv   pre-print
a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints.  ...  , or maximum flying time, change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters.  ...  We translate this optimization problem into a reward function as part of a Markov decision process, which we solve using deep reinforcement learning. B.  ... 
arXiv:2007.00544v2 fatcat:yzgtgf27afdpbgq3ocyopyusoe

Trajectory Design and Power Control for Multi-UAV Assisted Wireless Networks: A Machine Learning Approach [article]

Xiao Liu, Yuanwei Liu, Yue Chen, Lajos Hanzo
2019 arXiv   pre-print
In this algorithm, multiple UAVs act as agents to find optimal actions by interacting with their environment and learn from their mistakes.  ...  Firstly, a multi-agent Q-learning based placement algorithm is proposed for determining the optimal positions of the UAVs based on the initial location of the users.  ...  In order to solve this problem at a low complexity, a multi-agent Q-learning algorithm will be invoked in Section IV for finding the optimal solution with a high probability, despite searching through  ... 
arXiv:1812.07665v2 fatcat:nuxmvilqrjdbfkeve2amonl3oq

DQN-based Beamforming for Uplink mmWave Cellular-Connected UAVs [article]

Susarla Praneeth, Gouda Bikshapathi, Deng Yansha, Juntti Markku, Silven Olli, Tolli Antti
2021 arXiv   pre-print
In this paper, we propose a reinforcement learning (RL)-based framework for UAV-BS beam alignment using deep Q-Network (DQN) in a mmWave setting.  ...  The framework can also learn optimal beam alignment comparable to the exhaustive approach in an online manner under real-time conditions.  ...  The approach learns an optimal approximated policy of states mapping to actions π(s) = a by parameterizing and estimating state-action value function Q(s, a; θ) using deep neural networks (DNN).  ... 
arXiv:2110.06318v1 fatcat:sj2hkkf6ivd2rcyqi72tb6mbau

Adaptive UAV-Trajectory Optimization Under Quality of Service Constraints: A Model-Free Solution

Jingjing Cui, Zhiguo Ding, Yansha Deng, Arumugam Nallanathan, Lajos Hanzo
2020 IEEE Access  
reinforcement learning problem by modelling the motion-trajectory as a Markov decision process with the UAV acting as the learning agent.  ...  More specifically, by dividing the considered region into small tiles, we conceive state-action-reward-state-action (Sarsa) and Q-learning based UAV-trajectory optimization algorithms (i.e., SUTOA and  ...  In terms of the network having very large populations of UAVs, an online trajectory optimization approach based on federated learning and mean-field games was proposed in [26].  ... 
doi:10.1109/access.2020.3001752 fatcat:l4asl26r6fgctgfwvxnptg32eu
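
The snippet above mentions both Sarsa and Q-learning. The two differ only in how the bootstrapped target is formed, which the following sketch tries to make concrete. This is the textbook distinction and not the paper's SUTOA implementation; the function names and the dictionary of next-state action values are hypothetical.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Off-policy target: bootstrap from the best next action."""
    return reward + gamma * max(next_q_values.values())


def sarsa_target(reward, next_q_values, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q_values[next_action]


# Usage: illustrative action values for the next tile (assumed numbers).
next_q = {"north": 1.0, "south": 3.0}
off_policy = q_learning_target(1.0, next_q, gamma=0.5)
on_policy = sarsa_target(1.0, next_q, "north", gamma=0.5)
```

When the behaviour policy explores and picks a non-greedy next action, the Sarsa target is lower than or equal to the Q-learning target for the same transition.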

Machine-Learning Beam Tracking and Weight Optimization for mmWave Multi-UAV Links [article]

Hsiao-Lan Chiang and Kwang-Cheng Chen and Wolfgang Rave and Mostafa Khalili Marandi and Gerhard Fettweis
2019 arXiv   pre-print
An efficient method to deal with high dynamics of UAVs applies machine learning, particularly Q-learning, to analog beam tracking.  ...  The proposed Q-learning-based beam tracking scheme uses current/past observations to design rewards from environments to facilitate prediction, which significantly increases the efficiency of data transmission  ...  Q-learning is a model-free reinforcement learning algorithm that uses experience, current measurements, and rewards from the environments to solve the prediction problem without knowing a model of the  ... 
arXiv:1910.13538v1 fatcat:qy5artg4hfc5dcsprdt2dikrha

3D UAV Trajectory and Data Collection Optimisation via Deep Reinforcement Learning

Khoi Khac Nguyen, Trung Q. Duong, Tan Do-Duy, Holger Claussen, Lajos Hanzo
2022 IEEE Transactions on Communications  
Then, a deep reinforcement learning-based technique is conceived for finding the optimal trajectory and throughput in a specific coverage area.  ...  More explicitly, we characterise the attainable performance in terms of the UAV trajectory, the expected reward and the total sum-rate.  ...  Deep Q-learning (DQL) is employed for finding the best trajectory and for solving our data collection problem in Section IV.  ... 
doi:10.1109/tcomm.2022.3148364 fatcat:iymg3fxt2vh7rjmrycdjaanqxe

Artificial Intelligence Aided Next-Generation Networks Relying on UAVs [article]

Xiao Liu, Mingzhe Chen, Yuanwei Liu, Yue Chen, Shuguang Cui, Lajos Hanzo
2020 arXiv   pre-print
Moreover, AI enables the interaction amongst a swarm of UAVs for cooperative optimization of the system.  ...  In the AI-enabled UAV-aided wireless networks (UAWN), multiple UAVs are employed as aerial base stations, which are capable of rapidly adapting to the dynamic environment by collecting information about  ...  RL and DL algorithms come to rescue, as a benefit of their learning capability for solving the associated dynamic trajectory design problems.  ... 
arXiv:2001.11958v1 fatcat:i35weka7wndghp3folyzsd4mi4

Multi-UAV Collision Avoidance using Multi-Agent Reinforcement Learning with Counterfactual Credit Assignment [article]

Shuangyao Huang, Haibo Zhang, Zhiyi Huang
2022 arXiv   pre-print
To solve the credit assignment problem in CTDE, we design a counterfactual baseline that marginalizes both an agent's state and action, enabling to evaluate the importance of an agent in the joint observation-action  ...  Centralized Training with Decentralized Execution (CTDE) in Multi-Agent Reinforcement Learning is a promising method for multi-UAV collision avoidance, in which the key challenge is to effectively learn  ...  Independent reinforcement learning has been proposed to better solve the problem of collision avoidance for UAV swarms.  ... 
arXiv:2204.08594v1 fatcat:qfvminpgc5alpbtfx3wktvim6y

Data Freshness and Energy-Efficient UAV Navigation Optimization: A Deep Reinforcement Learning Approach [article]

Sarder Fakhrul Abedin, Md. Shirajum Munir, Nguyen H. Tran, Zhu Han, Choong Seon Hong
2020 arXiv   pre-print
Second, we propose an agile deep reinforcement learning with experience replay model to solve the formulated problem concerning the contextual constraints for the UAV-BS navigation.  ...  Moreover, the proposed approach is well-suited for solving the problem, since the state space of the problem is extremely large and finding the best trajectory policy with useful contextual features is  ...  Section V explains in detail how we solve the proposed optimization problem with deep Q-learning with experience replay.  ... 
arXiv:2003.04816v1 fatcat:yrywbbc6vbgcjougpfwpus2gxu

UAV Target Tracking in Urban Environments Using Deep Reinforcement Learning [article]

Sarthak Bhagat, Sujit PB
2020 arXiv   pre-print
In this paper, we introduce Target Following DQN (TF-DQN), a deep reinforcement learning technique based on Deep Q-Networks with a curriculum training framework for the UAV to persistently track the target  ...  Persistent target tracking in urban environments using UAV is a difficult task due to the limited field of view, visibility obstruction from obstacles and uncertain target motion.  ...  After every τ iterations, the weights of the target network are equated with that of the online network. • Experience Replays: During training, experiences e_t = (s_t, a_t, r_t, s_{t+1}) are collected  ... 
arXiv:2007.10934v1 fatcat:b5auyavc7zdpth42o73q3l62hq
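
The snippet above describes two standard DQN ingredients: copying the online network's weights into the target network every τ iterations, and storing experience tuples for replay. A minimal sketch of both follows, under loud assumptions: weights are represented as a plain dictionary of floats instead of a real neural network, and the class and function names are hypothetical, not from TF-DQN.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next) tuples; oldest entries drop first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from the stored experiences.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def maybe_sync_target(step, tau, online_weights, target_weights):
    """Copy online weights into the target every tau steps; report if synced.

    Weights are a dict of floats here, an assumption made for illustration.
    """
    if step % tau == 0:
        target_weights.clear()
        target_weights.update(online_weights)
        return True
    return False


# Usage: a tiny buffer and one sync check with made-up values.
replay = ReplayBuffer(capacity=2)
for t in range(3):
    replay.push(t, "hover", 0.0, t + 1)
batch = replay.sample(2, random.Random(0))
online, target = {"w": 0.7}, {"w": 0.1}
synced = maybe_sync_target(step=4, tau=2, online_weights=online, target_weights=target)
```

Holding the target weights fixed between syncs keeps the bootstrapped targets from shifting on every gradient step, which is the usual motivation for this design.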

Multi-Agent Reinforcement Learning Based Resource Allocation for UAV Networks

Jingjing Cui, Yuanwei Liu, Arumugam Nallanathan
2019 IEEE Transactions on Wireless Communications  
To model the dynamics and uncertainty in environments, we formulate the long-term resource allocation problem as a stochastic game for maximizing the expected rewards, where each UAV becomes a learning  ...  This article investigates dynamic resource allocation of multiple UAVs enabled communication networks with the goal of maximizing long-term rewards.  ...  solving a MDP is to find an optimal strategy to obtain a maximal reward.  ... 
doi:10.1109/twc.2019.2935201 fatcat:v222pukxkjafljellj4odyawke