9 Hits in 5.3 sec

Best Arm Identification in Graphical Bilinear Bandits [article]

Geovani Rizk and Albert Thomas and Igor Colin and Rida Laraki and Yann Chevaleyre
2021 arXiv   pre-print
We study the best arm identification problem in which the learner wants to find the graph allocation maximizing the sum of the bilinear rewards.  ...  We introduce a new graphical bilinear bandit problem where a learner (or a central entity) allocates arms to the nodes of a graph and observes for each edge a noisy bilinear reward representing the interaction  ...  Hence, solving the best arm identification problem in the described graphical bilinear bandit boils down to solving the same problem in a global linear bandit.  ... 
arXiv:2012.07641v3 fatcat:yocskml56zgk7jig7i7wwmqawm

Pervasive Machine Learning for Smart Radio Environments Enabled by Reconfigurable Intelligent Surfaces [article]

George C. Alexandropoulos and Kyriakos Stylianopoulos and Chongwen Huang and Chau Yuen and Mehdi Bennis and Mérouane Debbah
2022 arXiv   pre-print
bandits approaches, whose resulting sum-rate performances are numerically shown to outperform random configurations, while being sufficiently close to the conventional Deep Q-Network (DQN) algorithm,  ...  Differently from the DRL-based status quo, we leverage the independence between the configuration of the system design parameters and the future states of the wireless environment, and present efficient multi-armed  ...  The multi-armed bandits algorithms are more intuitive and explainable compared to DRL methods.  ... 
arXiv:2205.03793v1 fatcat:m6iv42sbsra7dk744is4rg36yq

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges [article]

Triet H. M. Le, Hao Chen, M. Ali Babar
2020 arXiv   pre-print
Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the  ...  To facilitate further research and applications of DL in this field, we provide a comprehensive review to categorize and investigate existing DL methods for source code modeling and generation.  ...  Active learning problem can also be considered as a special case of multi-armed bandit [71] . The restriction is that the labeling process totally cannot be abandoned completely.  ... 
arXiv:2002.05442v1 fatcat:bt7dtzrcnjfk5jn6kmin2ruqii

Deep Reinforcement Learning [article]

Yuxi Li
2018 arXiv   pre-print
We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details.  ...  We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work, and in historical contexts.  ...  Lanctot et al. (2017) observe that independent RL, in which each agent learns by interacting with the environment, oblivious to other agents, can overfit the learned policies to other agents' policies  ... 
arXiv:1810.06339v1 fatcat:kp7atz5pdbeqta352e6b3nmuhy

2010 Index IEEE Transactions on Signal Processing Vol. 58

2010 IEEE Transactions on Signal Processing  
Tepedelenlioglu, C., +, TSP Sept. 2010 4783-4794 Multi-agent systems Distributed Learning in Multi-Armed Bandit With Multiple Players.  ...  -S., +, TSP June 2010 3251-3271 Distributed Learning in Multi-Armed Bandit With Multiple Players. Liu, K., +, TSP Nov. 2010 5667-5681 Distributed Sparse Linear Regression.  ...  Global Positioning System A Fixed-Lag Particle Filter for the Joint Detection/Compensation of Interference Effects in GPS Navigation.  ... 
doi:10.1109/tsp.2010.2092533 fatcat:4y66ezuo7zf6doe6nwjqwtc42i

Causality and Generalizability: Identifiability and Learning Methods [article]

Martin Emil Jakobsen
2021 arXiv   pre-print
We present a new structure learning method applicable in additive noise models with directed trees as causal graphs.  ...  Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators.  ...  MEJ and JP were supported by the Carlsberg Foundation; JP was, in addition, supported by a research grant (18968) from VILLUM FONDEN. RDS was supported by EPSRC grant EP/N031938/1.  ... 
arXiv:2110.01430v1 fatcat:c4w4wjt3wbfnhkyfcgflxaskye

Robot Deep Reinforcement Learning: Tensor State-Action Spaces and Auxiliary Task Learning with Multiple State Representations

Devin Schwab
2020
Reinforcement Learning (RL) algorithms have been successfully applied to many different robotic tasks such as the Ball-in-a-Cup task with a robot arm and various RoboCup robot soccer inspired domains.  ...  In this thesis we focus on designing a representation that makes for easy transfer.  ...  Typically, this behavior policy is the current best policy estimate with some exploration actions added in.  ... 
doi:10.1184/r1/13103309 fatcat:3hfgb2n32fglhfqqzlh7gae7ee

PHY-layer Security in Cognitive Radio Networks through Learning Deep Generative Models: an AI-based approach [article]

Andrea Toma
2020
Joint Doctorate in Interactive and Cognitive Environments JD-ICE XXXII cycle Acknowledgements This PhD Thesis has been developed in the framework of, and according to, the rules of the Joint Doctorate  ...  in Interactive and Cognitive Environments JD-ICE with the cooperation of the following Universities:  ...  Other learning techniques could also be considered from the Game theory framework such as Pursuit evasion game and Multi-armed bandit.  ... 
doi:10.15167/toma-andrea_phd2020-04-24 fatcat:acyi72reifcgrojc7frkibxa3m

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields.  ...  In each topic, we summarize clearly the current studies and propose some future research directions. At the end of this paper, we conclude the further development of BMs in a more general view.  ...  in a graphical structure.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4