35 Hits in 1.4 sec

Lenient Multi-Agent Deep Reinforcement Learning [article]

Gregory Palmer, Karl Tuyls, Daan Bloembergen, Rahul Savani
2018 arXiv   pre-print
Much of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22], as well as a modified version we call scheduled-HDQN that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8] which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
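The leniency mechanism summarised in this abstract can be illustrated with a small tabular sketch. The class and parameter names below are our own invention for illustration; the actual LDQN applies leniency to samples drawn from a deep Q-network's replay memory, not to a tabular learner:

```python
import math
from collections import defaultdict
import random

class LenientQLearner:
    """Tabular sketch of lenient Q-learning: negative updates are
    mostly forgiven while the per-(state, action) temperature is
    still high, introducing optimism early in learning."""

    def __init__(self, actions, alpha=0.1, gamma=0.95,
                 t0=50.0, temp_decay=0.995, k=1.0):
        self.q = defaultdict(float)          # Q(s, a)
        self.temp = defaultdict(lambda: t0)  # temperature T(s, a)
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.temp_decay, self.k = temp_decay, k

    def update(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * max(
            self.q[(s_next, b)] for b in self.actions)
        delta = target - self.q[(s, a)]
        # Leniency l = 1 - exp(-k * T(s, a)): while the temperature
        # is high, negative deltas are almost always ignored.
        leniency = 1.0 - math.exp(-self.k * self.temp[(s, a)])
        accepted = delta >= 0 or random.random() > leniency
        if accepted:
            self.q[(s, a)] += self.alpha * delta
        self.temp[(s, a)] *= self.temp_decay  # cool down on each visit
        return accepted
```

Positive updates are always applied; as the temperature decays with repeated visits, leniency fades and the learner gradually starts accepting negative updates as well.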
arXiv:1707.04402v2 fatcat:ozvqkfba7jf7vhpj7jg6imtsci

Robust Temporal Difference Learning for Critical Domains [article]

Richard Klima, Daan Bloembergen, Michael Kaisers, Karl Tuyls
2019 arXiv   pre-print
We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SRE) in critical domains. The operator, which we call the κ-operator, allows a robust policy to be learned in a model-based fashion without actually observing the SRE. We introduce single- and multi-agent robust TD methods using the operator κ. We prove convergence of the operator to the optimal robust Q-function with respect to the model using the theory of Generalized Markov Decision Processes. In addition, we prove convergence to the optimal Q-function of the original MDP given that the probability of SREs vanishes. Empirical evaluations demonstrate the superior performance of κ-based TD methods both in the early learning phase as well as in the final converged stage. In addition we show robustness of the proposed method to small model errors, as well as its applicability in a multi-agent context.
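The paper defines the κ-operator formally via Generalized MDPs. As a rough illustration of the underlying idea only (a bootstrap target that hedges against rare bad transitions), one can mix the expected next-state value under the nominal model with a worst-case value. The function, names, and mixing form below are our own simplification, not the paper's operator:

```python
import numpy as np

def robust_td_update(Q, s, a, r, next_states, probs, alpha=0.1,
                     gamma=0.95, kappa=0.1):
    """One tabular TD update with a robustness-weighted target:
    (1 - kappa) times the expected bootstrap value under the
    nominal model, plus kappa times the worst reachable next
    state's value.  Q is a 2-D array indexed [state, action]."""
    best = Q[next_states].max(axis=1)  # max_a' Q(s', a') per s'
    nominal = probs @ best             # expectation under the model
    worst = best.min()                 # pessimistic bootstrap value
    target = r + gamma * ((1 - kappa) * nominal + kappa * worst)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q[s, a]
```

With kappa = 0 this reduces to a standard model-based TD target; larger kappa shifts weight towards the rare worst case without it ever having to be observed.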
arXiv:1901.08021v2 fatcat:nfmapvjklfgg3g72euvqprc4ti

Multi-agent Learning and the Reinforcement Gradient [chapter]

Michael Kaisers, Daan Bloembergen, Karl Tuyls
2012 Lecture Notes in Computer Science  
Acknowledgements: We thank Daan Bloembergen for the fruitful discussions and useful insights that helped mature the analysis to its current state. ...
doi:10.1007/978-3-642-34799-3_10 fatcat:j447aprslnafphio2bvyr3f3ya

Space Debris Removal: A Game Theoretic Analysis

Richard Klima, Daan Bloembergen, Rahul Savani, Karl Tuyls, Daniel Hennes, Dario Izzo
2016 Games  
doi:10.3390/g7030020 fatcat:axuvhszftbdjbd3fnowqb2tg4e

Automatic labelling of urban point clouds using data fusion [article]

Daan Bloembergen, Chris Eijgenstein
2021 arXiv   pre-print
In this paper we describe an approach to semi-automatically create a labelled dataset for semantic segmentation of urban street-level point clouds. We use data fusion techniques based on public data sources such as elevation data and large-scale topographical maps to automatically label parts of the point cloud, after which only limited human effort is needed to check the results and make amendments where needed. This drastically limits the time needed to create a labelled dataset that is large enough to train deep semantic segmentation models. We apply our method to point clouds of the Amsterdam region, and successfully train a RandLA-Net semantic segmentation model on the labelled dataset. These results demonstrate the potential of smart data fusion and semantic segmentation for the future of smart city planning and management.
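As a toy illustration of the kind of fusion rule described here, one could mark points lying close to a public ground-elevation raster as ground. The function, grid convention, cell size, and tolerance below are invented for the sketch and assume non-negative coordinates relative to the raster origin:

```python
import numpy as np

def label_ground_points(points, ground_elev, cell=0.5, tol=0.2):
    """Return a boolean mask marking points whose height is within
    `tol` metres of a ground-elevation raster (illustrative sketch).

    points: (N, 3) array of x, y, z coordinates
    ground_elev: 2-D raster of ground heights indexed [row, col],
        with cell size `cell` and origin at (0, 0)."""
    cols = (points[:, 0] // cell).astype(int)
    rows = (points[:, 1] // cell).astype(int)
    ground_z = ground_elev[rows, cols]  # ground height under each point
    return np.abs(points[:, 2] - ground_z) <= tol
```

Analogous rules (point-in-polygon tests against topographical map layers, for instance) could pre-label further classes, leaving only ambiguous points for manual review.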
arXiv:2108.13757v2 fatcat:chalovumgzf5bc2xn6wtw7dtsu

Environmental effects on simulated emotional and moody agents

Joe Collenette, Katie Atkinson, Daan Bloembergen, Karl Tuyls
2017 Knowledge engineering review (Print)  
..., 2011; Ranjbar-Sahraei, Bou Ammar, Bloembergen, Tuyls and Weiss, 2014). ... The prisoner's dilemma has been an active area of research in the past decades, with a particular focus on the evolution of cooperation within groups of agents (Axelrod and Hamilton, 1981; Santos et al., 2008; Bloembergen ...
doi:10.1017/s0269888917000170 fatcat:npsqpqkpwvfy3pwh4lc5hkuj3m

Trading in markets with noisy information: an evolutionary analysis

Daan Bloembergen, Daniel Hennes, Peter McBurney, Karl Tuyls
2015 Connection science  
We analyse the value of information in a stock market where information can be noisy and costly, using techniques from empirical game theory. Previous work has shown that the value of information follows a J-curve, where averagely informed traders perform below market average, and only insiders prevail. Here we show that both noise and cost can change this picture, in several cases leading to opposite results where insiders perform below market average, and averagely informed traders prevail. These results provide insight into the complexity of real marketplaces, and show under which conditions a broad mix of different trading strategies might be sustainable.
doi:10.1080/09540091.2015.1039492 fatcat:l44rc6ujavdgjplcrkxhn42v5u

Learning in Networked Interactions: A Replicator Dynamics Approach [chapter]

Daan Bloembergen, Ipek Caliskanelli, Karl Tuyls
2015 Communications in Computer and Information Science  
Many real-world scenarios can be modelled as multi-agent systems, where multiple autonomous decision makers interact in a single environment. The complex and dynamic nature of such interactions prevents hand-crafting solutions for all possible scenarios, hence learning is crucial. Studying the dynamics of multi-agent learning is imperative in selecting and tuning the right learning algorithm for the task at hand. So far, analysis of these dynamics has been mainly limited to normal form games, or unstructured populations. However, many multi-agent systems are highly structured, complex networks, with agents only interacting locally. Here, we study the dynamics of such networked interactions, using the well-known replicator dynamics of evolutionary game theory as a model for learning. Different learning algorithms are modelled by altering the replicator equations slightly. In particular, we investigate lenience as an enabler for cooperation. Moreover, we show how well-connected, stubborn agents can influence the learning outcome. Finally, we investigate the impact of structural network properties on the learning outcome, as well as the influence of mutation driven by exploration.
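The replicator dynamics that serve as the learning model here have a compact standard form, dx_i/dt = x_i((Ax)_i − xᵀAx). A minimal numerical sketch, using simple Euler integration and a toy coordination game of our own choosing:

```python
import numpy as np

def replicator_step(x, A, dt=0.01):
    """One Euler step of the replicator dynamics
    dx_i/dt = x_i * ((A x)_i - x^T A x) for a symmetric game with
    payoff matrix A and mixed-strategy vector x on the simplex."""
    fitness = A @ x          # payoff of each pure strategy
    avg = x @ fitness        # population-average payoff
    return x + dt * x * (fitness - avg)

# Example: a 2x2 coordination game.  Starting above the unstable
# mixed equilibrium at 0.5, the population drifts towards playing
# the first action almost exclusively.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
x = np.array([0.6, 0.4])
for _ in range(1000):
    x = replicator_step(x, A)
```

The networked variants studied in the chapter replace the single well-mixed population above with local interactions on a graph; the lenient and stubborn-agent variants modify the update rule itself.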
doi:10.1007/978-3-319-18084-7_4 fatcat:lqfhs7socrfgtiz5gfstmkqjca

Preface to the special issue: adaptive and learning agents

Daan Bloembergen, Tim Brys, Logan Yliniemi
2017 Knowledge engineering review (Print)  
The third paper, Environmental Effects on Simulated Emotional and Moody Agents by Joe Collenette, Katie Atkinson, Daan Bloembergen, and Karl Tuyls, studies the effect that simulated emotions and mood have  ... 
doi:10.1017/s0269888917000145 fatcat:t3iqprui5vaqvdewasvhxcddva

Mood modelling within reinforcement learning

Joe Collenette, Katie Atkinson, Daan Bloembergen, Karl Tuyls
2017 Proceedings of the 14th European Conference on Artificial Life ECAL 2017  
..., 2008; Bloembergen et al., 2014; Skyrms, 2004; Bolton et al., 2016). It is for this reason that we adopt this model of interaction in the current work as well. ...
doi:10.7551/ecal_a_021 dblp:conf/ecal/CollenetteABT17 fatcat:lkdetd3k6bhjbkavwiqfvm7a6y

Back to Basics: Deep Reinforcement Learning in Traffic Signal Control [article]

Sierk Kanis, Laurens Samson, Daan Bloembergen, Tim Bakker
2021 arXiv   pre-print
In this paper we revisit some of the fundamental premises for a reinforcement learning (RL) approach to self-learning traffic lights. We propose RLight, a combination of choices that offers robust performance and good generalisation to unseen traffic flows. In particular, our main contributions are threefold: our lightweight and cluster-aware state representation leads to improved performance; we reformulate the Markov Decision Process (MDP) such that it skips redundant timesteps of yellow light, speeding up learning by 30%; and we investigate the action space and provide insight into the difference in performance between acyclic and cyclic phase transitions. Additionally, we provide insights into the generalisation of the methods to unseen traffic. Evaluations using the real-world Hangzhou traffic dataset show that RLight outperforms state-of-the-art rule-based and deep reinforcement learning algorithms, demonstrating the potential of RL-based methods to improve urban traffic flows.
arXiv:2109.07180v2 fatcat:z4gtzzaucndzlouvi3m5osubu4

Evolutionary Dynamics of Multi-Agent Learning: A Survey

Daan Bloembergen, Karl Tuyls, Daniel Hennes, Michael Kaisers
2015 The Journal of Artificial Intelligence Research  
For example, this allows us to study the evolutionary dynamics of various trading strategies in stock markets Hennes, Bloembergen, Kaisers, Tuyls, & Parsons, 2012; Bloembergen, Hennes, McBurney, & Tuyls  ...  The Value of Information in Markets As an example, consider a market in which differently informed traders bid for a certain asset (Bloembergen et al., 2015) .  ... 
doi:10.1613/jair.4818 fatcat:6wqvs63nezd6xfz3zf7c4cattq

On Rational Delegations in Liquid Democracy [article]

Daan Bloembergen, Davide Grossi, Martin Lackner
2018 arXiv   pre-print
Liquid democracy is a proxy voting method where proxies are delegable. We propose and study a game-theoretic model of liquid democracy to address the following question: when is it rational for a voter to delegate her vote? We study the existence of pure-strategy Nash equilibria in this model, and how group accuracy is affected by them. We complement these theoretical results by means of agent-based simulations to study the effects of delegations on the group's accuracy on variously structured social networks.
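A back-of-the-envelope version of the group-accuracy question studied here: under a Condorcet-style model with equally competent, independent voters, delegation amounts to reweighting votes. The helper functions below are our own toy model, not the paper's game:

```python
from itertools import product
from math import comb

def majority_accuracy(n, p):
    """Condorcet jury model: probability that a simple majority of
    n (odd) independent voters, each correct with probability p,
    reaches the correct outcome."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def weighted_accuracy(weights, p):
    """Exact group accuracy when delegation gives voter i weight
    weights[i] (identical competences, independent errors).
    Enumerates all correctness patterns, so only for small groups."""
    total = sum(weights)
    acc = 0.0
    for outcome in product([0, 1], repeat=len(weights)):
        w_correct = sum(w for w, ok in zip(weights, outcome) if ok)
        if 2 * w_correct > total:  # strict weighted majority correct
            k = sum(outcome)
            acc += p**k * (1 - p)**(len(weights) - k)
    return acc
```

With p = 0.6, for example, concentrating all five votes on one proxy drops group accuracy from the five-voter majority value (about 0.68) to the single voter's 0.6, illustrating why rational delegation is a non-trivial question.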
arXiv:1802.08020v4 fatcat:2xu5o2elgbdldmz5e2vm3zle3i

Evolutionary advantage of foresight in markets

Daniel Hennes, Daan Bloembergen, Michael Kaisers, Karl Tuyls, Simon Parsons
2012 Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference - GECCO '12  
We analyze the competitive advantage of price signal information for traders in simulated double auctions. Previous work has established that more information about the price development does not guarantee higher performance. In particular, traders with limited information perform below market average and are outperformed by random traders; only insiders beat the market. However, this result has only been shown in markets with a few traders and a uniform distribution over information levels. We present additional simulations of several more realistic information distributions, extending previous findings. In addition, we analyze the market dynamics with an evolutionary model of competing information levels. Results show that the highest information level will dominate if information comes for free. If information is costly, less-informed traders may prevail, reflecting a more realistic distribution over information levels.
doi:10.1145/2330163.2330294 dblp:conf/gecco/HennesBKTP12 fatcat:c4tkdf7yrralvbkoppwwcnhdy4

Space Debris Removal: Learning to Cooperate and the Price of Anarchy

Richard Klima, Daan Bloembergen, Rahul Savani, Karl Tuyls, Alexander Wittig, Andrei Sapera, Dario Izzo
2018 Frontiers in Robotics and AI  
..., e.g. Bloembergen et al. (2015)], this type of analysis falls outside the scope of our current study. In this work we study the inefficiency of decentralised solutions in active debris removal. ... observe the state, and the Q-values are independent of the other player's action; thus this learning can be seen as independent Q-learning, which is a common method in multi-agent reinforcement learning (Bloembergen ...
doi:10.3389/frobt.2018.00054 pmid:33500936 pmcid:PMC7806007 fatcat:o7sjiqi6xvcgnfd7n6g5vv2ski
Showing results 1 — 15 out of 35 results