A Survey of Deep Reinforcement Learning in Video Games [article]

Kun Shao, Zhentao Tang, Yuanheng Zhu, Nannan Li, Dongbin Zhao
2019 arXiv   pre-print
In this paper, we survey the progress of DRL methods, including value-based, policy gradient, and model-based algorithms, and compare their main techniques and properties.  ...  We also take a review of the achievements of DRL in various video games, including classical Arcade games, first-person perspective games and multi-agent real-time strategy games, from 2D to 3D, and from  ...  Asynchronous advantage actor-critic (A3C) trains several agents on multiple environments, showing a stabilizing effect on training.  ... 
arXiv:1912.10944v2 fatcat:fsuzp2sjrfcgfkyclrsyzflax4

Survey of Recent Multi-Agent Reinforcement Learning Algorithms Utilizing Centralized Training [article]

Piyush K. Sharma, Rolando Fernandez, Erin Zaroukian, Michael Dorothy, Anjon Basak, Derrik E. Asher
2021 arXiv   pre-print
The goal is to explore how different implementations of information sharing mechanism in centralized learning may give rise to distinct group coordinated behaviors in multi-agent systems performing cooperative  ...  Two popular Advantage function-based actor-critic methods are Advantage Actor-Critic (A2C) [33] and Asynchronous Advantage Actor-Critic (A3C) [34]. A3C consists of multiple independent agents (networks  ...  In Section 2, we discuss the actor-critic method and how A2C- and A3C-based implementations can improve it.  ... 
arXiv:2107.14316v1 fatcat:n7qmmwwdenfbdngkmzflsqcx7y

Behaviors of Actors in a Resource-Exchange Model of Geopolitics [chapter]

Curtis S. Cooper, Walter E. Beyeler, Jacob A. Hobbs, Michael D. Mitchell, Z. Rowan Copley, Matthew Antognoli
2013 Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering  
The model is based on resource exchange as the fundamental interaction between agents.  ...  To this end, we have developed a hierarchical behavioral module, based on an extension of the proven ATLANTIS architecture, in order to provide flexible decision-making algorithms to agents.  ...  Rational-actor models assume intelligent agents always pursue their own interests, or what they believe to be their own interests based on past experiences, limited only by their imperfect ability to predict  ... 
doi:10.1007/978-3-319-03473-7_7 fatcat:fprf4ykmjfbhpboxyqexcp3cti

Reinforcement Learning in Dynamic Task Scheduling: A Review

Chathurangi Shyalika, Thushari Silva, Asoka Karunananda
2020 SN Computer Science  
This review paper is about a research study that focused on Reinforcement Learning techniques that have been used for dynamic task scheduling.  ...  The paper addresses the results of the study by means of the state-of-the-art on Reinforcement Learning techniques used in dynamic task scheduling and a comparative review of those techniques.  ...  ), A3C (Asynchronous Advantage Actor-Critic) [1, 11, 12].  ... 
doi:10.1007/s42979-020-00326-5 fatcat:egp6vgpetbcwdasm45vunmo3n4

Simulation modeling framework for uncovering system behaviors in the biofuels supply chain network

Datu B Agusdinata, Seokcheon Lee, Fu Zhao, Wil Thissen
2014 Simulation (San Diego, Calif.)  
This SC network model is characterized by distributed control, time asynchrony, and resource contention among actors who interact in collaborative and competitive modes and who make decisions based on incomplete  ...  It is very sensitive to the time delay parameters that partly influence the quality of information on which actors' decisions are based.  ...  One big challenge for supply chain actors is to achieve an optimal group strategy on the basis of individual actor strategies. This work is intended to support this kind of analysis.  ... 
doi:10.1177/0037549714544081 fatcat:z2zeg6p4z5ci7evtklj5hnoqp4

Multiagent Deep Reinforcement Learning: Challenges and Directions Towards Human-Like Approaches [article]

Annie Wong, Thomas Bäck, Anna V. Kononova, Aske Plaat
2021 arXiv   pre-print
execution, opponent modelling, communication, efficient coordination, and reward shaping.  ...  Dealing with multiple agents is inherently more complex as (a) the future rewards depend on the joint actions of multiple players and (b) the computational complexity of functions increases.  ...  Popular actor-critic methods include Advantage Actor-Critic (A2C) (Sutton et al., 1998) and Asynchronous Advantage Actor-Critic (A3C).  ... 
arXiv:2106.15691v1 fatcat:7sy6cianq5dh5a7n6clvjdlrxy

A Survey and Critique of Multiagent Deep Reinforcement Learning [article]

Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor
2019 arXiv   pre-print
(iii) We take a more critical tone raising practical challenges of MDRL (e.g., implementation and computational demands).  ...  We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent  ...  article, to Frans Oliehoek, Sam Devlin, Marc Lanctot, Nolan Bard, Roberta Raileanu, Angeliki Lazaridou, and Yuhang Song for clarifications in their areas of expertise, to Baoxiang Wang for his suggestions on  ... 
arXiv:1810.05587v2 fatcat:h4ei5zx2xfa7xocktlefjrvef4

Artificial Intelligence for Prosthetics - challenge solutions [article]

Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty, Jennifer Hicks, Sean F. Carroll, Bo Zhou, Hongsheng Zeng, Fan Wang, Rongzhong Lian, Hao Tian, Wojciech Jaśkowski, Garrett Andersen, Odd Rune Lykkebø, Nihat Engin Toklu (+31 others)
2019 arXiv   pre-print
In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity  ...  Asynchronous DDPG with multiple actor-critics (Mattias Ljungström): An asynchronous DDPG [27, 43] algorithm is set up with multiple actor-critic pairs.  ...  Fig. 9: Actor-Critic pairs and reward penalties. Fig. 10: Plot of initial model tested on 60 seeds.  ... 
arXiv:1902.02441v1 fatcat:hf7xzitrhjdqfb5cfaneovlfa4

Recent Advances in Deep Reinforcement Learning Applications for Solving Partially Observable Markov Decision Processes (POMDP) Problems: Part 1—Fundamentals and Applications in Games, Robotics and Natural Language Processing

Xuanchen Xiang, Simon Foo
2021 Machine Learning and Knowledge Extraction  
The fact that the agent has limited access to the information of the environment enables AI to be applied efficiently in most fields that require self-learning.  ...  The first part of a two-part series of papers provides a survey on recent advances in Deep Reinforcement Learning (DRL) applications for solving partially observable Markov decision processes (POMDP) problems  ...  Advantage Actor-Critic algorithms have two main variants: the Asynchronous Advantage Actor-Critic (A3C) and the Advantage Actor-Critic (A2C). A3C was introduced in [24], which implements parallel training  ... 
doi:10.3390/make3030029 fatcat:u3y7bqkoljac5not2eq5konnnm

Deep Reinforcement Learning for Cyber Security [article]

Thanh Thi Nguyen, Vijay Janapa Reddi
2020 arXiv   pre-print
Extensive discussions and future research directions on DRL-based cyber security are also given.  ...  strategies against cyber attacks.  ...  The actor attempts to learn a policy by receiving feedback from the critic. This iterative process helps the actor improve its strategy and converge to an optimal policy.  ... 
arXiv:1906.05799v3 fatcat:h4lujrwb5bgwngbi4xf6w347b4

Designing Enterprise Decisional System with Agent Based System

Belkadi Abdelhaq, El Fazziki Abdelaziz, Ait Ouahman Abdallah
2010 International Journal of Intelligent Computing Research  
As a solution, we suggest a multi-agent approach for the DS modelling. This approach is based on a referential framework to analyse the requirements of strategic alignment and agent technology.  ...  Enterprises are increasingly information centric, and recent trends reveal that most competitive businesses require more agility in their enterprise information system and decisional system to strategically  ...  Guaranteeing the quality of the present information in the data warehouse, the extraction step is critical: an analysis performed on erroneous or incomplete data can distort the enterprise strategy.  ... 
doi:10.20533/ijicr.2042.4655.2010.0011 fatcat:tfi2y36aujd3xm6bnbvaagmmlm

Electronic Integration and Business Network Redesign: A Roles–Linkage Perspective

Ajit Kambil, James E. Short
1994 Journal of Management Information Systems  
Venkatraman, Jeffrey Sampler, Henry Lucas, Jack Baroudi and several anonymous reviewers for their very helpful comments on earlier drafts of this paper.  ...  We also thank the Center for Information Systems Research, MIT Sloan School of Management, and the Center for Research in Information Management, London Business School for their funding of parts of this  ...  set of firm strategies for competitive advantage.  ... 
doi:10.1080/07421222.1994.11518020 fatcat:dpvrzfx7g5e4vgcougotxbqtj4

The Role of New Technologies in Value Co-Creation Processes: Healthcare Management and the National Health System as a System of Services

Armando Masucci, Antonietta Megaro, Carlo Alessandro Sirianni
2021 Journal of Service Science and Management  
Results allow the conceptualization of the enabling factors for co-creation management and guide managers to identify tools, resources, and technologies that promote value co-creation by improving understanding  ...  Health System as a complex service system and proposes a conceptual framework, then applied to the case study to identify: 1) the enabling factors for value co-creation; 2) the network mechanisms and the information  ...  , but also to external competitive programs based on a simultaneous, wide, and diversified involvement of both structures.  ... 
doi:10.4236/jssm.2021.142012 fatcat:khbm3oswqzc5hfbwj7kkq3mbpi

Strategic robustness in bi-level system-of-systems design

Jordan L. Stern, Ambrosio Valencia-Romero, Paul T. Grogan
2022 Design Science  
Models are constructed on small world, preferential attachment and random graph topologies and executed in batch simulations.  ...  Systems-of-systems, which lack centralized managerial control, are vulnerable to strategic uncertainty from coordination failures between partially or completely independent system actors.  ...  Acknowledgments This material is based upon work supported in part by the National Science Foundation under Grant No. 1943433.  ... 
doi:10.1017/dsj.2022.2 fatcat:hpfapq3pn5bgxekefoqlqb4g3y

Deep Learning for Video Game Playing [article]

Niels Justesen, Philip Bontrager, Julian Togelius, Sebastian Risi
2019 arXiv   pre-print
In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy  ...  ACKNOWLEDGEMENTS We thank the numerous colleagues who took the time to comment on drafts of this article, including Chen  ...  The Asynchronous Advantage Actor-Critic (A3C) algorithm is an actor-critic method that uses several parallel agents to collect experiences that all asynchronously update a global actor-critic network.  ... 
arXiv:1708.07902v3 fatcat:f3bp2y3khbhqhm5cm3hfqwdtna