Generation of ice states through deep reinforcement learning [article]

Kai-Wen Zhao, Wen-Han Kao, Kai-Hsin Wu, Ying-Jer Kao
2019 arXiv   pre-print
Analysis of the trained policy and the state value function indicates that the ice rule and loop-closing condition are learned without prior knowledge.  ...  We present a deep reinforcement learning framework where a machine agent is trained to search for a policy to generate a ground state for the square ice model by exploring the physical environment.  ...  IV, two types of policies are trained by supplying different sets of observations to the agent.  ... 
arXiv:1903.04698v1 fatcat:2lm6gbl2sjb7xpzdjhkasb65ry

Language Support for Multi Agent Reinforcement Learning

Tony Clark, Balbir Barn, Vinay Kulkarni, Souvik Barat
2020 Proceedings of the 13th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference)
support for multi agent reinforcement learning.  ...  An open access repository of Middlesex University research Clark, Tony, Barn, Balbir, Kulkarni, Vinay and Barat, Souvik (2020) Language  ...  We developed a closed-loop multi-agent simulation setup for training a reinforcement learning based control policy.  ... 
doi:10.1145/3385032.3385041 dblp:conf/indiaSE/ClarkBKB20 fatcat:7uwrqzmhafczvipyssh5rv5xzi

Adaptive Supply Chain: Demand–Supply Synchronization Using Deep Reinforcement Learning

Zhandos Kegenbekov, Ilya Jackson
2021 Algorithms  
These features, complemented with a straightforward supply chain environment, give rise to a general and task unspecific approach to adaptive control in multi-echelon supply chains.  ...  The deep reinforcement learning agent is built upon the Proximal Policy Optimization algorithm, which does not require hardcoded action space and exhaustive hyperparameter tuning.  ...  Conflicts of Interest: The authors declare no conflict of interest. Algorithms 2021, 14, 240  ... 
doi:10.3390/a14080240 fatcat:cpmsftcukngfbfhdvf6ybwlpqm

Cognitive Agents for Sense and Respond Logistics [chapter]

Kshanti Greene, David G. Cooper, Anna L. Buczak, Michael Czajkowski, Jeffrey L. Vagle, Martin O. Hofmann
2006 Lecture Notes in Computer Science  
over the supply chain.  ...  TacAir-SOAR [14] is an expert system-based agent application for automated flight control and battlefield simulation developed using the rule-based, cognitive system SOAR.  ...  Overall value for a policy is the summation of its reinforcement, A, over time. When selecting a policy to use for a given state, usually the policy with the highest overall value is used.  ... 
doi:10.1007/11683704_9 fatcat:tc2swwocibfubcyws3ezspcypy

Deep Reinforcement Learning for Autonomous Driving: A Survey [article]

B Ravi Kiran, Ibrahim Sobh, Victor Talpaert, Patrick Mannion, Ahmad A. Al Sallab, Senthil Yogamani, Patrick Pérez
2021 arXiv   pre-print
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high dimensional environments  ...  The role of simulators in training agents, methods to validate, test and robustify existing solutions in RL are discussed.  ...  The velocity control is based on classical methods of closed-loop control such as PID (proportional-integral-derivative) controllers and MPC (model predictive control).  ... 
arXiv:2002.00444v2 fatcat:axj3ohhjwzdrxp6dgpfqvctv2i

UAV Autonomous Target Search Based on Deep Reinforcement Learning in Complex Disaster Scene

Chunxue Wu, Bobo Ju, Yan Wu, Xiao Lin, Neal N. Xiong, Guangquan Xu, Hongyan Li, Xuefeng Liang
2019 IEEE Access  
However, most of the existing reinforcement learning is applied in games with only two or three moving directions.  ...  Current drones are quite mature in terms of automation control, but current drones require manual control.  ...  In the policy control of the main area, we use the multi-agent multi-hotspot model, and in the cascaded target area we use a single agent single hotspot model. quickly (find the mean of the reward mechanism  ... 
doi:10.1109/access.2019.2933002 fatcat:ys4niqndlbhzphwko2way46are

Reinforcement Learning in Practice: Opportunities and Challenges [article]

Yuxi Li
2022 arXiv   pre-print
Then we discuss challenges, in particular, 1) foundation, 2) representation, 3) reward, 4) exploration, 5) model, simulation, planning, and benchmarks, 6) off-policy/offline learning, 7) learning to learn  ...  This article is a gentle discussion about the field of reinforcement learning in practice, about opportunities and challenges, touching a broad range of topics, with perspectives and without technical  ...  For problems like robotics, transportation and supply chain, reinforcement learning needs to outperform optimal control and operations research (significantly) to be (widely) adopted.  ... 
arXiv:2202.11296v2 fatcat:xdtsmme22rfpfn6rgfotcspnhy

Coordination of hydraulic manipulators by reinforcement learning

M. Karpenko, J. Anderson, N. Sepehri
2006 2006 American Control Conference  
A multi-layer reinforcement learning neural network control architecture is designed next to regulate the interaction force during positioning.  ...  Each actuator system is outfitted with such a neural controller so that a decentralized reinforcement learning control system results.  ...  With reference to Fig. 3, each actuator is positioned using a simple closed-loop proportional control law.  ... 
doi:10.1109/acc.2006.1657214 fatcat:4wbgbbcbgrhrbghhqafjh3ewje

Discrete-Event Simulation-Based Q-Learning Algorithm Applied to Financial Leverage Effect

E. Barbieri, L. Capocchi, J. F Santucci
2019 SN Computer Science  
Discrete-event modeling and simulation and machine learning are two frameworks suited for system of systems modeling which when combined can give a powerful tool for system optimization and decision making  ...  This approach has been validated on a financial leverage effect based on a Markov decision-making policy.  ...  And, simulation can be used to evaluate the impact of introducing AI into a "real world system" such as supply chains or production processes.  ... 
doi:10.1007/s42979-019-0051-7 fatcat:35vv42qkivgklcekundxg7bkkq

Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems [article]

Hardik Meisheri and Vinita Baniwal and Nazneen N Sultana and Balaraman Ravindran and Harshad Khadilkar
2019 arXiv   pre-print
We first formulate the decision-making problem as a canonical reinforcement learning (RL) problem, which can be solved using purely data-driven techniques.  ...  of a multi-product inventory management task.  ...  Acknowledgment The authors would like to thank Dheeraj Shah and Padmakumar Ma from TCS Retail Strategic Initiatives team, for their help in defining the inventory management use case.  ... 
arXiv:1910.00211v1 fatcat:akkf6mny4vecnnypmmc7oxwctq

A Long-Short Term Memory Recurrent Neural Network Based Reinforcement Learning Controller for Office Heating Ventilation and Air Conditioning Systems

2017 Processes  
In this paper, a model-free actor-critic Reinforcement Learning (RL) controller is designed using a variant of artificial recurrent neural networks called Long-Short-Term Memory (LSTM) networks.  ...  Using the Building Control Virtual Test Bed (BCVTB), the control of the thermostat schedule during each sample time is implemented for the office in EnergyPlus alongside local weather data.  ...  A model-free reinforcement learning-based thermostat schedule controller has been developed using the novel long-short-term memory recurrent neural network, by closed-loop control of the HVAC system for  ... 
doi:10.3390/pr5030046 fatcat:o2gvci6qe5ht5ghmhmg47os2ou

Smart Master Production Schedule for the Supply Chain: A Conceptual Framework

Julio C. Serrano-Ruiz, Josefa Mula, Raúl Poler
2021 Computers  
Risks arising from the effect of disruptions and unsustainable practices constantly push the supply chain to uncompetitive positions.  ...  A smart production planning and control process must successfully address both risks by reducing them, thereby strengthening supply chain (SC) resilience and its ability to survive in the long term.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/computers10120156 fatcat:jnibvdbpmzgkdmcpv6eh6djk6e

Deep Reinforcement Learning-Based Robust Protection in DER-Rich Distribution Grids [article]

Dongqi Wu, Dileep Kalathil, Miroslav Begovic, Le Xie
2021 arXiv   pre-print
This paper introduces the concept of Deep Reinforcement Learning based architecture for protective relay design in power distribution systems with many distributed energy resources (DERs).  ...  In this paper, a reinforcement learning-based approach is proposed to design and implement protective relays in the distribution grid.  ...  In a single agent system, for any fixed policy of the learning agent, the probability distribution of the observed states can be described using a stationary Markov chain.  ... 
arXiv:2003.02422v3 fatcat:5koxf42bhba4blz27jz2imyjhe

ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions [article]

Brijen Thananjeyan, Ashwin Balakrishna, Ugo Rosolia, Joseph E. Gonzalez, Aaron Ames, Ken Goldberg
2020 arXiv   pre-print
However, prior analysis of LMPC controllers for stochastic systems has mainly focused on linear systems in the iterative learning control setting.  ...  We present results with a practical instantiation of this algorithm and experimentally demonstrate that the resulting controller adapts to a variety of initial and terminal conditions on 3 stochastic continuous  ...  [1] propose a practical reinforcement learning algorithm using these strategies to learn policies for nonlinear systems.  ... 
arXiv:2003.01410v2 fatcat:lfnyerc3ove5tjuk2ke4og45bi

A study on closed-loop supply chain model for parts reuse with economic efficiency

Yoshitaka TANIMIZU, Yusuke SHIMIZU
2014 Journal of Advanced Mechanical Design, Systems, and Manufacturing  
A prototype of a simulation system for closed-loop supply chains is developed in order to evaluate the effectiveness of the proposed model and negotiation protocol.  ...  This study proposes a basic model for closed-loop supply chains which includes not only traditional forward supply chains for the generation of products but also reverse supply chains for the reuse and  ...  The model was developed based on Markov decision processes and reinforcement learning to simultaneously design the inventory reorder policies of all the supply chain stages.  ... 
doi:10.1299/jamdsm.2014jamdsm0068 fatcat:itdm63bnabhxrawqd7gylqf2qe