306,984 Hits in 6.6 sec

Distributional Policy Optimization: An Alternative Approach for Continuous Control [article]

Chen Tessler, Guy Tennenholtz, Shie Mannor
<span title="2019-11-25">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We identify a fundamental problem in policy gradient-based methods in continuous control.  ...  Empirical evaluation shows that our approach is comparable and often surpasses current state-of-the-art baselines in continuous domains.  ...  These insights lead us to question the optimality of current PG approaches in continuous control, suggesting that, although these approaches are well understood, there is room for research into alternative  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.09855v2">arXiv:1905.09855v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rj4qf3e7mfa2fo4tgja76mrlrm">fatcat:rj4qf3e7mfa2fo4tgja76mrlrm</a> </span>

Proximal Policy Optimization with Mixed Distributed Training [article]

Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng
<span title="2019-09-30">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We introduce an improved algorithm based on proximal policy optimization, mixed distributed proximal policy optimization (MDPPO), and show that it can accelerate and stabilize the training process.  ...  Actions are sampled by each policy separately as usual, but the trajectories for the training process are collected from all agents, instead of only one policy.  ...  In reinforcement learning, algorithms based on policy gradient provide an outstanding paradigm for continuous action space problems.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1907.06479v3">arXiv:1907.06479v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6gm3ibfvpbftxine3rhpde3raa">fatcat:6gm3ibfvpbftxine3rhpde3raa</a> </span>
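MDPPO builds on proximal policy optimization. As a minimal sketch of the part it inherits, the standard PPO clipped surrogate objective can be computed from per-sample log-probabilities under the new and old policies plus advantage estimates. The function name and the clip range `eps=0.2` are illustrative assumptions, and MDPPO's own contribution (collecting trajectories from all agents) is not modeled here.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate over a batch (a quantity to maximize).

    new_logp, old_logp: log pi(a|s) under the current and data-collecting policies
    advantages: advantage estimates for the same (s, a) samples
    eps: clip range (hypothetical default)
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                 # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # ratio clipped to [1 - eps, 1 + eps]
        total += min(ratio * adv, clipped * adv)          # pessimistic (lower) bound per sample
    return total / len(advantages)


# Two samples: positive advantage with ratio 1.5, negative advantage with ratio 0.5.
obj = ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0])
print(obj)
```

Taking the per-sample minimum means the objective stops improving once the ratio moves past the clip range in the direction the advantage favors, which is what keeps each update close to the data-collecting policy.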

Marginalized State Distribution Entropy Regularization in Policy Optimization [article]

Riashat Islam, Zafarali Ahmed, Doina Precup
<span title="2019-12-11">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Our approach based on marginal state distribution achieves superior state space coverage on complex gridworld domains, that translate into empirical gains in sparse reward 3D maze navigation and continuous  ...  control domains compared to entropy regularization with stochastic policies.  ...  Acknowledgements We thank Anirudh Goyal for preliminary discussions regarding this work.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1912.05128v1">arXiv:1912.05128v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/7hxf76irj5hrracs7ib7pdugii">fatcat:7hxf76irj5hrracs7ib7pdugii</a> </span>
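The coverage argument above rests on the entropy of the marginal state distribution rather than the entropy of the policy. For a small discrete state space, that quantity can be estimated from visitation counts, as in this minimal sketch; the function name and trajectories are hypothetical, and the paper's estimator for complex or continuous domains is not reproduced.

```python
import math
from collections import Counter


def state_visitation_entropy(states):
    """Plug-in entropy (in nats) of the empirical state-visitation distribution."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


spread_out = [0, 1, 2, 3] * 25   # visits four states equally often
stuck = [0] * 97 + [1, 2, 3]     # mostly revisits a single state
print(state_visitation_entropy(spread_out), state_visitation_entropy(stuck))
```

A trajectory that spreads its visits across states scores higher, which is the sense in which an entropy term on the state distribution rewards coverage.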

A Distributional View on Multi-Objective Policy Optimization [article]

Abbas Abdolmaleki, Sandy H. Huang, Leonard Hasenclever, Michael Neunert, H. Francis Song, Martina Zambelli, Murilo F. Martins, Nicolas Heess, Raia Hadsell, Martin Riedmiller
<span title="2020-05-15">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions.  ...  We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the  ...  Guillaume Desjardins, Csaba Szepesvari, Jost Tobias Springenberg, Steven Bohez, Philemon Brakel, Brendan Tracey, Jonas Degrave, Jonas Buchli, Leslie Fritz, Chloe Rosenberg, and many others at DeepMind for  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2005.07513v1">arXiv:2005.07513v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/35wpieb23rgwnbaikhw35b6rmy">fatcat:35wpieb23rgwnbaikhw35b6rmy</a> </span>
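The two-step recipe in this abstract (an action distribution per objective, then a supervised fit of one parametric policy to their combination) can be sketched at a single state. As assumptions: each per-objective distribution is a set of nonparametric weights over sampled actions proportional to exp(Q_k(a)/eta_k), the temperatures eta_k are fixed hypothetical numbers rather than derived as in the paper, the policy is a 1-D Gaussian with fixed variance whose mean is fit by weighted maximum likelihood, and the Q-functions are toy examples.

```python
import math


def per_objective_weights(actions, q_fn, temperature):
    """Normalized weights proportional to exp(Q_k(a) / eta_k) over the sampled actions."""
    logits = [q_fn(a) / temperature for a in actions]
    top = max(logits)                                   # shift for numerical stability
    exps = [math.exp(x - top) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fit_gaussian_mean(actions, q_fns, temperatures):
    """Weighted-MLE mean of a fixed-variance 1-D Gaussian fit to all objectives' weights.

    Each objective's weights sum to 1, so the fitted mean equals the average
    of the per-objective weighted action means.
    """
    total = 0.0
    for q_fn, eta in zip(q_fns, temperatures):
        weights = per_objective_weights(actions, q_fn, eta)
        total += sum(w * a for w, a in zip(weights, actions))
    return total / len(q_fns)


def q_right(a):
    """Toy objective preferring a = 1."""
    return -(a - 1.0) ** 2


def q_left(a):
    """Toy objective preferring a = -1."""
    return -(a + 1.0) ** 2


actions = [i / 2.0 for i in range(-4, 5)]           # sampled actions -2.0, -1.5, ..., 2.0
balanced = fit_gaussian_mean(actions, [q_right, q_left], [1.0, 1.0])
prefer_right = fit_gaussian_mean(actions, [q_right, q_left], [0.1, 1.0])
print(balanced, prefer_right)
```

Lowering one objective's temperature sharpens its weights, so in this toy run the fitted mean moves toward that objective's preferred action.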

ClipUp: A Simple and Powerful Optimizer for Distribution-based Policy Evolution [article]

Nihat Engin Toklu, Paweł Liskowski, Rupesh Kumar Srivastava
<span title="2020-12-08">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Distribution-based search algorithms are an effective approach for evolutionary reinforcement learning of neural network controllers.  ...  In these algorithms, gradients of the total reward with respect to the policy parameters are estimated using a population of solutions drawn from a search distribution, and then used for policy optimization  ...  We propose a simple and competitive optimizer (an adaptive gradient following mechanism) for use within distribution-based evolutionary search algorithms for training reinforcement learning (RL) agents  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2008.02387v3">arXiv:2008.02387v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/56babe5ccjfivmyar4red3qnc4">fatcat:56babe5ccjfivmyar4red3qnc4</a> </span>
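The snippet describes estimating the reward gradient from a population sampled around the current parameters. A minimal sketch follows, assuming Gaussian perturbations in antithetic (mirrored) pairs, combined with a ClipUp-style step as we understand it: the gradient is normalized, accumulated into a momentum velocity, and the velocity norm is capped at a maximum speed. The toy reward, the function names, and every hyperparameter value are hypothetical, not the paper's recommended settings.

```python
import math
import random


def es_gradient(f, center, sigma, pop_size, rng):
    """Antithetic Gaussian ES estimate of the gradient of E[f(center + sigma * eps)]."""
    n_pairs = pop_size // 2
    grad = [0.0] * len(center)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in center]
        f_plus = f([c + sigma * e for c, e in zip(center, eps)])
        f_minus = f([c - sigma * e for c, e in zip(center, eps)])
        for i, e in enumerate(eps):
            grad[i] += (f_plus - f_minus) * e
    return [g / (2.0 * sigma * n_pairs) for g in grad]


def clipup_step(center, velocity, grad, step_size, momentum, max_speed):
    """One ClipUp-style ascent step: normalized gradient, momentum, capped velocity."""
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0     # avoid dividing by zero
    velocity = [momentum * v + step_size * g / norm for v, g in zip(velocity, grad)]
    speed = math.sqrt(sum(v * v for v in velocity))
    if speed > max_speed:                                 # clip the velocity norm
        velocity = [v * max_speed / speed for v in velocity]
    return [c + v for c, v in zip(center, velocity)], velocity


def reward(x):
    """Hypothetical toy reward, maximized at (3, -2)."""
    return -((x[0] - 3.0) ** 2 + (x[1] + 2.0) ** 2)


rng = random.Random(0)
center, velocity = [0.0, 0.0], [0.0, 0.0]
for _ in range(300):
    g = es_gradient(reward, center, sigma=0.1, pop_size=20, rng=rng)
    center, velocity = clipup_step(center, velocity, g, step_size=0.1, momentum=0.9, max_speed=0.2)
print(center)
```

Normalizing the gradient makes the step length independent of the reward scale, and the speed cap bounds how far momentum can carry the search distribution in one iteration.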

Optimal Order and Distribution Strategies in Production Networks [chapter]

Simone Göttlich, Michael Herty, Christian Ringhofer
<span title="">2012</span> <i title="Springer London"> Decision Policies for Production Networks </i> &nbsp;
They need to be fixed by solving an optimization problem where additionally economic factors such as production and inventory costs, liquidity and credit limits influence the maximization of profit.  ...  Instead of using a simulation-based optimization procedure, we derive an efficient way to transform the original model into a mixed-integer programming problem and benefit from established commercial solvers  ...  An alternative modeling approach which remedies the computational aspect are differential equations.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-0-85729-644-3_11">doi:10.1007/978-0-85729-644-3_11</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zlylq5igj5fxrdvyzrl7ijrfka">fatcat:zlylq5igj5fxrdvyzrl7ijrfka</a> </span>

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift [article]

Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
<span title="2020-10-14">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We focus on both: "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy; and parametric policy classes (considering  ...  both log-linear and neural policy classes), which may not contain the optimal policy and where we provide agnostic learning results.  ...  We also acknowledge numerous helpful comments from Ching-An Cheng and Andrea Zanette on an earlier draft of this work.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1908.00261v5">arXiv:1908.00261v5</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/y4elqflzebgy7bwl7rbkt3xmb4">fatcat:y4elqflzebgy7bwl7rbkt3xmb4</a> </span>
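For the tabular setting, a one-state illustration: exact policy gradient ascent with a softmax parameterization on a three-armed bandit (a one-state MDP). For a bandit the exact gradient is dJ/dtheta_a = pi_a * (r_a - J(theta)). The rewards, learning rate, and step count below are hypothetical, and the sketch makes no claim about the convergence rates analyzed in the paper.

```python
import math


def softmax(theta):
    m = max(theta)                                    # shift for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_pg_bandit(rewards, lr=1.0, steps=2000):
    """Exact softmax policy gradient ascent on a bandit (a one-state MDP)."""
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        pi = softmax(theta)
        value = sum(p * r for p, r in zip(pi, rewards))   # J(theta) = sum_a pi_a * r_a
        # Exact gradient: dJ/dtheta_a = pi_a * (r_a - J(theta))
        theta = [t + lr * p * (r - value) for t, p, r in zip(theta, pi, rewards)]
    return softmax(theta)


pi = softmax_pg_bandit([1.0, 0.5, 0.0])   # hypothetical rewards; arm 0 is optimal
print(pi)
```

Because the optimal deterministic policy lies in the closure of the softmax class, the probability of the best arm keeps rising toward 1 here, even though no finite parameter vector reaches it exactly.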

Bayesian Distributional Policy Gradients [article]

Luchen Li, A. Aldo Faisal
<span title="2021-03-23">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
with policy performance, which may be beneficial for trading off exploration and exploitation and policy learning in general.  ...  We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target/model return distributions.  ...  Acknowledgments We are grateful for our funding support: a Department of Computing PhD Award to LL and a UKRI Turing AI Fellowship (EP/V025449/1) to AAF.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2103.11265v2">arXiv:2103.11265v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/f5ykig2iqvbixblazqbsamo3nu">fatcat:f5ykig2iqvbixblazqbsamo3nu</a> </span>
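The abstract minimizes Wasserstein metrics between target and model return distributions. For one-dimensional returns represented by equally many samples, the 1-Wasserstein distance has a simple closed form, sketched below. This only illustrates the distance itself, not the paper's inference-based auto-encoding procedure; the function name and sample values are hypothetical.

```python
def wasserstein1_empirical(xs, ys):
    """W1 distance between two equally sized 1-D empirical distributions.

    With equal sample counts, the optimal coupling matches sorted samples,
    so W1 is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


target_returns = [0.0, 1.0, 2.0]   # hypothetical samples of the target return distribution
model_returns = [3.0, 2.0, 1.0]    # hypothetical samples of the model return distribution
print(wasserstein1_empirical(target_returns, model_returns))
```

Here every sorted model sample sits exactly one unit above its target counterpart, so the distance is 1.0.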

Optimal object state transfer - recovery policies for fault tolerant distributed systems

P. Katsaros, C. Lazos
<span title="">2004</span> <i title="IEEE"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/64vpoz5fx5azda553iibjuf7h4" style="color: black;">International Conference on Dependable Systems and Networks, 2004</a> </i> &nbsp;
Our approach allows mixing miscellaneous fault tolerance policies, as opposed to the published analytic models, which are best suited in the evaluation of single-server process replication schemes.  ...  Recent developments in the field of object-based fault tolerance and the advent of the first OMG FT-CORBA compliant middleware raise new requirements for the design process of distributed fault-tolerant  ...  Kalbarczyk and the anonymous referees for their helpful comments. References  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/dsn.2004.1311947">doi:10.1109/dsn.2004.1311947</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/dsn/KatsarosL04.html">dblp:conf/dsn/KatsarosL04</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/nud6uqzgsvfulfruvi7qsusl4e">fatcat:nud6uqzgsvfulfruvi7qsusl4e</a> </span>

An optimal inventory policy under certainty distributed demand for cutting tools with stochastically distributed lifespan

Cun Rong Li, Jiadong Cheng, Zude Zhou
<span title="2015-01-09">2015</span> <i title="Informa UK Limited"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/3traqp2nfrhnjbxyafpoa3gk6i" style="color: black;">Cogent Engineering</a> </i> &nbsp;
An optimal inventory policy with general demand (OIPGD) was developed with which the allowable stopping time for tools, order-up-to-level inventory, and order cycle can be optimally determined by an exhaustive  ...  Traditional inventory policy was deeply investigated for various kinds of demand in different industrial sectors.  ...  Timothy (2005) researched two types of inventory control model through the use of a simple periodic review model and also an alternative approach to sensitivity analysis of the model.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1080/23311916.2014.990671">doi:10.1080/23311916.2014.990671</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/x2zdnk3oqnadrb4urw35x6kn4a">fatcat:x2zdnk3oqnadrb4urw35x6kn4a</a> </span>
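The abstract determines an order-up-to level by exhaustive search. As a much simpler stand-in (not the paper's OIPGD model, which also accounts for tool lifespan and stopping time), the sketch below enumerates candidate order-up-to levels S for a single review period and keeps the one with the lowest expected holding-plus-shortage cost. The demand distribution, the unit costs, and the function names are hypothetical.

```python
def expected_cost(level, demand_pmf, holding, shortage):
    """Expected single-period cost of raising inventory to `level` before demand arrives."""
    cost = 0.0
    for demand, prob in demand_pmf.items():
        cost += prob * (holding * max(level - demand, 0) + shortage * max(demand - level, 0))
    return cost


def best_order_up_to(demand_pmf, holding, shortage):
    """Exhaustively search every level from 0 up to the largest possible demand."""
    candidates = range(0, max(demand_pmf) + 1)
    return min(candidates, key=lambda s: expected_cost(s, demand_pmf, holding, shortage))


demand = {d: 1 / 11 for d in range(11)}   # hypothetical demand, uniform on 0..10 units
level = best_order_up_to(demand, holding=1.0, shortage=3.0)
print(level)
```

For this cost structure the search agrees with the critical-fractile rule: the smallest S whose cumulative demand probability reaches p/(p + h) = 3/4 is S = 8.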

Off-Policy Policy Gradient with State Distribution Correction [article]

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
<span title="2019-07-06">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Here we build on recent progress for estimating the ratio of the state distributions under behavior and evaluation policies for policy evaluation, and present an off-policy policy gradient optimization  ...  We present an illustrative example of why this is important and a theoretical convergence guarantee for our approach.  ...  In cart pole since an optimal policy can control the cart to stay in a small region, it is relatively easy for the uniform random policy to cover the states visited by the optimal policy.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1904.08473v2">arXiv:1904.08473v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dynblg47ezekblujbtqekjpyda">fatcat:dynblg47ezekblujbtqekjpyda</a> </span>
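The correction described in this abstract reweights on-policy gradient terms by the state-distribution ratio w(s) = d_pi(s) / d_mu(s) in addition to the usual per-action importance ratio. The sketch below assumes w and Q are already known (estimating w is the hard part the paper addresses, and it is not done here) and checks the estimate for a tabular softmax policy against the exact gradient. All numbers are synthetic and do not come from a real MDP; the function names are hypothetical.

```python
import math
import random


def softmax(row):
    m = max(row)                                      # shift for numerical stability
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def corrected_pg_estimate(theta, samples, mu, w, q):
    """Off-policy PG estimate; samples are (s, a) with s ~ d_mu and a ~ mu(.|s)."""
    pi = [softmax(row) for row in theta]
    grad = [[0.0] * len(row) for row in theta]
    for s, a in samples:
        rho = pi[s][a] / mu[s][a]                     # per-action importance ratio
        coef = w[s] * rho * q[s][a]                   # state correction * action correction * value
        for b in range(len(theta[s])):
            score = (1.0 if b == a else 0.0) - pi[s][b]   # d log pi(a|s) / d theta[s][b]
            grad[s][b] += coef * score
    return [[g / len(samples) for g in row] for row in grad]


def exact_pg(theta, d_pi, q):
    """sum_s d_pi(s) sum_a pi(a|s) * grad log pi(a|s) * Q(s, a) for the same inputs."""
    pi = [softmax(row) for row in theta]
    grad = [[0.0] * len(row) for row in theta]
    for s, row in enumerate(theta):
        for a in range(len(row)):
            for b in range(len(row)):
                score = (1.0 if b == a else 0.0) - pi[s][b]
                grad[s][b] += d_pi[s] * pi[s][a] * score * q[s][a]
    return grad


rng = random.Random(0)
theta = [[0.5, -0.5], [0.0, 1.0]]
mu = [[0.5, 0.5], [0.5, 0.5]]                         # uniform behavior policy
d_mu, d_pi = [0.7, 0.3], [0.4, 0.6]                   # synthetic state distributions
w = [p / m for p, m in zip(d_pi, d_mu)]               # assumed-known ratio d_pi / d_mu
q = [[1.0, 0.0], [0.2, 0.8]]                          # synthetic action values
samples = [(0 if rng.random() < d_mu[0] else 1, rng.randrange(2)) for _ in range(100_000)]
est = corrected_pg_estimate(theta, samples, mu, w, q)
exact = exact_pg(theta, d_pi, q)
print(est, exact)
```

Without the w(s) factor the estimate would weight states by d_mu instead of d_pi, which is the bias the state-distribution correction is meant to remove.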

Optimal policies for electromobility: Joint assessment of transport and electricity distribution costs in Norway

Paal Brevik Wangsness, Stef Proost, Kenneth Løvold Rødseth
<span title="">2021</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/utr47f6jmjcillsbjti37sn5ge" style="color: black;">Utilities Policy</a> </i> &nbsp;
We examine how the distribution system operators can mitigate these costs with different pricing schemes and how this, in turn, affects the transport market equilibrium.  ...  This paper develops a stylized economic model for passenger transport in the greater Oslo area, in which the agents' choices of car ownership, transport pattern  ...  Acknowledgements We thank for valuable comments and insights given by Knut Einar Rosendahl, Frode Mycklebye, Tor Westby Stålsett and Kjersti Vøllestad.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.jup.2021.101247">doi:10.1016/j.jup.2021.101247</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/5kuf66hlvnef3jrhrpsy64nk7m">fatcat:5kuf66hlvnef3jrhrpsy64nk7m</a> </span>

Sample-based Distributional Policy Gradient [article]

Rahul Singh, Keuntaek Lee, Yongxin Chen
<span title="2020-01-08">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this work, motivated by applications in robotics with continuous action space control settings, we propose sample-based distributional policy gradient (SDPG) algorithm.  ...  We compare SDPG with the state-of-the-art policy gradient method in DRL, distributed distributional deterministic policy gradients (D4PG), which has demonstrated state-of-the-art performance.  ...  Alternative to value function based approaches, policy gradient methods improve a parameterized policy based on the policy gradient theorem [6] and have been shown to be more effective in continuous action  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2001.02652v1">arXiv:2001.02652v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/inzz444jlbgntlyakqzernfsdi">fatcat:inzz444jlbgntlyakqzernfsdi</a> </span>

Inventory optimization of distribution networks with discrete-event processes by vendor-managed policies

Simona Sacone, Silvia Siri
<span title="">2011</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/p3wtlm3j2fedxitkuu4bs3scau" style="color: black;">IFAC Proceedings Volumes</a> </i> &nbsp;
The objective of this work is to study optimal Vendor-Managed Inventory policies in distribution systems in which demand and delivery processes are characterized by a discrete-event dynamics.  ...  For this problem different solution methods are proposed in the paper, based on the statement of mathematical programming problems and the definition of appropriate algorithms.  ...  INTRODUCTION This paper deals with distribution systems characterized by Vendor-Managed Inventory (VMI) policies.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3182/20110828-6-it-1002.01675">doi:10.3182/20110828-6-it-1002.01675</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/nt4r4ksivffihirauisfr5drh4">fatcat:nt4r4ksivffihirauisfr5drh4</a> </span>

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch [article]

Shangtong Zhang, Remi Tachet, Romain Laroche
<span title="2021-11-04">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Our work goes beyond existing works on the optimality of policy gradient methods in that existing works use the exact policy gradient for updating the policy parameters while we use an approximate and  ...  state distribution of the behavior policy and that of the target policy.  ...  The goal of control is to find a policy $\pi^*$ such that $v_\pi(s) \le v_{\pi^*}(s)$ for all $\pi$ and all $s$ (2). One common approach for control is policy gradient.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2111.02997v1">arXiv:2111.02997v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/vltn2zg2snfozabivqx4ccaw5u">fatcat:vltn2zg2snfozabivqx4ccaw5u</a> </span>
Showing results 1&ndash;15 out of 306,984 results