
Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization [article]

Xiangxiang Chu
2019 arXiv   pre-print
In this paper, we propose a first-order gradient reinforcement learning algorithm called Policy Optimization with Penalized Point Probability Distance (POP3D), whose penalty is a lower bound to the square of the total variation divergence  ...  As the most successful variant of and improvement on Trust Region Policy Optimization (TRPO), proximal policy optimization (PPO) has been widely applied across various domains with several advantages: efficient  ...  Conclusion In this paper, we introduce a new reinforcement learning algorithm called POP3D (Policy Optimization with Penalized Point Probability Distance), which, like PPO, acts as a TRPO variant.  ... 
arXiv:1807.00442v4 fatcat:amvxdq2htvfxbaemqgmscse3ta
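The POP3D snippet above contrasts a penalized point-probability distance with PPO's clipping. A minimal sketch of such a penalized surrogate objective follows; the squared-distance penalty form and the coefficient `beta` are assumptions based on the abstract, not the paper's exact formulation.

```python
import numpy as np

def pop3d_style_loss(pi_new, pi_old, advantages, beta=10.0):
    """Sketch of a penalized policy-gradient surrogate.

    pi_new, pi_old: probabilities of the taken actions under the new
    and old policies; advantages: advantage estimates. Instead of
    clipping the likelihood ratio as PPO does, the squared point
    probability distance (pi_new - pi_old)^2 is penalized.
    """
    ratio = pi_new / pi_old
    surrogate = ratio * advantages
    penalty = beta * (pi_new - pi_old) ** 2
    # Return the negated objective so a minimizer can be applied.
    return -(surrogate - penalty).mean()

pi_old = np.array([0.5, 0.3, 0.2])
pi_new = np.array([0.55, 0.25, 0.2])
adv = np.array([1.0, -0.5, 0.2])
loss = pop3d_style_loss(pi_new, pi_old, adv)
```

In a real training loop the probabilities would come from a differentiable policy network; here plain arrays keep the objective itself visible.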

Distances and Small Business Credit Constraints: The French Case

Salima Djedidi
2010 Social Science Research Network  
Acknowledgments I am grateful to Didier Fichaux and Loïc Dorléans (Bank of France) for providing the banks' location files (FIB) that allow us to build up our variables of interest.  ...  An alternative to the Within estimator consists in applying OLS to the model written in first differences.  ...  We build up two alternative functional distance indexes.  ... 
doi:10.2139/ssrn.1695487 fatcat:yw6kwdch4vczfng7kilwlutk5y

Banks, Distances and Financing Constraints for Firms

Pietro Alessandrini, Andrea Filippo Presbitero, Alberto Zazzaro
2006 Social Science Research Network  
Our findings on Italian data show that increased functional distance makes local borrowers' financing constraints more binding, being positively associated with the probability of credit rationing, with investment-cash-flow sensitivity, and with the ratio of credit lines used by borrowers to credit lines available, and negatively associated with the scope for overdrawing.  ...  Distance variables Coming to our key distance variables, we find strong evidence that functional distance and operational proximity have opposite effects on the probability of being rationed.  ... 
doi:10.2139/ssrn.928826 fatcat:pumdnmjw2zhypdbyrwv677aksq

How Haptic Size Sensations Improve Distance Perception

Peter W. Battaglia, Daniel Kersten, Paul R. Schrater, Konrad P. Körding
2011 PLoS Computational Biology  
We compare these models' predictions to a set of human distance judgments in an interception experiment and use Bayesian analysis tools to quantitatively select the best hypothesis on the basis of its  ...  Determining distances to objects is one of the most ubiquitous perceptual tasks in everyday life.  ...  An alternative interpretation of k is that it is an exponent applied to the posterior distribution, from which one sample is then drawn after renormalizing.  ... 
doi:10.1371/journal.pcbi.1002080 pmid:21738457 pmcid:PMC3127804 fatcat:z3airvbshfaulm67vmesrkwfom

The Importance of Pessimism in Fixed-Dataset Policy Optimization [article]

Jacob Buckman, Carles Gelada, Marc G. Bellemare
2020 arXiv   pre-print
This analysis reveals that for naive approaches, the possibility of erroneous value overestimation leads to a difficult-to-satisfy requirement: in order to guarantee that we select a policy which is near-optimal  ...  To avoid this, algorithms can follow the pessimism principle, which states that we should choose the policy which acts optimally in the worst possible world.  ...  Data is collected with an optimal ε-greedy policy, with ε = 50%.  ... 
arXiv:2009.06799v3 fatcat:ke7gilqo2zgepimkksffkbh5gy

Adaptive Exploration through Covariance Matrix Adaptation Enables Developmental Motor Learning

Freek Stulp, Pierre-Yves Oudeyer
2012 Paladyn: Journal of Behavioral Robotics  
Abstract The "Policy Improvement with Path Integrals" (PI²)  ...  An alternative to discrete state and action spaces is to use a parameterized policy π(θ) and search directly in the space of the parameters θ to find the optimal policy π(θ*).  ...  For example, the terminal cost φ_N may penalize the distance to a goal at the end of a movement, and the immediate cost may penalize the acceleration at each time step during a movement.  ... 
doi:10.2478/s13230-013-0108-6 fatcat:o6tvibn7tjgltdxledauf2heqm
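The snippet above describes searching directly in policy-parameter space with covariance matrix adaptation. A toy sketch of reward-weighted parameter search in that spirit follows; the quadratic toy reward, sample count, and weighting temperature are illustrative assumptions, not the PI² algorithm as published.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta):
    # Toy objective: negative squared distance to an assumed optimum.
    return -np.sum((theta - np.array([1.0, -2.0])) ** 2)

theta = np.zeros(2)          # policy parameters theta
cov = np.eye(2)              # exploration covariance, adapted online
for _ in range(100):
    samples = rng.multivariate_normal(theta, cov, size=20)
    rewards = np.array([reward(s) for s in samples])
    # Exponential weighting: better samples receive more weight.
    w = np.exp(10.0 * (rewards - rewards.min())
               / (rewards.max() - rewards.min() + 1e-12))
    w /= w.sum()
    new_theta = w @ samples                      # reward-weighted mean
    diff = samples - new_theta
    # Reward-weighted covariance plus a jitter floor to keep exploring.
    cov = (w[:, None, None] * np.einsum('ni,nj->nij', diff, diff)).sum(0) \
          + 1e-2 * np.eye(2)
    theta = new_theta
```

The covariance both shrinks as the search converges and stretches along directions where good samples are spread out, which is the adaptive-exploration effect the abstract refers to.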

Going the distance for TLB prefetching

Gokul B. Kandiraju, Anand Sivasubramaniam
2002 SIGARCH Computer Architecture News  
In addition, this paper proposes a novel prefetching mechanism, called Distance Prefetching, that attempts to capture patterns in the reference behavior in a smaller space than earlier proposals.  ...  body of literature on prefetching for caches, and it is not clear how they can be adapted (or if the issues are different) for TLBs, how well suited they are for TLB prefetching, and how they compare with  ...  The contributions of this paper are in: (a) the novel mechanism -Distance Prefetching -that can be used to predict application reference behavior using a relatively small space (which can possibly be used  ... 
doi:10.1145/545214.545237 fatcat:ahqfric4affpjdgru2jvvlob4e
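The Distance Prefetching entry describes predicting reference behavior from address "distances" rather than full addresses, which keeps the prediction table small. A minimal sketch of that idea follows; the table structure and method names are illustrative assumptions, not the paper's hardware design.

```python
class DistancePrefetcher:
    """Sketch of distance-based prefetch prediction: learn which
    address deltas ("distances") tend to follow each other, keyed
    by the previous distance rather than by full addresses."""

    def __init__(self):
        self.table = {}        # prev_distance -> set of next distances
        self.last_addr = None
        self.last_dist = None

    def access(self, addr):
        """Record one reference; return addresses to prefetch."""
        predictions = []
        if self.last_addr is not None:
            dist = addr - self.last_addr
            if self.last_dist is not None:
                self.table.setdefault(self.last_dist, set()).add(dist)
            # Prefetch addresses reached by distances seen after `dist`.
            predictions = sorted(addr + d for d in self.table.get(dist, ()))
            self.last_dist = dist
        self.last_addr = addr
        return predictions

pf = DistancePrefetcher()
preds = [pf.access(a) for a in (0, 4, 8, 12)]  # strided references
```

On the strided sequence the prefetcher learns that distance 4 follows distance 4, so from the third access onward it predicts the next stride ahead of time.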

Enabling and Emerging Technologies for Social Distancing: A Comprehensive Survey [article]

Cong T. Nguyen, Yuris Mulya Saputra, Nguyen Van Huynh, Ngoc-Tan Nguyen, Tran Viet Khoa, Bui Minh Tuan, Diep N. Nguyen, Dinh Thai Hoang, Thang X. Vu, Eryk Dutkiewicz, Symeon Chatzinotas, Bjorn Ottersten
2020 arXiv   pre-print
These technologies open many new solutions and directions to deal with problems in social distancing, e.g., symptom prediction, detection and monitoring of quarantined people, and contact tracing.  ...  To that end, we provide a fundamental background of social distancing including basic concepts, measurements, models and propose practical scenarios.  ...  Then, an optimization problem is solved to find the effective distances corresponding to the paths that are close to the actual paths from the tag to the receiver.  ... 
arXiv:2005.02816v1 fatcat:3gk7vy5k5ravzceb3c2t23e74e

Efficient Wasserstein Natural Gradients for Reinforcement Learning [article]

Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton
2021 arXiv   pre-print
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL).  ...  The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization.  ...  Acknowledgments The authors would like to thank Jack Parker-Holder for sharing his code for BGPG and BGES, as well as colleagues at Gatsby for useful discussions.  ... 
arXiv:2010.05380v4 fatcat:7bhuob2ngjfdbfzmn4q44gyxra

Dynamic Integration of Value Information into a Common Probability Currency as a Theory for Flexible Decision Making

Vassilios Christopoulos, Paul R. Schrater, Jill X O'Reilly
2015 PLoS Computational Biology  
It comprises a series of control schemes, each attached to an individual goal, generating an optimal action plan to achieve that goal starting from the current state.  ...  We model the computations underlying dynamic decision-making with disparate value types, using the probability of getting the highest pay-off with the least effort as a common currency that supports goal  ...  It reflects how desirable it is to follow the policy π_j at that state with respect to the alternatives.  ... 
doi:10.1371/journal.pcbi.1004402 pmid:26394299 pmcid:PMC4578920 fatcat:7l3ljmvbwjagdnsinactvjdjc4

Combining Benefits from Trajectory Optimization and Deep Reinforcement Learning [article]

Guillaume Bellegarda, Katie Byl
2019 arXiv   pre-print
(2) providing an upper bound estimate on the time-to-arrival of the combined learned-optimized policy, allowing online policy deployment at any point in the training process by using the TO as a worst-case  ...  This method is evaluated for a car model, with applicability to any mobile robotic system. A video showing policy execution comparisons can be found at .  ...  Proximal Policy Optimization (PPO) [2] .  ... 
arXiv:1910.09667v1 fatcat:plu725ntm5ab5pvjl24tx3nuwq

Entropic Regularization of Markov Decision Processes

Boris Belousov, Jan Peters
2019 Entropy  
Such an entropic proximal policy optimization view gives a unified perspective on compatible actor-critic architectures.  ...  An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration.  ...  Policy Optimization with Entropic Penalties Following the intuition of REPS, we introduce an f-divergence-penalized optimization problem that the learning agent must solve at every policy iteration step  ... 
doi:10.3390/e21070674 pmid:33267388 pmcid:PMC7515171 fatcat:abbyss3gybcadmviumulw5az5u
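The entry above describes a divergence-penalized optimization problem solved at every policy iteration step. For the KL special case of an f-divergence penalty on a discrete action distribution, the penalized step has a well-known closed form, sketched below; other f choices discussed in such work generally lack this closed form, and the variable names are illustrative.

```python
import numpy as np

def kl_penalized_update(pi_old, q_values, eta):
    """Closed-form solution of  max_pi <pi, Q> - eta * KL(pi || pi_old)
    over a discrete action distribution:  pi* ∝ pi_old * exp(Q / eta).
    Larger eta keeps the new policy closer to the old one."""
    logits = np.log(pi_old) + q_values / eta
    logits -= logits.max()          # shift for numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

pi_old = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([1.0, 0.0, 0.5, -1.0])
pi_new = kl_penalized_update(pi_old, q, eta=0.5)
```

With a small `eta` the update concentrates mass on high-value actions; as `eta` grows the update shrinks toward `pi_old`, which is the proximal behavior the abstract alludes to.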

A Discount-Based Time-of-Use Electricity Pricing Strategy for Demand Response with Minimum Information Using Reinforcement Learning

Alejandro Fraija, Kodjo Agbossou, Nilson Henao, Sousso Kelouwani, Michael Fournier, Sayed Saeed Hosseini
2022 IEEE Access  
With ensured convergence, the resultant DRA is capable of learning adaptive Time-of-Use (ToU) tariffs and generating near-optimal price policies.  ...  In addition, it can avoid mistakenly penalizing users by offering price discounts as an incentive to realize a satisfying multi-agent environment.  ...  ACKNOWLEDGMENT The authors would like to thank the Laboratoire des technologies de l'énergie d'Hydro-Québec, the Natural Science and Engineering Research Council of Canada, and the Foundation of Université  ... 
doi:10.1109/access.2022.3175839 fatcat:tb76wtnlfnh67ko6nlsas3oapm

Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation [article]

Jiaxuan You, Bowen Liu, Rex Ying, Vijay Pande, Jure Leskovec
2019 arXiv   pre-print
The model is trained to optimize domain-specific rewards and adversarial loss through policy gradient, and acts in an environment that incorporates domain-specific rules.  ...  This is especially important in the task of molecular graph generation, whose goal is to discover novel molecules with desired properties such as drug-likeness and synthetic accessibility, while obeying  ...  Here we adopt Proximal Policy Optimization (PPO) [35] , one of the state-of-the-art policy gradient methods.  ... 
arXiv:1806.02473v3 fatcat:trljxebrzrdjxacioje35y6qem

Nash Optimal Party Positions: The nopp R Package

Luigi Curini, Stefano M. Iacus
2017 Journal of Statistical Software  
It accommodates alternative motivations in (each) party's strategy while allowing the user to estimate the uncertainty around their optimal positions through two different procedures (bootstrap and MC).  ...  nopp is an R package that computes party/candidate ideological positions corresponding to a Nash equilibrium along a one-dimensional space.  ...  With respect to the policy component in U_ik, x_i and s_k are respectively the ideal point of elector i and party k's location on the underlying policy dimension, and a describes the weight, or salience,  ... 
doi:10.18637/jss.v081.i11 fatcat:2mgiwizvbrbtpmtatw7ogf7utq