72 Hits in 6.7 sec

On the Application of Danskin's Theorem to Derivative-Free Minimax Optimization [article]

Abdullah Al-Dujaili, Shashank Srikant, Erik Hemberg, Una-May O'Reilly
2018 arXiv   pre-print
Motivated by Danskin's theorem, gradient-based methods have been applied with empirical success to solve minimax problems that involve non-convex outer minimization and non-concave inner maximization.  ...  Based on our experiments, our method's performance is comparable with its coevolutionary counterparts and favorable for high-dimensional problems.  ...  Over generations, the algorithm aims to maximize the expected fitness value E_{x∼N(µ, σ²I)}[f(x)] with respect to the distribution's mean µ via stochastic gradient ascent using a population size of λ  ... 
arXiv:1805.06322v1 fatcat:su243ucrwnfzxl6hvcyp66zof4
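
The snippet above describes an evolution-strategies-style search: maximizing E_{x∼N(µ, σ²I)}[f(x)] over the mean µ by stochastic gradient ascent with a population of λ samples. A minimal sketch of one such update using the log-likelihood-ratio gradient estimator; `f`, `sigma`, `lam`, and `lr` are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def nes_update(f, mu, sigma=0.1, lam=50, lr=0.01):
    """One stochastic-gradient-ascent step on E_{x~N(mu, sigma^2 I)}[f(x)]."""
    eps = np.random.randn(lam, mu.size)            # population of lam perturbations
    fitness = np.array([f(mu + sigma * e) for e in eps])
    # Log-likelihood-ratio estimator: grad_mu E[f] = E[f(x) * (x - mu) / sigma^2]
    grad = (fitness[:, None] * eps).mean(axis=0) / sigma
    return mu + lr * grad                          # ascend the expected fitness
```

Iterating `nes_update` moves µ toward regions of high expected fitness without ever differentiating f.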

Frequency-Domain Representation of First-Order Methods: A Simple and Robust Framework of Analysis [article]

Ioannis Anagnostides, Ioannis Panageas
2021 arXiv   pre-print
On the applications' side, we focus on the Optimistic Gradient Descent (OGD) method, which augments the standard Gradient Descent with an additional past-gradient in the optimization step.  ...  Moreover, our frequency-domain framework provides an exact quantitative comparison between simultaneous and alternating updates of OGD.  ...  Bailey, Gauthier Gidel, and Georgios Piliouras. Finite regret and cycles with fixed step-size via alternating gradient descent-ascent.  ... 
arXiv:2109.04603v2 fatcat:bpr7zr4ybjavze5fapqthlg5gu
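
For concreteness, one common form of the past-gradient augmentation the abstract refers to is x_{t+1} = x_t − η(2∇f(x_t) − ∇f(x_{t−1})). A minimal sketch, assuming a generic gradient oracle and an illustrative quadratic test function:

```python
import numpy as np

def optimistic_gd(grad, x0, eta=0.1, steps=100):
    # OGD: standard gradient descent plus a past-gradient correction,
    # x_{t+1} = x_t - eta * (2*g_t - g_{t-1}).
    x = np.asarray(x0, dtype=float)
    g_prev = grad(x)
    for _ in range(steps):
        g = grad(x)
        x = x - eta * (2 * g - g_prev)
        g_prev = g
    return x

x_min = optimistic_gd(lambda x: 2 * x, np.array([1.0]))  # f(x) = x^2
```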

Constants of Motion: The Antidote to Chaos in Optimization and Game Dynamics [article]

Georgios Piliouras, Xiao Wang
2021 arXiv   pre-print
, multiplicative weights update, alternating gradient descent and manifold gradient descent) both in optimization as well as in game settings.  ...  We show how proving the existence of invariant functions, i.e., constants of motion, is a fundamental contribution in this direction and establish a plethora of such positive results (e.g. gradient descent  ...  For Gradient Descent, MWU and Manifold Gradient Descent with small step size, there are many invariant functions on an open dense subset of the phase space. Theorem 2.1 (Taylor's Theorem).  ... 
arXiv:2109.03974v2 fatcat:4ilx3u3ezrg53lt5tzmygkycv4
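
A toy illustration of such an invariant (not taken from the paper): for alternating gradient descent-ascent on the bilinear game f(x, y) = xy with step size η, the quantity x² + y² − ηxy is conserved exactly along the orbit, which one can check numerically:

```python
def check_invariant(x=1.0, y=0.5, eta=0.1, steps=1000):
    H = lambda x, y: x**2 + y**2 - eta * x * y   # candidate constant of motion
    h0 = H(x, y)
    for _ in range(steps):
        x = x - eta * y                # descent step of the minimizing player
        y = y + eta * x                # ascent step using the *updated* x
        assert abs(H(x, y) - h0) < 1e-9   # conserved up to float round-off
    return x, y
```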

Solving Zero-Sum Games through Alternating Projections [article]

Ioannis Anagnostides, Paolo Penna
2021 arXiv   pre-print
First, we provide a precise analysis of Optimistic Gradient Descent/Ascent (OGDA) – an optimistic variant of Gradient Descent/Ascent – for unconstrained bilinear games, extending and strengthening prior  ...  In this work, we establish near-linear and strong convergence for a natural first-order iterative algorithm that simulates Von Neumann's Alternating Projections method in zero-sum games.  ...  Optimistic Gradient Descent/Ascent Let us focus on the unconstrained case, i.e. X = R n and Y = R m .  ... 
arXiv:2010.00109v2 fatcat:s4g7fdc6tveirb7gxwp7nmagq4
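
As a sketch of the OGDA dynamics the abstract analyzes, on the unconstrained bilinear game f(x, y) = xᵀAy (the matrix, seed, and step size below are illustrative):

```python
import numpy as np

def ogda_bilinear(A, eta=0.05, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x, y = rng.standard_normal(A.shape[0]), rng.standard_normal(A.shape[1])
    gx_prev, gy_prev = A @ y, A.T @ x
    for _ in range(steps):
        gx, gy = A @ y, A.T @ x              # gradients at the current iterate
        x = x - eta * (2 * gx - gx_prev)     # optimistic descent for x
        y = y + eta * (2 * gy - gy_prev)     # optimistic ascent for y
        gx_prev, gy_prev = gx, gy
    return x, y   # approaches an equilibrium for suitably small eta
```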

Generalized Boosting Algorithms for Convex Optimization [article]

Alexander Grubb, J. Andrew Bagnell
2012 arXiv   pre-print
(Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective and introduce a new measure of weak learner performance  ...  We present the weak-to-strong learning guarantees for the existing gradient boosting work for strongly-smooth, strongly-convex objectives under this new measure of performance, and also demonstrate that  ...  Acknowledgements We would like to thank Kevin Waugh, Daniel Munoz and the ICML reviewers for their helpful feedback.  ... 
arXiv:1105.2054v2 fatcat:562cpdkqxbbqncdnh42qxwxk2m
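
A minimal sketch of the functional-gradient view of boosting described above, for the squared loss; using sklearn's DecisionTreeRegressor as the weak learner is an assumption for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, rounds=100, lr=0.1):
    # Gradient boosting = gradient descent in function space: each round
    # fits a weak learner to the negative functional gradient of the loss,
    # which for the squared loss is simply the residual y - F.
    F, learners = np.zeros(len(y)), []
    for _ in range(rounds):
        h = DecisionTreeRegressor(max_depth=2).fit(X, y - F)
        F += lr * h.predict(X)
        learners.append(h)
    return learners
```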

Generative Adversarial Networks (GANs): What it can generate and What it cannot? [article]

P Manisha, Sujit Gujar
2019 arXiv   pre-print
We compare and contrast different results and put forth a summary of theoretical contributions about GANs with focus on image/visual applications.  ...  With a straightforward implementation and outstanding results, GANs have been used for numerous applications. Despite the success, GANs lack a proper theoretical explanation.  ...  They make a connection between no-regret algorithms and alternating SGD and prove its convergence in the convex-concave case.  ... 
arXiv:1804.00140v2 fatcat:4l64cjgtenhl7dipkjz3ssygdy

Online Kernel based Generative Adversarial Networks [article]

Yeojoon Youn, Neil Thistlethwaite, Sang Keun Choe, Jacob Abernethy
2020 arXiv   pre-print
We show empirically that OKGANs mitigate a number of training issues, including mode collapse and cycling, and are much more amenable to theoretical guarantees.  ...  OKGANs empirically perform dramatically better, with respect to reverse KL-divergence, than other GAN formulations on synthetic data; on classical vision datasets such as MNIST, SVHN, and CelebA, show  ...  The standard protocol for GAN training is to find an equilibrium of (1) by alternately updating θ d and θ g via stochastic gradient descent/ascent using samples drawn from both the true distribution (dataset  ... 
arXiv:2006.11432v1 fatcat:ut47yfv7bbht7b2qnvq4nmgnoa
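
A minimal PyTorch sketch of that alternating protocol; it assumes D outputs probabilities in (0, 1) and follows the original minimax GAN objective rather than OKGAN's kernel-based discriminator:

```python
import torch

def alternating_step(G, D, real, opt_g, opt_d, z_dim=16):
    # Ascent on theta_d: maximize log D(real) + log(1 - D(G(z))).
    z = torch.randn(real.size(0), z_dim)
    loss_d = -(torch.log(D(real)).mean() + torch.log(1 - D(G(z).detach())).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Descent on theta_g: minimize log(1 - D(G(z))).
    z = torch.randn(real.size(0), z_dim)
    loss_g = torch.log(1 - D(G(z))).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```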

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms [article]

Kaiqing Zhang, Zhuoran Yang, Tamer Başar
2021 arXiv   pre-print
., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively  ...  Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in  ...  The weights of the policy network are trained via stochastic gradient ascent to maximize the likelihood function.  ... 
arXiv:1911.10635v2 fatcat:ihlhtjlhnrdizbkcfzsnz5urfq
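
The last snippet sentence refers to training a policy network by stochastic gradient ascent on a likelihood, as in supervised policy initialization. A minimal sketch, assuming a discrete-action policy that outputs logits (all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def likelihood_ascent_step(policy, states, actions, opt):
    # Ascent on the log-likelihood of the observed actions, written as
    # descent on the negative log-likelihood (cross-entropy).
    loss = F.cross_entropy(policy(states), actions)
    opt.zero_grad(); loss.backward(); opt.step()
```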

Convergence of Strategies in Simple Co-Adapting Games

Richard Mealing, Jonathan L. Shapiro
2015 Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII - FOGA '15  
This work looks at simple simultaneous-move games with two or three actions and two or three players.  ...  The convergence of sequence prediction is improved by combining it with fictitious play.  ...  Bowling and Veloso proposed the Win or Learn Fast (WoLF) principle for varying the learning rate, or step-size in the case of gradient ascent, to improve this convergence [9].  ... 
doi:10.1145/2725494.2725503 dblp:conf/foga/MealingS15 fatcat:twugekm45rafvjowkfxzgcybgq
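
A minimal sketch of the WoLF step-size rule the snippet mentions: learn slowly while winning, quickly while losing. The two rates and the comparison baseline are illustrative assumptions:

```python
def wolf_learning_rate(v_current, v_average, eta_win=0.01, eta_lose=0.04):
    # Win or Learn Fast: take small steps while the current policy does at
    # least as well as the average/equilibrium policy, large steps otherwise.
    return eta_win if v_current >= v_average else eta_lose
```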

Improper Reinforcement Learning with Gradient-based Policy Optimization [article]

Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor
2021 arXiv   pre-print
gradient descent optimization.  ...  The value function of the mixture and its gradient may not be available in closed-form; however, we show that we can employ rollouts and simultaneous perturbation stochastic approximation (SPSA) for explicit  ...  The parameters are updated via a gradient descent step based on the derivative of the value function evaluated with the current parameters θ t .  ... 
arXiv:2102.08201v3 fatcat:ycfwh2cdpfcafmryqai4uh33pu
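
A minimal sketch of the SPSA gradient estimate the abstract mentions, requiring only two rollouts per step; the perturbation size `c` is illustrative:

```python
import numpy as np

def spsa_gradient(f, theta, c=0.05):
    # Perturb every coordinate at once with a random +/-1 (Rademacher) vector;
    # two evaluations of f then estimate the full gradient.
    delta = np.random.choice([-1.0, 1.0], size=theta.shape)
    return (f(theta + c * delta) - f(theta - c * delta)) / (2 * c * delta)
```

A parameter update is then the usual step theta ← theta − η · spsa_gradient(f, theta), or +η for ascent on a value function.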

Second-Order Mirror Descent: Convergence in Games Beyond Averaging and Discounting [article]

Bolin Gao, Lacra Pavel
2022 arXiv   pre-print
We show that MD2 enjoys no-regret as well as an exponential rate of convergence towards a strong VSS upon a slight modification.  ...  Lastly, using stochastic approximation techniques, we provide a convergence guarantee of discrete-time MD2 with noisy observations towards interior mere VSS.  ...  algorithms such as optimistic gradient-descent/ascent, Polyak's heavy-ball method, among others (see [26]).  ... 
arXiv:2111.09982v2 fatcat:qjqh2xkma5a5bo43ia4p7g7fp4
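
For reference, the first-order mirror descent primitive that MD2 builds on, instantiated with the negative-entropy mirror map on the simplex; this instance (multiplicative weights) is an illustrative choice, not the paper's setting:

```python
import numpy as np

def entropic_md_step(x, grad, eta=0.1):
    # Mirror descent with the negative-entropy mirror map reduces to a
    # multiplicative update followed by normalization back onto the simplex.
    w = x * np.exp(-eta * grad)
    return w / w.sum()
```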

Differentiable Game Mechanics [article]

Alistair Letcher and David Balduzzi and Sebastien Racaniere and James Martens and Jakob Foerster and Karl Tuyls and Thore Graepel
2019 arXiv   pre-print
Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs -- while at the same time being applicable to, and having guarantees in, much more general  ...  Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima.  ...  We thank Guillaume Desjardins and Csaba Szepesvari for useful comments.  ... 
arXiv:1905.04926v1 fatcat:lao2jsl7f5ewffknywz3qegd5q
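
A hand-derived instance of the SGA update for the zero-sum game f(x, y) = xy; in general the method forms the adjustment Aᵀξ from the antisymmetric part A of the game Hessian via Hessian-vector products, which for this game reduces to (x, y):

```python
import numpy as np

def sga_bilinear(x=1.0, y=1.0, eta=0.05, lam=1.0, steps=500):
    for _ in range(steps):
        xi = np.array([y, -x])    # simultaneous gradient (df/dx, -df/dy)
        adj = np.array([x, y])    # symplectic adjustment A^T xi for this game
        x, y = np.array([x, y]) - eta * (xi + lam * adj)
    return x, y                   # converges to the stable fixed point (0, 0)
```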

Independent Policy Gradient Methods for Competitive Reinforcement Learning [article]

Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
2021 arXiv   pre-print
We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state.  ...  To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated  ...  Such a guarantee goes beyond what is achieved by gradient-descent-ascent (GDA) with equal learning rates: Even for zero-sum matrix games, the iterates of GDA can reach limit cycles that remain a constant  ... 
arXiv:2101.04233v1 fatcat:l4w55uogljbspcg4fkl4fd7ls4
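
A toy illustration of that failure mode (not from the paper): simultaneous GDA with equal step sizes on the unconstrained bilinear objective f(x, y) = xy spirals away from the equilibrium; on constrained matrix games the analogous behavior is a cycle at constant distance:

```python
def simultaneous_gda(x=1.0, y=1.0, eta=0.05, steps=500):
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x   # both players use the old iterate
    return x, y  # squared distance to (0, 0) grows by (1 + eta^2) each step
```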

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [article]

Tianyi Lin, Zhengyuan Zhou, Wenjia Ba, Jiawei Zhang
2022 arXiv   pre-print
gradient.  ...  We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein.  ...  Algorithm 1 (Mirror Descent Self-Concordant Barrier Bandit Learning): 1: Input: step size η_t > 0, modulus β > 0 and barrier R: int(X) → ℝ. 2: Initialization: x_1 = argmin_{x∈X} R(x). 3: for t = 1,  ... 
arXiv:2112.02856v3 fatcat:y7xamqsqlrefvjsqolngd5g4lm
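
A rough sketch of lines 1-2 of that algorithm plus one mirror-descent step, instantiated with a log-barrier on the box (0, 1)^n; the barrier choice and the use of scipy's generic solver for the Bregman projection are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def R(x):                              # self-concordant log-barrier for (0, 1)^n
    return -np.sum(np.log(x)) - np.sum(np.log(1 - x))

def grad_R(x):
    return -1.0 / x + 1.0 / (1.0 - x)

def md_step(x_t, g, eta=0.1):
    # x_{t+1} = argmin_x  eta*<g, x> + D_R(x, x_t), with D_R the Bregman
    # divergence induced by the barrier R.
    breg = lambda x: R(x) - R(x_t) - grad_R(x_t) @ (x - x_t)
    obj = lambda x: eta * (g @ x) + breg(x)
    bounds = [(1e-6, 1 - 1e-6)] * x_t.size
    return minimize(obj, x_t, bounds=bounds).x

x1 = np.full(3, 0.5)                   # line 2: x_1 = argmin_x R(x) for this box
```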

Distributed Mirror-Prox Optimization for Multi-Access Edge Computing

Zhenfeng Sun, Mohammad Reza Nakhai
2020 IEEE Transactions on Communications  
The simulation results confirm the superiority of the proposed DMP algorithm over the stochastic dual gradient in terms of delay minimization, dynamic regret, aggregate violation and energy efficiency  ...  Both the cost and the constraint functions are time-varying with unknown statistics.  ...  step size.  ... 
doi:10.1109/tcomm.2020.2973566 fatcat:5zadqtzr25d6fjdsa32w3hmsqi
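
For reference, the Euclidean instance of the mirror-prox (extragradient) step underlying the DMP algorithm; the paper's distributed, constrained machinery is not shown, so this is a minimal single-agent sketch:

```python
def mirror_prox_step(x, grad, eta=0.1):
    # Extragradient: probe a leading point, then update from the original
    # iterate using the gradient evaluated at that leading point.
    x_lead = x - eta * grad(x)
    return x - eta * grad(x_lead)
```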
Showing results 1 — 15 out of 72 results