1,286 Hits in 3.4 sec

Equilibrated adaptive learning rates for non-convex optimization [article]

Yann N. Dauphin, Harm de Vries, Yoshua Bengio
2015 arXiv   pre-print
We introduce a novel adaptive learning rate scheme, called ESGD, based on the equilibration preconditioner.  ...  preconditioner is comparatively better suited to non-convex problems.  ...  Equilibrated learning rates are well suited to non-convex problems In this section, we demonstrate that equilibrated learning rates are well suited to non-convex optimization, particularly compared to  ... 
arXiv:1502.04390v2 fatcat:fyicqskqxrbxzivryvpcljtypm
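The equilibration idea in this result can be sketched on a toy ill-conditioned quadratic: estimate D_i ≈ sqrt(E[(Hv)_i^2]) from Hessian-vector products with Gaussian probes v, then rescale gradient steps by 1/D. The Hessian, probe count, and step size below are illustrative, not the paper's experimental setup.

```python
import numpy as np

# Equilibration preconditioner sketch on f(x) = 0.5 x^T H x.
rng = np.random.default_rng(0)
H = np.diag([100.0, 1.0])             # badly conditioned Hessian
x = np.array([1.0, 1.0])

n_probe, eps, eta = 100, 1e-8, 0.5
D = np.zeros(2)
for _ in range(n_probe):
    v = rng.standard_normal(2)
    D += (H @ v) ** 2                 # Hessian-vector product probe
precond = np.sqrt(D / n_probe) + eps  # approximates the row norms [100, 1]

for _ in range(50):
    x -= eta * (H @ x) / precond      # equilibrated gradient step

assert np.linalg.norm(x) < 1e-6       # both directions converge at the same rate
```

With plain SGD the same step size would diverge along the stiff direction; equilibration makes the effective curvature roughly isotropic.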

Optimizers in Deep Learning: An Imperative Study and Analysis [chapter]

Ajeet K. Jain, PVRD Prasad Rao, K. Venkatesh Sharma
2021 Annual Proceedings of the Science & Technology Metrics  
The optimization approaches in deep learning have broad applicability, with a resurgence of novelty ranging from Stochastic Gradient Descent to convex, non-convex and derivative-free methods.  ...  Machine learning has contributed enormously to optimization techniques, motivating new approaches for optimization algorithms.  ...  At times it also requires hyper-parameter tuning and adaptively chosen learning rates.  ... 
doi:10.6025/stm/2021/3/99-106 fatcat:s7yuzddwr5hl5crhfbaelapgb4

Adaptive transmit policies for cost-efficient power allocation in multi-carrier systems

Salvatore D'Oro, Panayotis Mertikopoulos, Aris L. Moustakas, Sergio Palazzo
2014 2014 12th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt)  
In particular, we consider a set of wireless users who seek to maximize their transmission rate subject to pricing limitations and we show that the resulting non-cooperative game admits a unique equilibrium  ...  for almost every realization of the system's channels.  ...  Evolution of the equilibration rate for different pricing models and network configurations as a function of the step-size γ_n. The γ_n are used in the learning process.  ... 
doi:10.1109/wiopt.2014.6850271 dblp:conf/wiopt/DOroMMP14 fatcat:dtaisrhssvbgdadu6cjlfztdaq

Online Second Order Methods for Non-Convex Stochastic Optimizations [article]

Xi-Lin Li
2018 arXiv   pre-print
This paper proposes a family of online second order methods for possibly non-convex stochastic optimizations based on the theory of preconditioned stochastic gradient descent (PSGD), which can be regarded  ...  as an enhanced stochastic Newton method with the ability to handle gradient noise and non-convexity simultaneously.  ...  ., SGD with either classic or Nesterov momentum, RMSProp, Adam, adaptive learning rates, etc., are popular in diverse stochastic optimization problems, e.g., machine learning and adaptive signal processing  ... 
arXiv:1803.09383v3 fatcat:j3bcbgerwrco5agmjprgtv26ma
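A preconditioned SGD step of the kind this result builds on can be illustrated on a noisy quadratic. Note this is a minimal sketch with an ideal fixed preconditioner; Li's PSGD actually learns the preconditioner online from gradient perturbations, which is not shown here.

```python
import numpy as np

# Preconditioned SGD on a quadratic with noisy gradients: grad = H x + noise.
rng = np.random.default_rng(1)
H = np.diag([50.0, 2.0])
P = np.linalg.inv(H)          # ideal preconditioner (Newton direction)
x = np.array([3.0, -3.0])
eta = 0.5

for _ in range(300):
    g = H @ x + 0.1 * rng.standard_normal(2)  # stochastic gradient
    x -= eta * (P @ g)                        # preconditioned step

# The iterates settle into a small noise ball around the optimum at 0.
assert np.linalg.norm(x) < 0.5
```

The point of preconditioning here is that both the stiff and the flat direction contract at the same geometric rate, so one step size works for both.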

How to decay your learning rate [article]

Aitor Lewkowycz
2021 arXiv   pre-print
Complex learning rate schedules have become an integral part of deep learning. We find empirically that common fine-tuned schedules decay the learning rate after the weight norm bounces.  ...  This leads to the proposal of ABEL: an automatic scheduler which decays the learning rate by keeping track of the weight norm.  ...  Acknowledgments The authors would like to thank Anders Andreassen, Yasaman Bahri, Ethan Dyer, Orhan Firat, Pierre Foret, Guy Gur-Ari, Jaehoon Lee, Behnam Neyshabur and Vinay Ramasesh for useful discussions  ... 
arXiv:2103.12682v1 fatcat:z4t67gk7yffznikrkyjzabpsee
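A norm-tracking scheduler of the kind ABEL proposes could look like the following. The class name, decay factor, and the exact bounce test are our assumptions for illustration, not the paper's implementation.

```python
# Hypothetical ABEL-style scheduler: decay the learning rate after the
# weight norm "bounces", i.e. starts growing again after shrinking.
class AbelLikeScheduler:
    def __init__(self, lr, decay=0.5):
        self.lr, self.decay = lr, decay
        self.prev_norm = None
        self.prev_delta = 0.0

    def step(self, weight_norm):
        if self.prev_norm is not None:
            delta = weight_norm - self.prev_norm
            if self.prev_delta < 0 < delta:   # bounce detected
                self.lr *= self.decay
            self.prev_delta = delta
        self.prev_norm = weight_norm
        return self.lr

sched = AbelLikeScheduler(lr=0.1)
lrs = [sched.step(n) for n in [5.0, 4.0, 3.0, 3.5, 4.0]]
# The rate halves right after the norm turns around at 3.0 -> 3.5.
assert abs(lrs[-1] - 0.05) < 1e-12
```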

Robust and efficient algorithms for high-dimensional black-box quantum optimization [article]

Zhaoqi Leng and Pranav Mundada and Saeed Ghadimi and Andrew Houck
2019 arXiv   pre-print
We prove the asymptotic convergence of our algorithms in a convex setting, and we benchmark them against other gradient-based optimization algorithms on non-convex optimal control tasks.  ...  Hybrid quantum-classical optimization using near-term quantum technology is an emerging direction for exploring quantum advantage in high-dimensional systems.  ...  The authors would like to acknowledge Christie Chiu, András Gyenis, Anjali Premkumar, Basil Smitham and Sara Sussman for valuable comments on the manuscript.  ... 
arXiv:1910.03591v2 fatcat:uay7kcrukjcy3eyw33t5zmasci

Distributed Learning Policies for Power Allocation in Multiple Access Channels

P. Mertikopoulos, E. V. Belmega, A. L. Moustakas, S. Lasaulce
2012 IEEE Journal on Selected Areas in Communications  
We analyze the power allocation problem for orthogonal multiple access channels by means of a non-cooperative potential game in which each user distributes his power over the channels available to him.  ...  Our theoretical analysis hinges on a novel result which is of independent interest: in finite-player games which admit a (possibly nonlinear) convex potential, the replicator dynamics converge to an ε-neighborhood  ...  The complementary inequality p_A(n + 1) ≤ 1 then follows similarly, so, with p(n) ∈ ∆ for all n, our theorem follows from Theorem 2 and Corollary 4 in Chap. 2 of [21].  ... 
doi:10.1109/jsac.2012.120109 fatcat:t35kz5nihjcrdemwhiuhpjerta
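The discrete replicator update underlying such learning policies preserves the simplex by construction, which is exactly the p(n) ∈ ∆ property the snippet's proof fragment is about. The per-channel payoffs below are a toy stand-in for the paper's transmission rates.

```python
import numpy as np

# One step of discrete-time replicator dynamics on the simplex:
# shares grow in proportion to their payoff advantage over the mean.
def replicator_step(p, u, gamma):
    avg = p @ u                       # mean payoff under the current mix
    return p * (1 + gamma * (u - avg))

p = np.array([1 / 3, 1 / 3, 1 / 3])   # uniform initial power allocation
u = np.array([3.0, 2.0, 1.0])         # toy fixed per-channel payoffs
for _ in range(500):
    p = replicator_step(p, u, gamma=0.1)

assert abs(p.sum() - 1) < 1e-9        # iterates stay on the simplex
assert p[0] > 0.99                    # mass concentrates on the best channel
```

Summing the update shows why the simplex is invariant: the correction terms γ(u − ū) average to zero under p, so the total mass stays exactly 1 at every step.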

A Free-Energy Principle for Representation Learning [article]

Yansong Gao, Pratik Chaudhari
2020 arXiv   pre-print
This paper employs a formal connection of machine learning with thermodynamics to characterize the quality of learnt representations for transfer learning.  ...  We discuss how information-theoretic functionals such as the rate, distortion and classification loss of a model lie on a convex, so-called equilibrium surface. We prescribe dynamical processes to traverse this  ...  We change λ, γ with respect to time t and then apply the equilibration learning rate schedule of Fig. 4a to achieve the transition between equilibrium states.  ... 
arXiv:2002.12406v1 fatcat:r722paickffudophnmr2t2a5um

Optimal learning rate schedules in high-dimensional non-convex optimization problems [article]

Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli
2022 arXiv   pre-print
In this case, it is optimal to keep a large learning rate during the exploration phase to escape the non-convex region as quickly as possible, then use the convex criterion β=1 to converge rapidly to the  ...  However, in many realistic problems the loss-landscape is high-dimensional and non-convex – a case for which results are scarce.  ... 
arXiv:2202.04509v1 fatcat:cgksf63x6ba5jbrmw2bloymkgy
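The two-phase prescription in the snippet (large constant rate while exploring, then a 1/t decay, i.e. β=1, once in the convex basin) can be written down directly. The switch time and base rate below are illustrative placeholders.

```python
# Two-phase schedule: constant rate during exploration, then eta_0 / t^beta
# decay (beta = 1 is the convex criterion quoted in the abstract).
def lr_schedule(t, eta0=0.5, t_switch=100, beta=1.0):
    if t < t_switch:
        return eta0                            # exploration: stay large
    return eta0 / (t - t_switch + 1) ** beta   # convergence: ~1/t decay

assert abs(lr_schedule(0) - 0.5) < 1e-12       # early: full rate
assert abs(lr_schedule(199) - 0.005) < 1e-12   # late: decayed by 1/100
```

In practice t_switch would be tied to a detected escape from the non-convex region rather than fixed in advance.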

Performative Prediction [article]

Juan C. Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, Moritz Hardt
2021 arXiv   pre-print
Performativity is a well-studied phenomenon in policy-making that has so far been neglected in supervised learning.  ...  We thus also give the first sufficient conditions for retraining to overcome strategic feedback effects.  ...  Learning on non-stationary distributions.  ... 
arXiv:2002.06673v4 fatcat:t2hwimraqbb35pomikhx46rsga

Preconditioned Spectral Descent for Deep Learning

David E. Carlson, Edo Collins, Ya-Ping Hsieh, Lawrence Carin, Volkan Cevher
2015 Neural Information Processing Systems  
These challenges include, but are not limited to, the non-convexity of learning objectives and estimating the quantities needed for optimization algorithms, such as gradients.  ...  While we do not address the non-convexity, we present an optimization solution that exploits the so far unused "geometry" in the objective function in order to best make use of the estimated gradients.  ...  We thank the reviewers for their helpful comments.  ... 
dblp:conf/nips/CarlsonCHCC15 fatcat:siejlhcj3jfulgze6yzmtqyho4

Learning in the machine: Recirculation is random backpropagation

P Baldi, P Sadowski
2018 Neural Networks  
Optimal learning in deep neural architectures requires that non-local information be available to the deep synapses.  ...  Thus, in general, optimal learning in physical neural systems requires the presence of a deep learning channel to communicate non-local information to deep synapses, in a direction opposite to the forward  ...  The same learning rate of 0.01 is used in each algorithm, except for REC´, which uses a smaller learning rate in the lower layers: 0.01/2^n for the n-th hidden layer away from the output (n ∈ [0,10]).  ... 
doi:10.1016/j.neunet.2018.09.006 pmid:30317133 pmcid:PMC6246787 fatcat:jqqvwpsphnaavhoapit37vikvu
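The per-layer rate for REC´ (0.01/2^n for the n-th hidden layer away from the output, as we read the snippet's formula) in code; the function name is ours.

```python
# Layer-wise learning rates: base rate 0.01, halved for each hidden layer
# further from the output (our reading of the snippet's 0.01/2^n).
def layer_lr(n, base=0.01):
    return base / 2 ** n

assert abs(layer_lr(0) - 0.01) < 1e-12
assert abs(layer_lr(3) - 0.00125) < 1e-12
```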

Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic [article]

Matteo Sordello, Hangfeng He, Weijie Su
2020 arXiv   pre-print
This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization.  ...  This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is detected, that is, the iterates are likely to bounce at around  ...  Equilibrated adaptive learning rates for non-convex optimization. In Advances in neural information processing systems, pages 1504-1512. De, S., Yadav, A., Jacobs, D., and Goldstein, T.  ... 
arXiv:1910.08597v4 fatcat:6qneixj6sbf4xljpgs7c5pwyie
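A stationarity test in the spirit of SplitSGD can be sketched by splitting a window of recent gradients in two and comparing the average directions: opposing averages suggest the iterates are bouncing around a stationary point. The paper's actual diagnostic compares parallel SGD threads; the window split and zero threshold here are our simplification.

```python
import numpy as np

# Decay trigger: do the two halves of the gradient window disagree in direction?
def should_decay(grads):
    half = len(grads) // 2
    g1 = np.mean(grads[:half], axis=0)
    g2 = np.mean(grads[half:], axis=0)
    return float(g1 @ g2) <= 0.0

rng = np.random.default_rng(2)
# Far from the optimum: gradients share a direction, so keep the rate.
descending = [np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(2)
              for _ in range(20)]
# Near the optimum: gradients flip sign every step, so decay the rate.
bouncing = [(-1) ** i * np.array([1.0, 1.0]) for i in range(20)]

assert not should_decay(descending)
assert should_decay(bouncing)
```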

Online Monotone Optimization [article]

Ian Gemp, Sridhar Mahadevan
2016 arXiv   pre-print
The proposed framework generalizes the popular online convex optimization framework and extends it to its natural limit allowing it to capture a notion of regret that is intuitive for more general problems  ...  This paper presents a new framework for analyzing and designing no-regret algorithms for dynamic (possibly adversarial) systems.  ...  Online convex optimization (OCO) is a framework for studying the online learning problem when losses are convex with respect to the prediction domain which is also a convex set.  ... 
arXiv:1608.07888v1 fatcat:td4feopzgbgzvapvwwj5jelpmu
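The online convex optimization framework this result generalizes is exemplified by online gradient descent, the textbook no-regret algorithm. The per-round losses below are toy quadratics (x − z_t)² with alternating targets; the step-size constant is illustrative.

```python
import numpy as np

# Online gradient descent with eta_t = eta0 / sqrt(t) on adversarially
# revealed quadratic losses; regret is measured against the best fixed point.
T = 200
targets = [t % 2 for t in range(T)]          # z_t alternates 0, 1, 0, 1, ...
x, eta0 = 0.0, 0.5
cum_loss = 0.0
for t, z in enumerate(targets, start=1):
    cum_loss += (x - z) ** 2                 # suffer the loss, then update
    grad = 2 * (x - z)
    x -= (eta0 / np.sqrt(t)) * grad

best_fixed = np.mean(targets)                # best comparator in hindsight: 0.5
regret = cum_loss - sum((best_fixed - z) ** 2 for z in targets)
assert regret < 0.2 * T                      # empirically sublinear regret
assert abs(x - best_fixed) < 0.1             # iterate hovers near the comparator
```

The monotone framework in the paper replaces "gradient of a convex loss" with a general monotone operator, but the update and regret bookkeeping follow this same pattern.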

A biologically plausible neural network for multi-channel Canonical Correlation Analysis [article]

David Lipshutz, Yanis Bahroun, Siavash Golkar, Anirvan M. Sengupta, Dmitri B. Chklovskii
2021 arXiv   pre-print
local non-Hebbian learning rules.  ...  We also derive an extension of our online CCA algorithm with adaptive output rank and output whitening.  ...  Acknowledgements We thank Nati Srebro for drawing our attention to CCA, and we thank Tiberiu Tesileanu and Charles Windolf for their helpful feedback on an earlier draft of this manuscript.  ... 
arXiv:2010.00525v4 fatcat:ttpnnb7kfnablmldvfh3wxs4t4
Showing results 1 — 15 out of 1,286 results