Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems [article]

Preetum Nakkiran
2020 arXiv   pre-print
We give a toy convex problem where learning rate annealing (large initial learning rate, followed by small learning rate) can lead gradient descent to minima with provably better generalization than using  ...  In this note, we show that this phenomenon can exist even for convex learning problems -- in particular, linear regression in 2 dimensions.  ...  Acknowledgements We thank John Schulman for a discussion around learning rates that led to wondering if this can occur in convex problems.  ... 
arXiv:2005.07360v1 fatcat:plkqqrko6rerjerfavquoqkfpu

Prospects and challenges of quantum finance [article]

Adam Bouland, Wim van Dam, Hamed Joorati, Iordanis Kerenidis, Anupam Prakash
2020 arXiv   pre-print
as quantum annealing heuristics for portfolio optimization.  ...  We consider quantum speedups for Monte Carlo methods, portfolio optimization, and machine learning.  ...  Acknowledgments We thank Kay Giesecke, Rajiv Krishnakumar, Ashley Montanaro, Nikitas Stamatopoulos, and Will Zeng for helpful discussions and comments on this manuscript.  ... 
arXiv:2011.06492v1 fatcat:mqzj2a2pzzaz5pdcllxgkr73oq

Adaptive Gradient Methods with Local Guarantees [article]

Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan
2022 arXiv   pre-print
In this paper we study the problem of learning a local preconditioner, that can change as the data is changing along the optimization trajectory.  ...  Without the need to manually tune a learning rate schedule, our method can, in a single run, achieve comparable and stable task accuracy as a fine-tuned optimizer.  ...  We demonstrate the effectiveness and robustness of SAMUEL in experiments, where we show that SAMUEL can automatically adapt to the optimal learning rate and achieve comparable task accuracy as a fine-tuned  ... 
arXiv:2203.01400v2 fatcat:5wg5u5ctuja37lqmfibh34ybh4

Message-passing for graph-structured linear programs

Pradeep Ravikumar, Alekh Agarwal, Martin J. Wainwright
2008 Proceedings of the 25th international conference on Machine learning - ICML '08  
A large body of past work has focused on the first-order tree-based LP relaxation for the MAP problem in Markov random fields.  ...  We establish various convergence guarantees for our algorithms, illustrate their performance, and also present rounding schemes with provable optimality guarantees.  ...  We thank the anonymous reviewers for helpful comments.  ... 
doi:10.1145/1390156.1390257 dblp:conf/icml/RavikumarAW08 fatcat:gzalmwvudzdvvdk4kleb5gm4zm

A Survey of Quantum Computing for Finance [article]

Dylan Herman, Cody Googin, Xiaoyuan Liu, Alexey Galda, Ilya Safro, Yue Sun, Marco Pistoia, Yuri Alexeev
2022 arXiv   pre-print
learning, showing how these solutions, adapted to work on a quantum computer, can help solve more efficiently and accurately problems such as derivative pricing, risk analysis, portfolio optimization,  ...  We hope this article will not only serve as a reference for academic researchers and industry practitioners but also inspire new ideas for future research.  ...  In general, however, the speedup for each task can vary greatly or may even be currently unknown (Section 4).  ... 
arXiv:2201.02773v3 fatcat:aqcl6blbyvbljg627ot6zxtaj4

Construction of non-convex polynomial loss functions for training a binary classifier with quantum annealing [article]

Ryan Babbush, Vasil Denchev, Nan Ding, Sergei Isakov, Hartmut Neven
2014 arXiv   pre-print
These loss functions may also be useful for classical approaches as they compile to regularized risk expressions which can be evaluated in constant time with respect to the number of training examples.  ...  To take advantage of a potential quantum advantage, one needs to be able to map the problem of interest to the native hardware with reasonably low overhead.  ...  However, there is evidence that quantum resources such as tunneling and entanglement are generic computational resources which may help to solve problem instances which would be otherwise intractable for  ... 
arXiv:1406.4203v1 fatcat:nmri7mwwp5asrnymdmdpurdd24

Efficient Full-Matrix Adaptive Regularization [article]

Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang
2020 arXiv   pre-print
We also provide a novel theoretical analysis for adaptive regularization in non-convex optimization settings.  ...  Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive.  ...  Acknowledgments We are grateful to Yoram Singer, Tomer Koren, Nadav Cohen, and Sanjeev Arora for helpful discussions.  ... 
arXiv:1806.02958v2 fatcat:vyzeqvt7bbedrn2tyfdjhsat5a

Generalization and Exploration via Randomized Value Functions [article]

Ian Osband, Benjamin Van Roy, Zheng Wen
2016 arXiv   pre-print
generalization.  ...  We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions.  ...  A recommendation engine We will now show that efficient exploration and generalization can be helpful in a simple model of customer interaction.  ... 
arXiv:1402.0635v3 fatcat:aoqndidaz5gnbkg65punxvf4ge

Scaling-up Distributed Processing of Data Streams for Machine Learning [article]

Matthew Nokleby, Haroon Raja, Waheed U. Bajwa
2020 arXiv   pre-print
Further, it reviews guarantees underlying these methods, which show there exist regimes in which systems can learn from distributed, streaming data at order-optimal rates.  ...  In particular, it focuses on methods that solve: (i) distributed stochastic convex problems, and (ii) distributed principal component analysis, which is a nonconvex problem with geometric structure that  ...  (Note that some of these methods have provable convergence issues, even for convex problems [79] .)  ... 
arXiv:2005.08854v2 fatcat:y6fvajvq2naajeqs6lo3trrgwy

Quantum Computing at the Frontiers of Biological Sciences [article]

Prashant S. Emani, Jonathan Warrell, Alan Anticevic, Stefan Bekiranov, Michael Gandal, Michael J. McConnell, Guillermo Sapiro, Alán Aspuru-Guzik, Justin Baker, Matteo Bastiani, Patrick McClure, John Murray, Stamatios N Sotiropoulos, Jacob Taylor (+3 others)
2019 arXiv   pre-print
However, challenges arise as we push the limits of scale and complexity in biological problems.  ...  view towards quantum computation and quantum information science, where algorithms have demonstrated potential polynomial and exponential computational speedups in certain applications, such as machine learning  ...  The former may be a candidate for an exact quantum solution for small-scale problems, while both may benefit from approximate quantum annealing approaches (an annealing-based approach to NMF is found in  ... 
arXiv:1911.07127v1 fatcat:k2agx5yysjgi3m3ryhicptzauq

Quantum machine learning: a classical perspective

Carlo Ciliberto, Mark Herbster, Alessandro Davide Ialongo, Massimiliano Pontil, Andrea Rocchetto, Simone Severini, Leonard Wossnig
2018 Proceedings of the Royal Society A  
learning problems.  ...  Learning in the presence of noise and certain computationally hard problems in machine learning are identified as promising directions for the field.  ...  Acknowledgements We thank Scott Aaronson, David Barber, Marcello Benedetti, Fernando Brandão, Dan Brown, Carlos González-Guillén, Joshua Lockhart, and Alessandro Rudi for helpful comments on the manuscript  ... 
doi:10.1098/rspa.2017.0551 pmid:29434508 pmcid:PMC5806018 fatcat:zlfvny7iyzb47di2cndbvvrglu

The sharp, the flat and the shallow: Can weakly interacting agents learn to escape bad minima? [article]

Nikolas Kantas, Panos Parpas, Grigorios A. Pavliotis
2019 arXiv   pre-print
An open problem in machine learning is whether flat minima generalize better and how to compute such minima efficiently. This is a very challenging problem.  ...  Our primary focus is on the design of algorithms for machine learning applications; however the underlying mathematical framework is suitable for the understanding of large scale systems of agent based  ...  If instead β is gradually increased using a so called annealing schedule, then adding noise to the normal gradient flow can help the dynamics in (1) provably converge to a global minimum of Φ (Geman and  ... 
arXiv:1905.04121v1 fatcat:yq7i6o3ok5dvfbykaed5ycwr4q

Training verified learners with learned verifiers [article]

Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli
2018 arXiv   pre-print
., networks that provably satisfy some desired input-output properties.  ...  also be scaled to produce the first known (to the best of our knowledge) verifiably robust networks for CIFAR-10.  ...  Instead, PVT exploits the idea that the solution of this optimization problem can be learned, i.e., the mapping from a nominal training example to the optimal dual variables can be learned by the verifier  ... 
arXiv:1805.10265v2 fatcat:ratq2s4kdjh3jhesxt4w7444qe

A NASA Perspective on Quantum Computing: Opportunities and Challenges [article]

Rupak Biswas, Zhang Jiang, Kostya Kechezhi, Sergey Knysh, Salvatore Mandrà, Bryan O'Gorman, Alejandro Perdomo-Ortiz, Andre Petukhov, John Realpe-Gómez, Eleanor Rieffel, Davide Venturelli, Fedir Vasko (+1 others)
2017 arXiv   pre-print
For most problems, however, it is currently unknown whether quantum algorithms can provide an advantage, and if so by how much, or how to design quantum algorithms that realize such advantages.  ...  In the last couple of decades, the world has seen several stunning instances of quantum algorithms that provably outperform the best classical algorithms.  ...  that can serve as a building block for deep learning architectures.  ... 
arXiv:1704.04836v1 fatcat:7itanvx3mzgrnfouu5wsx33v7i

Gradient Descent, Stochastic Optimization, and Other Tales [article]

Jun Lu
2022 arXiv   pre-print
Its stochastic version has received attention in recent years, and this is particularly true for optimizing deep neural networks.  ...  Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize machine learning tasks.  ...  The toy example shows learning rate annealing schemes in general can help optimization methods "find" better local minima with better performance.  ... 
arXiv:2205.00832v1 fatcat:unridtvvi5b2jf6xbu2chlw7ce