
On the Importance of Consistency in Training Deep Neural Networks [article]

Chengxi Ye, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos
2017 arXiv pre-print
Thus, an important design principle for future optimization and neural network design is derived. We conclude this paper with the construction of a novel contractive neural network.  ...  The second issue is the scale inconsistency between the layer inputs and the layer residuals. We explain how second-order information makes it easier to remove this roadblock.  ...  Conclusion. We made the observation that the long-standing challenge of training deep artificial neural networks is caused by a syndrome of three inconsistency problems.  ... 
arXiv:1708.00631v1 fatcat:bzs7lwfvjjdercfretoq4wgudm

Damped Newton Stochastic Gradient Descent Method for Neural Networks Training

Jingcheng Zhou, Wei Wei, Ruizhi Zhang, Zhiming Zheng
2021 Mathematics  
First-order methods such as stochastic gradient descent (SGD) have recently become popular optimization methods to train deep neural networks (DNNs) for good generalization; however, they need a long training  ...  Second-order methods which can lower the training time are scarcely used on account of their overpriced computing cost to obtain the second-order information.  ...  Acknowledgments: We thank Zhenyu Shi for their detailed guidance on the paper layout. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/math9131533 fatcat:kqu72qme6jhm3dcehhfa2a4czu
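The snippet contrasts cheap first-order steps with second-order steps whose cost comes from obtaining curvature information. As a generic illustration only (the snippet does not describe the paper's DN-SGD update, so this is not that algorithm), a damped Newton step solves (H + λI)d = ∇f and then moves by w ← w - αd. The parameters `lam` (λ, Levenberg-Marquardt-style damping) and `alpha` (step-size damping) are hypothetical, and the toy objective f(x, y) = e^x + e^(-x) + (y - 1)^2 + xy is an assumption chosen only because its 2×2 Hessian is positive definite and easy to invert by hand.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def grad(w: Vec) -> Vec:
    # Gradient of the hypothetical toy objective f(x, y) = e^x + e^-x + (y - 1)^2 + x*y.
    x, y = w
    return (math.exp(x) - math.exp(-x) + y, 2.0 * (y - 1.0) + x)


def hessian(w: Vec) -> Tuple[Vec, Vec]:
    # Exact 2x2 Hessian of the same toy objective.
    x, _ = w
    return ((math.exp(x) + math.exp(-x), 1.0), (1.0, 2.0))


def damped_newton(w: Vec, alpha: float = 1.0, lam: float = 1e-3, iters: int = 30) -> Vec:
    """Iterate w <- w - alpha * (H + lam*I)^{-1} grad(w)."""
    for _ in range(iters):
        (h11, h12), (h21, h22) = hessian(w)
        # Levenberg-Marquardt-style damping: add lam to the diagonal.
        h11 += lam
        h22 += lam
        g1, g2 = grad(w)
        det = h11 * h22 - h12 * h21
        # Solve (H + lam*I) d = g with the closed-form 2x2 inverse.
        d1 = (h22 * g1 - h12 * g2) / det
        d2 = (h11 * g2 - h21 * g1) / det
        # Step-size damping: alpha < 1 shortens the Newton step.
        w = (w[0] - alpha * d1, w[1] - alpha * d2)
    return w


w_star = damped_newton((0.0, 0.0))
print(w_star)  # roughly (-0.614, 1.307)
```

In a real network the Hessian has one row per parameter, so forming and solving with it is exactly the cost the snippet describes as overpriced; practical second-order methods therefore approximate it rather than build it.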

addHessian: Combining quasi-Newton method with first-order method for neural network training

Sota Yasuda, S. Indrapriyadarsini, Hiroshi Ninomiya, Takeshi Kamio, Hideki Asai
2022 Nonlinear Theory and Its Applications IEICE  
In this paper, we propose a new learning algorithm for training neural networks by combining first-order and second-order methods.  ...  First-order methods such as SGD and Adam are popularly used in training neural networks.  ... 
doi:10.1587/nolta.13.361 fatcat:wzdhkiepz5adjmuwxexdqpo4oy

First-Order Optimization (Training) Algorithms in Deep Learning

Oleg Rudenko, Oleksandr Bezsonov, Kyrylo Oliinyk
2020 International Conference on Computational Linguistics and Intelligent Systems  
Studies show that for this task a simple gradient descent algorithm is quite effective.  ...  The most widely used optimization method in deep learning is the first-order algorithm based on gradient descent (GD).  ...  This publication reflects only the views of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.  ... 
dblp:conf/colins/RudenkoBO20 fatcat:urkwrrkkq5fqvjcrxkmto4move

A State-of-the-art Survey of Advanced Optimization Methods in Machine Learning

Muhamet Kastrati, Marenglen Biba
2021 International Conference on Recent Trends and Applications in Computer Science and Information Technology  
Then optimization is presented along with a review of the most recent state-of-the-art methods and algorithms that are being extensively used in machine learning in general and deep neural networks in  ...  The paper concludes with some general recommendations for future work in the area.  ...  We hope that the issues discussed in this paper will push forward the discussion in the area of optimization and machine learning; at the same time, it may serve as complementary material for other researchers  ... 
dblp:conf/rtacsit/KastratiB21 fatcat:gcvz6va2wrdgvcorfb52qdac4q

Quantitative Investment Based on Artificial Neural Network Algorithm

Xia Zhang
2015 International Journal of u- and e- Service, Science and Technology  
Artificial neural networks became a new high-tech research area in the 1980s, involving a variety of disciplines and attracting many neurophysiologists, psychologists, mathematicians, and computer and information scientists  ...  the combination of robot control.  ...  For large-scale data, each iteration of the gradient descent algorithm is very slow; the stochastic gradient descent method can speed up the optimization.  ... 
doi:10.14257/ijunesst.2015.8.7.04 fatcat:lbi4k77lgngjjn3znocsflufiy

Adaptive Natural Gradient Method for Learning of Stochastic Neural Networks in Mini-Batch Mode

Hyeyoung Park, Kwanyong Lee
2019 Applied Sciences  
The gradient descent method is an essential algorithm for learning in neural networks.  ...  Among the diverse variations of gradient descent that have been developed to accelerate learning, natural gradient learning is based on the theory of information geometry on stochastic  ...  Gradient Descent Learning of Stochastic Neural Networks. Stochastic Neural Networks. Since the natural gradient is derived from stochastic neural network models, let us start with a brief description  ... 
doi:10.3390/app9214568 fatcat:m5aeepltwvdgdeklizddxycmm4

Second-order Information in First-order Optimization Methods [article]

Yuzheng Hu and Licong Lin and Shange Tang
2019 arXiv pre-print
For adaptive methods, we relate Adam and Adagrad to a powerful technique in computational statistics: natural gradient descent.  ...  For Nesterov Accelerated Gradient, we rigorously prove that the algorithm makes use of the difference between past and current gradients, thus approximating the Hessian and accelerating training.  ...  They want to thank Huiyuan Wang for his helpful discussion and Putian Li for his suggestions on typesetting.  ... 
arXiv:1912.09926v1 fatcat:a2ejpz3klncjlc6vnju4n4eeoa
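The snippet says Nesterov's method implicitly uses the difference between past and current gradients. One standard way to see this (an algebraic identity, not necessarily the proof in the paper) is to rewrite NAG's extrapolated points y_k as heavy-ball momentum plus a gradient-correction term: y_{k+1} = y_k + β(y_k - y_{k-1}) - s∇f(y_k) - βs(∇f(y_k) - ∇f(y_{k-1})). The last term is roughly βs·H(y_k - y_{k-1}), a Hessian-vector product obtained from gradients alone. The sketch below assumes a fixed step size `s`, momentum `beta`, and a hypothetical 1-D toy objective, and checks that both forms produce the same iterates.

```python
from typing import Callable, List

Grad = Callable[[float], float]


def nag_y_sequence(grad: Grad, x0: float, s: float, beta: float, steps: int) -> List[float]:
    """Standard NAG: y_k = x_k + beta*(x_k - x_{k-1}), x_{k+1} = y_k - s*grad(y_k)."""
    x_prev, x = x0, x0
    ys = []
    for _ in range(steps):
        y = x + beta * (x - x_prev)
        ys.append(y)
        x_prev, x = x, y - s * grad(y)
    return ys


def heavy_ball_with_gradient_correction(grad: Grad, x0: float, s: float, beta: float, steps: int) -> List[float]:
    """The same y_k written as heavy-ball momentum plus a gradient-difference term."""
    # y_0 and y_1 come from the first NAG step (with x_{-1} = x_0 = x0).
    ys = [x0, x0 - (1.0 + beta) * s * grad(x0)]
    while len(ys) < steps:
        y_prev, y = ys[-2], ys[-1]
        # grad(y) - grad(y_prev) is roughly H * (y - y_prev): curvature from gradient differences.
        correction = beta * s * (grad(y) - grad(y_prev))
        ys.append(y + beta * (y - y_prev) - s * grad(y) - correction)
    return ys[:steps]


def toy_grad(x: float) -> float:
    # Gradient of the hypothetical toy objective f(x) = x^4 / 4 + x^2 / 2.
    return x ** 3 + x


standard = nag_y_sequence(toy_grad, 1.0, s=0.1, beta=0.9, steps=30)
rewritten = heavy_ball_with_gradient_correction(toy_grad, 1.0, s=0.1, beta=0.9, steps=30)
print(max(abs(p - q) for p, q in zip(standard, rewritten)))  # round-off level
```

Dropping the `correction` line turns the second function into plain heavy-ball momentum, which makes the gradient-difference term the only thing separating the two methods in this form.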

Efficient and Sparse Neural Networks by Pruning Weights in a Multiobjective Learning Approach [article]

Malena Reiners and Kathrin Klamroth and Michael Stiglmayr
2020 arXiv pre-print
On the other hand we implement stochastic multi-gradient descent algorithms that generate a single Pareto optimal solution without requiring or using preference information.  ...  We suggest a multiobjective perspective on the training of neural networks by treating its prediction accuracy and the network complexity as two individual objective functions in a biobjective optimization  ...  This work has been partially supported by EFRE (European fund for regional development) project EFRE-0400216.  ... 
arXiv:2008.13590v1 fatcat:6yaagh7adbhrdlsw53t67uajxu

Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent

Nicol N. Schraudolph
2002 Neural Computation  
We propose a generic method for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special  ...  Two recent acceleration techniques for on-line learning, matrix momentum and stochastic meta-descent (SMD), implement this approach.  ...  Acknowledgments. I thank Jenny Orr, Barak Pearlmutter, and the anonymous reviewers for their helpful suggestions, and the Swiss National Science Foundation for the financial support provided under grant number  ... 
doi:10.1162/08997660260028683 pmid:12079553 fatcat:fivkl72emjf6rg6q6ex6vfpohq
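The abstract's key point is that a second-order step only needs curvature matrix-vector products, not the matrix itself, and that each product can be had in linear time. The special technique the abstract refers to is not reproduced here; as a swapped-in stand-in, the sketch below approximates a Hessian-vector product with a central finite difference of two gradient evaluations, Hv ≈ (∇f(w + εv) - ∇f(w - εv)) / (2ε). The function name `hessian_vector_product`, the step `eps`, and the quadratic test problem are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def hessian_vector_product(grad: Callable[[Vector], Vector], w: Vector, v: Vector,
                           eps: float = 1e-5) -> Vector:
    """Approximate H(w) @ v with two gradient calls (central difference)."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad(w_plus)
    g_minus = grad(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


# Toy check: f(w) = 0.5 * w^T A w has gradient A w and Hessian A.
A = [[3.0, 1.0], [1.0, 2.0]]


def quad_grad(w: Vector) -> Vector:
    return [sum(A[i][j] * w[j] for j in range(len(w))) for i in range(len(A))]


hv = hessian_vector_product(quad_grad, [0.5, -1.0], [1.0, 2.0])
print(hv)  # close to A @ [1, 2] = [5, 5]
```

Each product costs two gradient evaluations regardless of dimension, which is the linear-time property the abstract emphasizes; the price of the finite-difference shortcut is round-off error controlled by `eps`, which exact differentiation-based products avoid.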

Stochastic Variance Reduction for Deep Q-learning [article]

Wei-Ye Zhao, Xi-Ya Guan, Yang Liu, Xiaoming Zhao, Jian Peng
2019 arXiv pre-print
In our paper, we propose an innovative optimization strategy that uses stochastic variance reduced gradient (SVRG) techniques.  ...  However, current algorithms still suffer from poor gradient estimates with excessive variance, resulting in unstable training and poor sample efficiency.  ...  However, second-order methods are infeasible in practice for high-dimensional training, such as training neural networks.  ... 
arXiv:1905.08152v1 fatcat:dvmbg6ly5vaenl3badbojsm7xq
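SVRG attacks the excessive-variance problem the snippet describes by replacing the plain stochastic gradient with ∇f_i(w) - ∇f_i(w̃) + μ̃, where w̃ is a periodic snapshot and μ̃ is the full gradient at that snapshot. The sketch below shows generic SVRG on a hypothetical 1-D least-squares problem; it is not the deep Q-learning variant the paper proposes, and the names `svrg`, `grad_i`, and `inner_steps` are assumptions.

```python
import random
from typing import Callable


def svrg(grad_i: Callable[[float, int], float], n: int, w0: float,
         lr: float, epochs: int, inner_steps: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = w0
    for _ in range(epochs):
        snapshot = w
        # Full gradient at the snapshot: one pass over all n samples.
        mu = sum(grad_i(snapshot, i) for i in range(n)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced estimate: unbiased for the full gradient at w,
            # with variance that shrinks as w and the snapshot approach the optimum.
            g = grad_i(w, i) - grad_i(snapshot, i) + mu
            w -= lr * g
    return w


# Hypothetical toy data: y_i = 3 * x_i exactly, so the optimum is w = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]


def grad_i(w: float, i: int) -> float:
    # Gradient of the per-sample loss 0.5 * (w * x_i - y_i)^2.
    return (w * xs[i] - ys[i]) * xs[i]


w_hat = svrg(grad_i, n=len(xs), w0=0.0, lr=0.05, epochs=30, inner_steps=50)
print(w_hat)  # close to 3.0
```

The design trade-off is one extra full pass per epoch in exchange for stochastic steps whose noise vanishes near the optimum, which is what allows a constant learning rate here.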

Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy [article]

Sang-Woo Lee, Min-Oh Heo, Jiwon Kim, Jeonghee Kim, Byoung-Tak Zhang
2015 arXiv pre-print
The online learning of deep neural networks is an interesting problem in machine learning because, for example, major IT companies want to manage the information in the massive data uploaded on the web  ...  During the training phase, we use various online, incremental ensemble, and transfer learning techniques to achieve a lower error for the architecture.  ...  In the mini-batch-shift gradient descent ensemble, a general neural network C trained by mini-batch-shift gradient descent is transferred to each weak neural network W (Algorithm 2) and the ensemble of  ... 
arXiv:1506.04477v1 fatcat:ryij5slnsjhi7acauu3bhnfkry

Recurrent neural network training with preconditioned stochastic gradient descent [article]

Xi-Lin Li
2016 arXiv pre-print
This paper studies the performance of a recently proposed preconditioned stochastic gradient descent (PSGD) algorithm on recurrent neural network (RNN) training.  ...  PSGD adaptively estimates a preconditioner to accelerate gradient descent, and is designed to be as simple, general, and easy to use as stochastic gradient descent (SGD).  ...  It is a simple and general procedure for upgrading a stochastic gradient descent (SGD) algorithm to a second-order algorithm by exploiting curvature information extracted exclusively from noisy stochastic  ... 
arXiv:1606.04449v2 fatcat:r3r66yomynfmrgk5pdba3ytemm

Deep Learning Architectures, Algorithms for Speech Recognition: An Overview

Banumathi A. C., E. Chandra
2017 International Journal of Advanced Research in Computer Science and Software Engineering  
Our paper presents a study of the different neural network classifiers, such as Recurrent Neural  ...  Speech is an easier mode of communication for people interacting with computers than a keyboard and mouse.  ...  The standard training algorithm for deep neural networks (DNNs) is stochastic gradient descent (SGD). Typically, the stochastic gradient is computed on mini-batches.  ... 
doi:10.23956/ijarcsse/v7i1/0107 fatcat:nokgnwfwabf65iolk3t6qt3g74
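Mini-batch SGD, as the snippet describes, averages per-example gradients over a small random batch before each update. A minimal sketch, assuming a hypothetical noise-free linear-regression loss (not a speech model) and hypothetical names `minibatch_sgd`, `lr`, and `batch_size`:

```python
import random
from typing import List, Tuple


def minibatch_sgd(xs: List[float], ys: List[float], lr: float, batch_size: int,
                  epochs: int, seed: int = 0) -> Tuple[float, float]:
    """Fit y = a*x + b by minimizing the mean of 0.5 * (a*x + b - y)^2."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0  # slope and intercept
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # fresh random partition into mini-batches each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Accumulate the gradient over the mini-batch, then average it.
            ga = gb = 0.0
            for i in batch:
                r = a * xs[i] + b - ys[i]  # residual for example i
                ga += r * xs[i]
                gb += r
            a -= lr * ga / len(batch)
            b -= lr * gb / len(batch)
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]
a, b = minibatch_sgd(xs, ys, lr=0.2, batch_size=2, epochs=1000)
print(a, b)  # close to 2.0 and 1.0
```

Because these toy data are noise-free and every batch of two contains distinct inputs, each update contracts the error for this step size, so the fit converges; with noisy data, a constant learning rate would instead leave the iterates fluctuating around the optimum.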

Inverse mapping of face GANs [article]

Nicky Bayat, Vahid Reza Khazaie, Yalda Mohsenzadeh
2020 arXiv pre-print
We train a ResNet architecture to recover a latent vector for a given face that can be used to generate a face nearly identical to the target.  ...  While many studies have explored various training configurations and architectures for GANs, the problem of inverting a generative model to extract latent vectors of given input images has been inadequately  ...  Method Overview. In this work, we train a residual neural network (ResNet18) in order to map an input image to its corresponding latent vector using a combination of a reconstruction loss and a perceptual  ... 
arXiv:2009.05671v1 fatcat:h744nq724zblflol6pk4jeatme
Showing results 1–15 of 41,258