6,316 Hits in 10.1 sec

How Does Batch Normalization Help Optimization? [article]

Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry
2019 arXiv   pre-print
The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift".  ...  Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs).  ...  Acknowledgements We thank Ali Rahimi and Ben Recht for helpful comments on a preliminary version of this paper.  ... 
arXiv:1805.11604v5 fatcat:zj6ybdoo3rdbldv2idlshwwzj4

Optimal Classification of COVID-19: A Transfer Learning Approach

Aditya Kakde, Durgansh Sharma, Nitin Arora
2020 International Journal of Computer Applications  
Analyzing and then diagnosing COVID-19 is currently a major challenge. This paper focuses on classification that can help in the analysis of COVID-19 versus normal chest X-rays using a deep learning technique.  ...  This is a viral pneumonia, and thus no antiviral drug will work to reduce these cases. During recovery, only the immune system plays a major role.  ...  To address covariate shift, a novel method named batch normalization was proposed.  ... 
doi:10.5120/ijca2020920165 fatcat:imxgwadepfchljearconybt6mm

On Optimizing Deep Convolutional Neural Networks by Evolutionary Computing [article]

M. U. B. Dias, D. D. N. De Silva, S. Fernando
2018 arXiv   pre-print
Further, it proposes some insights for optimizing deep neural networks using evolutionary computing techniques.  ...  Mini-batch normalization, identification of effective receptive fields, momentum updates, introduction of residual blocks, learning rate adaptation, etc. have been proposed to speed up the rate of convergence.  ...  This is known as internal covariate shift.  ... 
arXiv:1808.01766v1 fatcat:eb6vugsg6nbx5crdme5odyt7iq

Data optimization for large batch distributed training of deep neural networks [article]

Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar
2020 arXiv   pre-print
We observe that the loss landscape minimization is shaped by both the model and training data and propose a data optimization approach that utilizes machine learning to implicitly smooth out the loss landscape  ...  Distributed training in deep learning (DL) is common practice as data and models grow.  ...  Briefly, batch normalization (BN) addresses the problem of internal covariate shift in the neural network by reducing the dependency of the distribution of the input activations of each layer on all the  ... 
arXiv:2012.09272v2 fatcat:hrgrznaeefefpkesxlgvuiycou

ConFusion: Sensor Fusion for Complex Robotic Systems using Nonlinear Optimization

Timothy Sandy, Lukas Stadelmann, Simon Kerscher, Jonas Buchli
2019 IEEE Robotics and Automation Letters  
ConFusion is a modular framework for fusing measurements from many heterogeneous sensors within a moving horizon estimator.  ...  We demonstrate its performance in comparison to an iterated extended Kalman filter in visual-inertial tracking, and show its versatility through whole-body sensor fusion on a mobile manipulator.  ...  It also does not provide the full flexibility in sensing system design offered by MHEs because it runs an EKF at the front of the optimized batch of states to perform marginalization.  ... 
doi:10.1109/lra.2019.2894168 fatcat:2uwaihj6jndrhcww4peev645xy

Deep Reinforcement Learning for Stock Portfolio Optimization

Le Trung Hieu, National University of Singapore, Singapore
2020 International Journal of Modeling and Optimization  
Stock portfolio optimization is the process of constantly redistributing money across a pool of various stocks.  ...  Observations and hypotheses about the results were discussed, as well as possible future research directions.  ...  (Note that A is the advantage value, which is defined as A(s, a) = Q(s, a) − V(s), which shows how good an action is compared  ... 
doi:10.7763/ijmo.2020.v10.761 fatcat:hxwp6agp7vaz7de5vtijjeabdu
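The advantage definition quoted in the snippet above, A(s, a) = Q(s, a) − V(s), can be illustrated with a minimal sketch. The tabular values below are hypothetical and the mean-as-baseline estimate of V(s) is an assumption for illustration, not the paper's method:

```python
import numpy as np

# Hypothetical Q(s, a) values for three actions in a single state s.
q_values = np.array([1.0, 2.5, 0.5])

# A simple baseline estimate of V(s); here, the mean over actions.
v_value = q_values.mean()

# A(s, a) = Q(s, a) - V(s): positive means the action beats the baseline.
advantage = q_values - v_value
```

With a mean baseline, the advantages sum to zero by construction, so the sign of A(s, a) directly indicates whether an action is better or worse than average.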

On Ensembles, I-Optimality, and Active Learning

William D Heavlin
2021 Journal of Statistical Theory and Practice  
We concentrate on the large-batch case, because this is most aligned with most machine learning applications, and because it is more theoretically rich.  ...  We illustrate by fitting a deep neural network to about 20 percent of the CIFAR-10 image dataset. The statistical efficiency we achieve is 3× that of random selection.  ... 
doi:10.1007/s42519-021-00200-4 fatcat:hpxzvnysnjctxbikmicfdzgfuy

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate [article]

Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora
2020 arXiv   pre-print
We name it the Fast Equilibrium Conjecture and suggest it holds the key to why Batch Normalization is effective.  ...  (Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g.,  ...  While the original motivation is to reduce Internal Covariate Shift (ICS), (Santurkar et al., 2018) challenged this view and argued that the effectiveness of BN comes from a smoothing effect on the  ... 
arXiv:2010.02916v1 fatcat:b3idmkfg4zdpvnwx6qforoyxcm

Model Selection in Batch Policy Optimization [article]

Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai
2021 arXiv   pre-print
In contrast, the third source is unique to batch policy optimization and is due to dataset shift inherent to the setting.  ...  We first show that no batch policy optimization algorithm can achieve a guarantee addressing all three simultaneously, revealing a stark contrast between difficulties in batch policy optimization and the  ...  It is clear that a need exists in batch policy optimization for an analogue to methods like cross-validation in supervised learning.  ... 
arXiv:2112.12320v1 fatcat:thrtvpivf5gy5h3i7cff5kzmvu

Optimization for deep learning: theory and algorithms [article]

Ruoyu Sun
2019 arXiv   pre-print
This article provides an overview of optimization algorithms and theory for training neural networks.  ...  First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods  ...  We also thank Ju Sun for the list of related works in the webpage [101] which helps the writing of this article.  ... 
arXiv:1912.08957v1 fatcat:bdtx2o3qhfhthh2vyohkuwnxxa

Analysis and Optimization of Convolutional Neural Network Architectures [article]

Martin Thoma
2017 arXiv   pre-print
Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step toward closing this gap.  ...  Other results, such as the positive impact of a learned color transformation on the test accuracy, could not be confirmed.  ...  Batch Normalization In [CUH15], the authors write that Batch Normalization does not improve ELU networks.  ... 
arXiv:1707.09725v1 fatcat:a5hg2v25anclndhvv7dytvs2kq

Understanding the impact of entropy on policy optimization [article]

Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans
2019 arXiv   pre-print
Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies.  ...  We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function.  ...  How does batch normalization help optimization? (No, it is not about internal covariate shift). arXiv preprint arXiv:1805.11604, 2018.  ... 
arXiv:1811.11214v5 fatcat:35meggejbrdt3a6vsqpahs4c5q
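The "more stochastic policies" that entropy regularization encourages can be made concrete with a small sketch. The quantity added to the objective is the policy entropy H(π) = −Σ_a π(a) log π(a); the logits below are hypothetical, and this shows only the entropy term, not a full policy-gradient update:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = np.exp(logits - logits.max())
    return z / z.sum()

def entropy(probs):
    # H(pi) = -sum_a pi(a) * log(pi(a)); larger for more stochastic policies.
    return float(-np.sum(probs * np.log(probs)))

uniform = softmax(np.zeros(3))               # maximally stochastic over 3 actions
peaked = softmax(np.array([5.0, 0.0, 0.0]))  # nearly deterministic
```

The uniform policy attains the maximum entropy log(3); a peaked policy scores lower, so adding H(π) to the objective penalizes premature determinism.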

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [article]

Sergey Ioffe, Christian Szegedy
2015 arXiv   pre-print
We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs.  ...  Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout.  ...  Batch Normalization makes the distribution more stable and reduces the internal covariate shift.  ... 
arXiv:1502.03167v3 fatcat:76bzkeqqnnanxi67zhpnqnef5y
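The "normalizing layer inputs" described in this abstract can be sketched in a few lines. This is a minimal illustration of the batch-normalization transform (per-feature standardization over the mini-batch, followed by a learned scale γ and shift β), not the paper's full training-time/inference-time implementation; the random input is hypothetical:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features) activations; gamma, beta: learned per-feature params.
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # ~zero mean, unit variance
    return gamma * x_hat + beta              # restore representational capacity

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 8))  # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

With γ = 1 and β = 0, each output feature has (approximately) zero mean and unit variance over the batch, which is the distributional stabilization the abstract refers to.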

Probabilistic Line Searches for Stochastic Optimization [article]

Maren Mahsereci, Philipp Hennig
2017 arXiv   pre-print
Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space.  ...  The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.  ...  This is an approximation since the true covariance matrix is in general not diagonal.  ... 
arXiv:1703.10034v2 fatcat:il3dv7kwevh5xfxujbkghcqdoy

Optimal decoding of information from a genetic network [article]

Mariela D. Petkova, Gašper Tkačik, William Bialek, Eric F. Wieschaus, Thomas Gregor
2016 arXiv   pre-print
Gene expression levels carry information about signals that have functional significance for the organism.  ...  The resulting maps are distorted, and these distortions predict, with no free parameters, the positions of expression stripes for the pair-rule genes in the mutant embryos.  ...  Thus, even in the posterior half of the embryo, the map is shifted, and the plot of x * vs x (following the ridge of high probability in the map) does not have unit slope.  ... 
arXiv:1612.08084v1 fatcat:judizpiff5gddfix6aadb4itii
Showing results 1 — 15 out of 6,316 results