How Does Batch Normalization Help Optimization?
[article]
2019
arXiv
pre-print
The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called "internal covariate shift". ...
Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). ...
Acknowledgements We thank Ali Rahimi and Ben Recht for helpful comments on a preliminary version of this paper. ...
arXiv:1805.11604v5
fatcat:zj6ybdoo3rdbldv2idlshwwzj4
Optimal Classification of COVID-19: A Transfer Learning Approach
2020
International Journal of Computer Applications
Analyzing and then diagnosing is currently a major challenge. This paper focuses on classification that can help in the analysis of COVID-19 versus normal chest X-rays using a deep learning technique. ...
This is a viral pneumonia, and thus no antiviral drug will reduce these cases. During recovery, only the immune system plays a major role. ...
Covariate shift is the problem for which a novel method named batch normalization was proposed. ...
doi:10.5120/ijca2020920165
fatcat:imxgwadepfchljearconybt6mm
On Optimizing Deep Convolutional Neural Networks by Evolutionary Computing
[article]
2018
arXiv
pre-print
Further, it proposes some insights for optimizing deep neural networks using evolutionary computing techniques. ...
Mini-batch normalization, identification of effective receptive fields, momentum updates, introduction of residual blocks, learning rate adaptation, etc. have been proposed to speed up the rate of convergence ...
This is known as internal covariate shift. ...
arXiv:1808.01766v1
fatcat:eb6vugsg6nbx5crdme5odyt7iq
Data optimization for large batch distributed training of deep neural networks
[article]
2020
arXiv
pre-print
We observe that the loss landscape minimization is shaped by both the model and training data and propose a data optimization approach that utilizes machine learning to implicitly smooth out the loss landscape ...
Distributed training in deep learning (DL) is common practice as data and models grow. ...
Briefly, batch normalization (BN) addresses the problem of internal covariate shift in the neural network by reducing the dependency of the distribution of the input activations of each layer on all the ...
arXiv:2012.09272v2
fatcat:hrgrznaeefefpkesxlgvuiycou
ConFusion: Sensor Fusion for Complex Robotic Systems using Nonlinear Optimization
2019
IEEE Robotics and Automation Letters
ConFusion is a modular framework for fusing measurements from many heterogeneous sensors within a moving horizon estimator. ...
We demonstrate its performance in comparison to an iterated extended Kalman filter in visual-inertial tracking, and show its versatility through whole-body sensor fusion on a mobile manipulator. ...
It also does not provide the full flexibility in sensing system design offered by MHEs because it runs an EKF at the front of the optimized batch of states to perform marginalization. ...
doi:10.1109/lra.2019.2894168
fatcat:2uwaihj6jndrhcww4peev645xy
Deep Reinforcement Learning for Stock Portfolio Optimization
2020
International Journal of Modeling and Optimization
Stock portfolio optimization is the process of constant re-distribution of money to a pool of various stocks. ...
Observations and hypotheses about the results were discussed, as well as possible future research directions. ...
(Note that A is the advantage value, defined as A(s, a) = Q(s, a) − V(s), which shows how good an action is compared ...
doi:10.7763/ijmo.2020.v10.761
fatcat:hxwp6agp7vaz7de5vtijjeabdu
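The advantage A(s, a) = Q(s, a) − V(s) recovered in the snippet above is the standard quantity used in actor-critic policy gradients. A minimal sketch, assuming the observed discounted return stands in for Q(s, a) and a learned critic output for V(s); the array names `returns` and `values` are illustrative, not from the paper:

```python
import numpy as np

def advantage(returns, values):
    """Advantage estimate A(s, a) = Q(s, a) - V(s).

    Here the observed (discounted) return stands in for Q(s, a) and a
    learned state-value baseline stands in for V(s).
    """
    return np.asarray(returns) - np.asarray(values)

# Example: an action that earned 1.2 in a state the critic valued at 0.9
print(advantage([1.2], [0.9]))  # [0.3] -> the action was better than average
```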
On Ensembles, I-Optimality, and Active Learning
2021
Journal of Statistical Theory and Practice
We concentrate on the large batch case, because this is most aligned with most machine learning applications, and because it is more theoretically rich. ...
We illustrate by fitting a deep neural network to about 20 percent of the CIFAR-10 image dataset. The statistical efficiency we achieve is 3× that of random selection. ...
doi:10.1007/s42519-021-00200-4
fatcat:hpxzvnysnjctxbikmicfdzgfuy
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
[article]
2020
arXiv
pre-print
We name it the Fast Equilibrium Conjecture and suggest it holds the key to why Batch Normalization is effective. ...
... (e.g., Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., ...
While the original motivation is to reduce Internal Covariate Shift (ICS), (Santurkar et al., 2018) challenged this view and argued that the effectiveness of BN comes from a smoothening effect on the ...
arXiv:2010.02916v1
fatcat:b3idmkfg4zdpvnwx6qforoyxcm
Model Selection in Batch Policy Optimization
[article]
2021
arXiv
pre-print
In contrast, the third source is unique to batch policy optimization and is due to dataset shift inherent to the setting. ...
We first show that no batch policy optimization algorithm can achieve a guarantee addressing all three simultaneously, revealing a stark contrast between difficulties in batch policy optimization and the ...
It is clear that a need exists in batch policy optimization for an analogue to methods like cross-validation in supervised learning. ...
arXiv:2112.12320v1
fatcat:thrtvpivf5gy5h3i7cff5kzmvu
Optimization for deep learning: theory and algorithms
[article]
2019
arXiv
pre-print
This article provides an overview of optimization algorithms and theory for training neural networks. ...
First, we discuss the issue of gradient explosion/vanishing and the more general issue of undesirable spectrum, and then discuss practical solutions including careful initialization and normalization methods ...
We also thank Ju Sun for the list of related works in the webpage [101] which helps the writing of this article. ...
arXiv:1912.08957v1
fatcat:bdtx2o3qhfhthh2vyohkuwnxxa
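The "careful initialization" mentioned in this entry usually means scaling the initial weight variance to a layer's fan-in/fan-out so that activations and gradients neither explode nor vanish. A minimal sketch of Xavier/Glorot and He initialization; the function names are illustrative, not taken from the surveyed article:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Glorot & Bengio (2010): Var(W) = 2 / (fan_in + fan_out)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.randn(fan_out, fan_in) * std

def he_init(fan_in, fan_out):
    # He et al. (2015), suited to ReLU networks: Var(W) = 2 / fan_in
    std = np.sqrt(2.0 / fan_in)
    return np.random.randn(fan_out, fan_in) * std

W = he_init(512, 256)  # weights for a 512 -> 256 ReLU layer
```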
Analysis and Optimization of Convolutional Neural Network Architectures
[article]
2017
arXiv
pre-print
Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step to close this gap. ...
Other results, such as the positive impact of learned color transformation on the test accuracy could not be confirmed. ...
Batch Normalization: In [CUH15], the authors write that Batch Normalization does not improve ELU networks. ...
arXiv:1707.09725v1
fatcat:a5hg2v25anclndhvv7dytvs2kq
Understanding the impact of entropy on policy optimization
[article]
2019
arXiv
pre-print
Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. ...
We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. ...
How does batch normalization help optimization? (No, it is not about internal covariate shift). arXiv preprint arXiv:1805.11604, 2018. ...
arXiv:1811.11214v5
fatcat:35meggejbrdt3a6vsqpahs4c5q
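The entropy regularization discussed in this entry adds a policy-entropy bonus to the policy-gradient objective, roughly J = E[log π(a|s) · A(s, a)] + τ · H(π). A minimal sketch for a categorical policy; the coefficient name `tau` and all array names are assumptions for illustration:

```python
import numpy as np

def entropy_regularized_objective(log_probs, advantages, probs, tau=0.01):
    """Surrogate objective: policy-gradient term plus entropy bonus.

    log_probs, advantages: per-sample quantities for the taken actions.
    probs: full action distributions, shape (batch, num_actions),
    used only for the entropy bonus. All names are illustrative.
    """
    pg_term = np.mean(log_probs * advantages)
    entropy = -np.sum(probs * np.log(probs + 1e-8), axis=-1).mean()
    return pg_term + tau * entropy
```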
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
[article]
2015
arXiv
pre-print
We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. ...
Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. ...
Batch Normalization makes the distribution more stable and reduces the internal covariate shift. ...
arXiv:1502.03167v3
fatcat:76bzkeqqnnanxi67zhpnqnef5y
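The layer-input normalization described in this entry is the standard BatchNorm transform, y = γ · (x − μ_B) / √(σ²_B + ε) + β, computed from per-feature mini-batch statistics with learnable scale and shift. A minimal NumPy sketch of the training-time forward pass (inference would use running statistics instead; names are illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """BatchNorm over a mini-batch x of shape (batch, features)."""
    mu = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                  # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta          # learnable scale and shift

x = np.random.randn(32, 64)
y = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))
```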
Probabilistic Line Searches for Stochastic Optimization
[article]
2017
arXiv
pre-print
Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. ...
The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent. ...
This is an approximation since the true covariance matrix is in general not diagonal. ...
arXiv:1703.10034v2
fatcat:il3dv7kwevh5xfxujbkghcqdoy
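The diagonal-covariance approximation mentioned in the last snippet keeps only per-coordinate variances of the stochastic gradient and drops the off-diagonal terms. A minimal sketch of estimating those variances from per-example gradients; the shapes and names here are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def diagonal_gradient_variance(per_example_grads):
    """Estimate the diagonal of the mini-batch gradient covariance.

    per_example_grads: array of shape (batch, num_params), one gradient
    per training example. The mini-batch gradient is their mean, so its
    per-coordinate variance is the sample variance divided by the batch
    size; off-diagonal covariances are dropped, as in the approximation.
    """
    g = np.asarray(per_example_grads)
    return g.var(axis=0, ddof=1) / g.shape[0]

grads = np.random.randn(16, 10)           # 16 examples, 10 parameters
print(diagonal_gradient_variance(grads))  # per-parameter variance estimates
```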
Optimal decoding of information from a genetic network
[article]
2016
arXiv
pre-print
Gene expression levels carry information about signals that have functional significance for the organism. ...
The resulting maps are distorted, and these distortions predict, with no free parameters, the positions of expression stripes for the pair-rule genes in the mutant embryos. ...
Thus, even in the posterior half of the embryo, the map is shifted, and the plot of x* vs. x (following the ridge of high probability in the map) does not have unit slope. ...
arXiv:1612.08084v1
fatcat:judizpiff5gddfix6aadb4itii
Showing results 1 — 15 out of 6,316 results