
Estimating Training Data Influence by Tracing Gradient Descent [article]

Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale
2020 arXiv   pre-print
We introduce a method called TracIn that computes the influence of a training example on a prediction made by the model.  ...  It applies to any machine learning model trained using stochastic gradient descent or a variant of it, agnostic of architecture, domain and task.  ...  [11], a technique closely related to TracIn, identifies the influence of training examples on the overall loss by tracing the training process, while TracIn identifies the influence on the loss of a  ... 
arXiv:2002.08484v3 fatcat:ctntn67xpzbxxolhmspuoxctai
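The core TracIn idea described in the snippet, influence as a sum over training checkpoints of the learning rate times the dot product of loss gradients at the training and test examples, can be sketched for a toy 1-D linear model with squared loss. The checkpoint weights, learning rates, and data below are illustrative assumptions, not values from the paper.

```python
def grad_sq_loss(w, x, y):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def tracin_influence(checkpoints, lrs, z_train, z_test):
    """TracIn-style influence: sum over checkpoints of lr * <grad(z_train), grad(z_test)>."""
    x_tr, y_tr = z_train
    x_te, y_te = z_test
    return sum(
        lr * grad_sq_loss(w, x_tr, y_tr) * grad_sq_loss(w, x_te, y_te)
        for w, lr in zip(checkpoints, lrs)
    )

# A training point identical to the test point should have positive influence
# (its gradient always aligns with the test gradient).
infl = tracin_influence(checkpoints=[0.0, 0.5, 0.9], lrs=[0.1, 0.1, 0.1],
                        z_train=(1.0, 2.0), z_test=(1.0, 2.0))
```

With these made-up checkpoints the per-checkpoint terms are 0.1·(−2)², 0.1·(−1.5)², and 0.1·(−1.1)², summing to 0.746; in practice the gradients would come from an autodiff framework over saved model checkpoints.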

SGD Implicitly Regularizes Generalization Error [article]

Daniel A. Roberts
2021 arXiv   pre-print
descent acts to regularize generalization error by decorrelating nearby updates.  ...  We then compare the change in the test error for stochastic gradient descent to the change in test error from an equivalent number of gradient descent updates and show explicitly that stochastic gradient  ...  This extended abstract was brought to you by the letter Σ after averaging over many different realizations.  ... 
arXiv:2104.04874v1 fatcat:suctw27psffa5chxf2w2yup4m4

Interpretable Data-Based Explanations for Fairness Debugging [article]

Romila Pradhan, Jiongli Zhu, Boris Glavic, Babak Salimi
2021 arXiv   pre-print
Specifically, we introduce the concept of causal responsibility that quantifies the extent to which intervening on training data by removing or updating subsets of it can resolve the bias.  ...  In this work, we introduce Gopher, a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior by identifying coherent subsets of the training data that  ...  While our single-step gradient descent approach can be used to estimate subset influence, this may not be a good idea for learning algorithms that use more efficient techniques than gradient descent.  ... 
arXiv:2112.09745v1 fatcat:2tfjxr7kg5f33eiovymhbbm3sq

Optimization Variance: Exploring Generalization Properties of DNNs [article]

Xiao Zhang, Dongrui Wu, Haoyi Xiong, Bo Dai
2021 arXiv   pre-print
Inspired by this result, we propose a novel metric, optimization variance (OV), to measure the diversity of model updates caused by the stochastic gradients of random training batches drawn in the same  ...  OV can be estimated using samples from the training set only but correlates well with the (unknown) \emph{test} error, and hence early stopping may be achieved without using a validation set.  ...  Intuitively, the OV represents the inconsistency of gradients' influence on the model.  ... 
arXiv:2106.01714v1 fatcat:v426umr34vfttkn337caiibv44
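The optimization variance (OV) described above measures how much the model update varies across random training batches drawn at the same optimization step. A simplified, hypothetical proxy for a toy 1-D model is the variance of per-batch gradients around their mean; the paper's exact normalization differs, and all constants here are illustrative.

```python
import random

def batch_grad(w, batch):
    # Mean gradient of 0.5 * (w*x - y)**2 over the batch.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def optimization_variance(w, data, batch_size=4, n_batches=50, seed=0):
    """Variance of batch gradients drawn at the same point w (toy OV proxy)."""
    rng = random.Random(seed)
    grads = [batch_grad(w, rng.sample(data, batch_size)) for _ in range(n_batches)]
    mean = sum(grads) / n_batches
    return sum((g - mean) ** 2 for g in grads) / n_batches

# Synthetic training set y = 2x + noise (made up for the sketch).
data = [(float(x), 2.0 * x + random.Random(x).gauss(0.0, 0.5)) for x in range(1, 21)]
ov = optimization_variance(w=0.0, data=data)
```

Because different batches yield different gradients, the estimate is strictly positive; the snippet's observation is that tracking this quantity during training correlates with test error, using training data only.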

Enhancing Performance of a Deep Neural Network: A Comparative Analysis of Optimization Algorithms

Noor Fatima
2020 Advances in Distributed Computing and Artificial Intelligence Journal  
This work provides insightful analysis to help a data scientist choose the best optimizer when modelling a deep neural network.  ...  Stochastic Gradient Descent (SGD) The first optimizer we chose is Stochastic Gradient Descent (SGD).  ...  Stochastic Gradient Descent (SGD) Stochastic Gradient Descent, or SGD, is a variant of the most basic optimization algorithm, known as Gradient (or slope of a function) Descent.  ... 
doi:10.14201/adcaij2020927990 fatcat:mo7gwxkcujf5fadwwiwoef3xpq
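The SGD update rule the snippet refers to is simply w ← w − lr·g, where g is the gradient computed on a single randomly drawn example rather than the full dataset. A minimal sketch on a toy least-squares problem (the function, data, and learning rate are made-up illustration values):

```python
import random

def sgd_step(w, grad, lr):
    """One stochastic gradient descent update."""
    return w - lr * grad

# Fit y = 2x with squared loss, one random sample per step.
random.seed(0)
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
w = 0.0
for _ in range(200):
    x, y = random.choice(data)
    g = (w * x - y) * x            # d/dw of 0.5 * (w*x - y)**2 at this sample
    w = sgd_step(w, g, lr=0.05)
```

Since every sample agrees that the minimizer is w = 2, the iterate contracts toward 2 at every step regardless of which example is drawn; with noisy data the per-step gradients would only agree in expectation, which is what distinguishes SGD from full-batch gradient descent.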

Dynamic Measurement System Modeling with Machine Learning Algorithms

Changqiao Wu, Guoqing Ding, Xin Chen
2018 Zenodo  
In addition, methods based on the normal equation and second-order gradient descent are proposed to accelerate the modeling process, and ways of obtaining better gradient estimates are discussed.  ...  For conventional gradient descent, mini-batch learning and gradient with momentum contribute to faster convergence and enhance model ability.  ...  We can alleviate this influence by averaging the gradients over several samples, as the noise typically follows a Gaussian distribution.  ... 
doi:10.5281/zenodo.2022016 fatcat:gknumaxvsrgbbceygqybi7rwnu
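The snippet combines two ideas: averaging gradients over a mini-batch attenuates (roughly Gaussian) gradient noise by a factor of sqrt(batch size), and a momentum buffer smooths successive updates. A minimal sketch of both on a toy quadratic objective; the noise model and all constants are illustrative assumptions, not the paper's setup.

```python
import random

def noisy_grad(w, sigma=1.0):
    """True gradient of 0.5 * w**2 is w; add Gaussian observation noise."""
    return w + random.gauss(0.0, sigma)

def minibatch_momentum_step(w, v, batch_size=16, lr=0.1, beta=0.9):
    # Averaging over the batch shrinks the noise std by sqrt(batch_size).
    g = sum(noisy_grad(w) for _ in range(batch_size)) / batch_size
    v = beta * v + g               # momentum buffer accumulates smoothed gradient
    return w - lr * v, v

random.seed(1)
w, v = 5.0, 0.0
for _ in range(100):
    w, v = minibatch_momentum_step(w, v)
```

Starting from w = 5, the iterate settles near the minimum at 0, fluctuating only within the reduced noise floor; with batch_size = 1 and beta = 0 the same loop would wander far more.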

N-SfC: Robust and Fast Shape Estimation from Caustic Images [article]

Marc Kassubeck, Moritz Kappel, Susana Castillo, Marcus Magnor
2021 arXiv   pre-print
descent, which enables better convergence using fewer iterations.  ...  The recent Shape from Caustics (SfC) method casts the problem as the inverse of a light propagation simulation for synthesis of the caustic image, which can be solved by a differentiable renderer.  ...  It is motivated by the success of learned gradient descent methods [1, 8] in solving ill-posed parameter estimation problems.  ... 
arXiv:2112.06705v1 fatcat:vkrok2rkv5h4doczc4olpeai5i

Influence Estimation for Generative Adversarial Networks [article]

Naoyuki Terashita, Hiroki Ohashi, Yuichi Nonaka, Takashi Kanemaru
2021 arXiv   pre-print
To this end, (1) we propose an influence estimation method that uses the Jacobian of the gradient of the generator's loss with respect to the discriminator's parameters (and vice versa) to trace how the  ...  We experimentally verified that our influence estimation method correctly inferred the changes in GAN evaluation metrics.  ...  By taking η [t] G and η [t] D such that they alternately take 0 at each step, we obtain ASGD and the estimator of ASGD-Influence for alternating gradient descent.  ... 
arXiv:2101.08367v2 fatcat:x6eph4cg55b5xohfsauix4kpni
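The snippet's construction, recovering alternating gradient descent by letting the generator and discriminator step sizes alternately be zero, can be sketched on a toy bilinear game f(g, d) = g·d (a made-up objective standing in for the GAN losses; all step sizes are illustrative).

```python
def alternating_gd(g, d, lr=0.1, steps=4):
    """Simultaneous-update form where one player's step size is zeroed each step,
    which reduces to alternating gradient descent."""
    history = []
    for t in range(steps):
        lr_g = lr if t % 2 == 0 else 0.0   # generator moves only on even steps
        lr_d = lr if t % 2 == 1 else 0.0   # discriminator moves only on odd steps
        g = g - lr_g * d                   # generator minimizes f: df/dg = d
        d = d + lr_d * g                   # discriminator maximizes f: df/dd = g
        history.append((g, d))
    return history

hist = alternating_gd(1.0, 1.0)
```

On step 0 only g changes (1.0 → 0.9) while d stays fixed; on step 1 only d changes, using the already-updated g, which is exactly the alternating schedule GANs are usually trained with.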

Improving Learning Performance in Neural Networks

Falah Al-akashi, Faculty of Engineering, University of Kufa, Najaf, Iraq
2021 International Journal of Hybrid Information Technology  
We will show how the algorithm approximates gradient descent of the expected solutions produced by the nodes in the space of pheromone trails.  ...  Several characteristics of noisy data sources have been used to optimize the observations in a group of neural networks during their learning process.  ...  In Stochastic Gradient Descent (SGD), an unbiased random estimate of the gradient is used instead of the true gradient.  ... 
doi:10.21742/ijhit.2021.14.1.02 fatcat:s4wkvzc3f5c3jjm5wavlnjfy5m

Automated Spectral Kernel Learning

Jian Li, Yong Liu, Weiping Wang
In this paper, we propose an efficient learning framework that incorporates the process of finding suitable kernels and model training.  ...  The generalization performance of kernel methods is largely determined by the kernel, but spectral representations of stationary kernels are both input-independent and output-independent, which limits  ...  Acknowledgments This work was supported in part by the National Natural Science Foundation of China (No.61703396, No.61673293), the CCF-Tencent Open Fund, the Youth Innovation Promotion Association CAS  ... 
doi:10.1609/aaai.v34i04.5892 fatcat:2lm65ygh6bginjo2swjk2jjtny

Feature-Wise Bias Amplification [article]

Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta
2019 arXiv   pre-print
training data is available.  ...  This overestimation gives rise to feature-wise bias amplification -- a previously unreported form of bias that can be traced back to the features of a trained model.  ...  (c): Extent of overestimation of weak-feature coefficients in logistic classifiers trained with stochastic gradient descent, in terms of the amount of training data.  ... 
arXiv:1812.08999v2 fatcat:dbdvoocw7faxjcedpy7j4aggku

Automated Spectral Kernel Learning [article]

Jian Li, Yong Liu, Weiping Wang
2020 arXiv   pre-print
Further, we derive a data-dependent generalization error bound based on Rademacher complexity, which estimates the generalization ability of the learning framework and suggests two regularization terms  ...  learning the spectral measure from the data.  ...  Acknowledgments This work was supported in part by the National Natural  ... 
arXiv:1909.04894v2 fatcat:x7vmhw5korh6th722awqb62bgy

Neural Radiosity [article]

Saeed Hadadan, Shuhong Chen, Matthias Zwicker
2021 arXiv   pre-print
We introduce Neural Radiosity, an algorithm to solve the rendering equation by minimizing the norm of its residual, similarly to traditional radiosity techniques.  ...  Since the Monte Carlo gradient estimates are unbiased, this stochastic gradient descent is guaranteed to converge to a local minimum.  ...  Minibatch Stochastic Gradient Descent. We minimize the residual norm using minibatch stochastic gradient descent, as described by the pseudocode in Algorithm 1.  ... 
arXiv:2105.12319v1 fatcat:6nr62aopaff6tdtbmhw3lp36vm

Rapid Modeling of the Sound Speed Field in the South China Sea Based on a Comprehensive Optimal LM-BP Artificial Neural Network

Jin Huang, Yu Luo, Jian Shi, Xin Ma, Qian-Qian Li, Yan-Yi Li
2021 Journal of Marine Science and Engineering  
Through the prediction and verification of the data from 2009 to 2012, the newly proposed optimized BP network model is shown to dramatically reduce the training time and improve precision compared to  ...  The sound speed profile was described by five indicators: date, time, latitude, longitude, and depth.  ...  Acknowledgments: Thanks to the South China Sea Institute of Oceanology, the Chinese Academy of Sciences for providing data support for this study and the members of the project for their contributions  ... 
doi:10.3390/jmse9050488 fatcat:qpyhopz6x5c7xkzoci2d2q752m

Forecasting in Big Data Environments: an Adaptable and Automated Shrinkage Estimation of Neural Networks (AAShNet) [article]

Ali Habibnia (Emory University)
2019 arXiv   pre-print
We estimate optimal values of shrinkage hyperparameters by incorporating a gradient-based optimization technique, resulting in robust predictions with improved reproducibility.  ...  To overcome the curse of dimensionality and manage data and model complexity, we examine shrinkage estimation of a back-propagation algorithm of a deep neural net with skip-layer connections.  ...  As opposed to elementary parameters, these hyperparameters cannot be directly learned from the data.  ... 
arXiv:1904.11145v1 fatcat:6xdamh6bj5egxdbo7xemxap654