8,006 Hits in 8.2 sec

On the Noisy Gradient Descent that Generalizes as SGD [article]

Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu
<span title="2020-06-19">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Finally, thanks to the flexibility of choosing the noise class, an algorithm is proposed to perform noisy gradient descent that generalizes well, a variant of which even benefits large-batch SGD training  ...  Moreover, the sampling noises unify two kinds of gradient-regularizing noises that belong to the Gaussian class: the one using the (scaled) Fisher as covariance and the one using the gradient covariance of  ...  Finally, an algorithm is proposed to perform noisy gradient descent that generalizes as SGD. The algorithm can be extended for practical usage such as large-batch training.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1906.07405v3">arXiv:1906.07405v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/odo7cpoht5cdzk5o5auuog7g7q">fatcat:odo7cpoht5cdzk5o5auuog7g7q</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200623071704/https://arxiv.org/pdf/1906.07405v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/3f/bd/3fbd9489eb476be1d83f6a543c2f1b3172f17505.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1906.07405v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

On the Convergence of A Family of Robust Losses for Stochastic Gradient Descent [article]

Bo Han and Ivor W. Tsang and Ling Chen
<span title="2016-05-05">2016</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
The convergence of Stochastic Gradient Descent (SGD) using convex loss functions has been widely studied.  ...  We not only reveal that the convergence rate is O(1/T) for SGD methods using robust losses, but also provide a robustness analysis of two representative robust losses.  ...  Specifically, the generalized algorithm consists of two special cases. For Stochastic Gradient Descent with Smooth Ramp Loss, the algorithm employs "Set I and Update I".  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1605.01623v1">arXiv:1605.01623v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/hgssof3imfc6pkumwsm3x6ogia">fatcat:hgssof3imfc6pkumwsm3x6ogia</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200830021452/https://arxiv.org/pdf/1605.01623v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/33/bc/33bcc5086840971c111cad847f678ca01f604f1b.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1605.01623v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Stochastic gradient descent with differentially private updates

Shuang Song, Kamalika Chaudhuri, Anand D. Sarwate
<span title="">2013</span> <i title="IEEE"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/qt5pvjrtkjfdzgqcdlcw2blce4" style="color: black;">2013 IEEE Global Conference on Signal and Information Processing</a> </i> &nbsp;
Our results show that standard SGD experiences high variability due to differential privacy, but a moderate increase in the batch size can improve performance significantly.  ...  In this paper, we derive differentially private versions of stochastic gradient descent, and test them empirically.  ...  ACKNOWLEDGEMENT KC and SS would like to thank NIH U54-HL108460, the Hellman Foundation, and NSF IIS 1253942 for support.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/globalsip.2013.6736861">doi:10.1109/globalsip.2013.6736861</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/globalsip/SongCS13.html">dblp:conf/globalsip/SongCS13</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6hy5t2biwzcivdyrvxbhkcm2nm">fatcat:6hy5t2biwzcivdyrvxbhkcm2nm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170921235500/http://www.ece.rutgers.edu/%7Easarwate/pdfs/SongCS13sgd.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/6b/50/6b50bd68967bfe032e8371ce45581e373d2f2bf6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/globalsip.2013.6736861"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>

Stochastic Gradient Variance Reduction by Solving a Filtering Problem [article]

Xingyi Yang
<span title="2021-05-15">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
It is able to correct noisy gradient directions as well as accelerate the convergence of learning.  ...  Deep neural networks (DNN) are typically optimized using stochastic gradient descent (SGD).  ...  ACKNOWLEDGMENT The authors would like to thank Professor Truong Nguyen for his quality curriculum design and careful explanation of the course content. Thanks to all the classmates in ECE251C.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.12418v2">arXiv:2012.12418v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/5scjfj2qgbh2djar6vkcnfm64e">fatcat:5scjfj2qgbh2djar6vkcnfm64e</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201228082802/https://arxiv.org/pdf/2012.12418v1.pdf" title="fulltext PDF download [not primary version]" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <span style="color: #f43e3e;">&#10033;</span> <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/99/e1/99e1618dfe25239521f4b149ee91345b7da6c5bc.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.12418v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Concentrated Differentially Private Gradient Descent with Adaptive per-Iteration Privacy Budget [article]

Jaewoo Lee, Daniel Kifer
<span title="2018-08-28">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Intuitively, at the beginning of the optimization, gradients are expected to be large, so that they do not need to be measured as accurately.  ...  Iterative algorithms, like gradient descent, are common tools for solving a variety of problems, such as model fitting.  ...  It is important to note that the noisy gradients $\nabla f(w_t) + Y_t$ might not be descent directions even when computed on the entire dataset.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1808.09501v1">arXiv:1808.09501v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/4lijrh4nwnh2hg5ssjveqtr564">fatcat:4lijrh4nwnh2hg5ssjveqtr564</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200907045034/https://arxiv.org/pdf/1808.09501v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/79/30/7930b840420978d9d1be56e4c4e518bfb42c8323.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1808.09501v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Parameter-Free Locally Differentially Private Stochastic Subgradient Descent [article]

Kwang-Sung Jun, Francesco Orabona
<span title="2019-11-21">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this work, we propose BANCO (Betting Algorithm for Noisy COins), the first ϵ-LDP SGD algorithm that essentially matches the convergence rate of the tuned SGD without any learning rate parameter, reducing  ...  While it has been shown that stochastic optimization is possible with ϵ-LDP via the standard SGD (Song et al., 2013), its convergence rate largely depends on the learning rate, which must be tuned via  ...  We would like to thank Adam Smith for his valuable feedback on differentially-private SGDs.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1911.09564v1">arXiv:1911.09564v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/zqeek7eblver7lrk2zswqmn5he">fatcat:zqeek7eblver7lrk2zswqmn5he</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200908031552/https://arxiv.org/pdf/1911.09564v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/22/25/222551519b0f27fae275cec214db5cc5090be92b.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1911.09564v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Robustness of Adaptive Neural Network Optimization Under Training Noise

Subhajit Chaudhury, Toshihiko Yamasaki
<span title="">2021</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/q7qi7j4ckfac7ehf3mjbso4hne" style="color: black;">IEEE Access</a> </i> &nbsp;
We show that popular adaptive optimization methods exhibit poor generalization while learning from noisy training data, compared to vanilla Stochastic Gradient Descent (SGD) and its variants, which manifest  ...  Adaptive gradient methods such as adaptive moment estimation (Adam), RMSProp, and adaptive gradient (AdaGrad) use the temporal history of the gradient updates to improve the speed of convergence and reduce  ...  STOCHASTIC GRADIENT DESCENT (SGD) SGD is the simplest method that computes the gradient of the cost function J w.r.t. θ for small batches of the dataset.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/access.2021.3062990">doi:10.1109/access.2021.3062990</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wqbsmwirnzbydndr7banpf572q">fatcat:wqbsmwirnzbydndr7banpf572q</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210501212116/https://ieeexplore.ieee.org/ielx7/6287639/9312710/09366477.pdf?tp=&amp;arnumber=9366477&amp;isnumber=9312710&amp;ref=" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/86/f9/86f93bfc1bcfe1f041bdac82288b367e9dbb4eb3.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/access.2021.3062990"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> ieee.com </button> </a>

Deep Bilevel Learning [chapter]

Simon Jenni, Paolo Favaro
<span title="">2018</span> <i title="Springer International Publishing"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/2w3awgokqne6te4nvlofavy5a4" style="color: black;">Lecture Notes in Computer Science</a> </i> &nbsp;
In practice, these weights define mini-batch learning rates in a gradient descent update equation that favor gradients with better generalization capabilities.  ...  We present a novel regularization approach to train neural networks that enjoys better generalization and test error than standard stochastic gradient descent.  ...  This work was supported by the Swiss National Science Foundation (SNSF) grant number 200021 169622. Deep Bilevel Learning  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-030-01249-6_38">doi:10.1007/978-3-030-01249-6_38</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/hy6tdc6cvzhqpih5qt7x3jqnra">fatcat:hy6tdc6cvzhqpih5qt7x3jqnra</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200208180440/http://openaccess.thecvf.com/content_ECCV_2018/papers/Simon_Jenni_Deep_Bilevel_Learning_ECCV_2018_paper.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/f8/36/f83655949477ebddbaa793ebd93775ba8f0bfd75.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/978-3-030-01249-6_38"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> springer.com </button> </a>

Deep Bilevel Learning [article]

Simon Jenni, Paolo Favaro
<span title="2018-09-05">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In practice, these weights define mini-batch learning rates in a gradient descent update equation that favor gradients with better generalization capabilities.  ...  We present a novel regularization approach to train neural networks that enjoys better generalization and test error than standard stochastic gradient descent.  ...  This work was supported by the Swiss National Science Foundation (SNSF) grant number 200021 169622. Deep Bilevel Learning  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1809.01465v1">arXiv:1809.01465v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ahz3refmj5fevoqlgqgz7icuse">fatcat:ahz3refmj5fevoqlgqgz7icuse</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200906081613/https://arxiv.org/pdf/1809.01465v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/52/b3/52b3cd29e694140b5a354c3cc62f4fa848a0006a.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1809.01465v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Differentially Private Learning Needs Hidden State (Or Much Faster Convergence) [article]

Jiayuan Ye, Reza Shokri
<span title="2022-03-10">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper, we extend this hidden-state analysis to noisy mini-batch stochastic gradient descent algorithms on strongly convex, smooth loss functions.  ...  However, by assuming hidden states for DP algorithms (when only the last iterate is observable), recent works prove a converging privacy bound for noisy gradient descent (on strongly convex smooth loss  ...  (2) represents the mini-batches generated (sampled) in each iteration of the noisy mini-batch gradient descent algorithm.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2203.05363v1">arXiv:2203.05363v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/oxbi5a7d6japfjkl5l7cpgsgkm">fatcat:oxbi5a7d6japfjkl5l7cpgsgkm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220312153437/https://arxiv.org/pdf/2203.05363v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/36/0f/360f5a13ff8e35b2a7f0139b5067f01eb4605aca.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2203.05363v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Stochastic Training is Not Necessary for Generalization [article]

Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
<span title="2022-04-19">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks.  ...  In this work, we demonstrate that non-stochastic full-batch training can achieve comparably strong performance to SGD on CIFAR-10 using modern architectures.  ...  A central reason for the success of stochastic gradient descent is its efficiency in the face of large datasets: a noisy estimate of the loss-function gradient is generally sufficient to improve the parameters  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2109.14119v2">arXiv:2109.14119v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/izkob2pvcfefhaqospgdzjnr7e">fatcat:izkob2pvcfefhaqospgdzjnr7e</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220517144448/https://arxiv.org/pdf/2109.14119v2.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/60/f9/60f9fa6d4f966c316353ee2753021c4587fa0273.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2109.14119v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Improving SGD convergence by online linear regression of gradients in multiple statistically relevant directions [article]

Jarek Duda
<span title="2019-04-14">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Deep neural networks are usually trained with stochastic gradient descent (SGD), which minimizes the objective function using very rough approximations of the gradient that only average to the real gradient.  ...  Outside the second-order-modeled subspace we can simultaneously perform gradient descent.  ...  One difficulty is locally choosing the subspace where the action is, for example the largest-eigenvalue subspace from PCA of recent noisy gradients.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1901.11457v5">arXiv:1901.11457v5</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/g6zduyz3tjeunhd77elwcxcw4i">fatcat:g6zduyz3tjeunhd77elwcxcw4i</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191019195202/https://arxiv.org/pdf/1901.11457v5.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/df/57/df5749d166de9f7d47f98aed6a77b4e0df323f74.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1901.11457v5" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Regularization in network optimization via trimmed stochastic gradient descent with noisy label [article]

Kensuke Nakamura, Bong-Soo Sohn, Kyoung-Jae Won, Byung-Woo Hong
<span title="2022-05-03">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We propose a first-order optimization method (Label-Noised Trim-SGD) that uses label noise together with example trimming in order to remove outliers based on the loss.  ...  However, it can cause undesirable misleading gradients due to the large loss associated with incorrect labels.  ...  The common choice to alleviate these issues is stochastic gradient descent (SGD), which updates the model using a subset of examples $\beta$ (a mini-batch) as $w_{t+1} = w_t - \eta_t \frac{1}{B}\sum_{i \in \beta_t} \nabla f_i(w_t)$  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.11073v3">arXiv:2012.11073v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/si5b25mbqzea5aytmxlami55gy">fatcat:si5b25mbqzea5aytmxlami55gy</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220506073441/https://arxiv.org/pdf/2012.11073v3.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/92/9a/929aeb77f9277c59b1336b06acb82c1799604872.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.11073v3" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

Task-Driven Data Verification via Gradient Descent [article]

Siavash Golkar, Kyunghyun Cho
<span title="2019-05-14">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We compute these inclusion variables by optimizing the performance of the network on the clean validation set via "gradient descent on gradient descent" based learning.  ...  The inclusion variables as well as the network trained in such a way form the basis of our methods, which we call Corruption Detection via Gradient Descent (CDGD).  ...  SG is supported by the James Arthur Postdoctoral Fellowship.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.05843v1">arXiv:1905.05843v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/cgacyhly4re6ta6irmuvosupze">fatcat:cgacyhly4re6ta6irmuvosupze</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191021172936/https://arxiv.org/pdf/1905.05843v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/99/13/99132bad324ffbadac20641168b8415400be8ef9.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.05843v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>

On the Convergence of SGD with Biased Gradients [article]

Ahmad Ajalloeian, Sebastian U. Stich
<span title="2021-05-09">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We analyze the complexity of biased stochastic gradient methods (SGD), where individual updates are corrupted by deterministic, i.e. biased, error terms.  ...  For instance, in the domain of distributed learning, biased gradient compression techniques such as top-k compression have been proposed as a tool to alleviate the communication bottleneck and in derivative-free  ...  Biased SGD Framework: In this section we study the convergence of stochastic gradient descent (SGD) with biased gradient oracles as introduced in (2), see also Algorithm 1.  ...
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2008.00051v2">arXiv:2008.00051v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/gkacdaec5ngwdm2zxighvskwiy">fatcat:gkacdaec5ngwdm2zxighvskwiy</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200915190039/https://arxiv.org/pdf/2008.00051v1.pdf" title="fulltext PDF download [not primary version]" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <span style="color: #f43e3e;">&#10033;</span> <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/89/63/89638c2cefdc059a444884c88c1b9e5e0ea0c236.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2008.00051v2" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>
Showing results 1–15 out of 8,006 results