
On Quadratic Penalties in Elastic Weight Consolidation [article]

Ferenc Huszár
2017 arXiv   pre-print
Elastic weight consolidation (EWC; Kirkpatrick et al., 2017) is a novel algorithm designed to safeguard against catastrophic forgetting in neural networks.  ...  I show that the quadratic penalties in EWC are inconsistent with this derivation and might lead to double-counting data from earlier tasks.  ...  This can be enforced either with two separate penalties or as one by noting that the sum of two quadratic penalties is itself a quadratic penalty.  ... 
arXiv:1712.03847v1 fatcat:mjganhu7kzgppodbizrettrfyq
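Huszár's point is concrete enough to sketch: as published, EWC keeps one quadratic penalty per previous task, each anchored at that task's solution, whereas a consistent recursive Laplace approximation keeps a single penalty anchored at the latest posterior mode, with precision that accumulates each task's Fisher information. A minimal NumPy sketch of the contrast (function names are illustrative, not from either paper):

```python
import numpy as np

def ewc_penalty(theta, anchors, fishers, lam=1.0):
    """EWC as published: one quadratic term per previous task,
    each anchored at that task's own solution."""
    return sum(0.5 * lam * np.sum(F * (theta - a) ** 2)
               for a, F in zip(anchors, fishers))

def laplace_penalty(theta, anchor, precision):
    """Recursive Laplace approximation: a single quadratic term,
    anchored only at the latest posterior mode, whose precision
    accumulates the Fisher information of every past task."""
    return 0.5 * np.sum(precision * (theta - anchor) ** 2)

# Toy illustration with 5-dimensional "networks".
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
anchors = [rng.normal(size=5) for _ in range(3)]   # per-task solutions
fishers = [np.abs(rng.normal(size=5)) for _ in range(3)]

# After task 3, EWC sums three separate penalties ...
p_ewc = ewc_penalty(theta, anchors, fishers)
# ... while the Laplace recursion keeps one, with summed precision
# (prior precision omitted for brevity), anchored at the last mode.
p_lap = laplace_penalty(theta, anchors[-1], sum(fishers))
print(p_ewc, p_lap)
```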

Note on the quadratic penalties in elastic weight consolidation

Ferenc Huszár
2018 Proceedings of the National Academy of Sciences of the United States of America  
Proc Natl Acad Sci USA 114:3521-3526. [2] Huszár F (2017) On quadratic penalties in elastic weight consolidation. arXiv:1712.03847. [3] Opper M (1998) A Bayesian approach to on-line learning.  ...  Elastic weight consolidation (EWC; ref. 1), published in PNAS, is a novel algorithm designed to safeguard against this. Despite its satisfying simplicity, EWC is remarkably effective.  ... 
doi:10.1073/pnas.1717042115 pmid:29463735 fatcat:drpr7l6difgafasmkroivdxz4i

Reply to Huszár: The elastic weight consolidation penalty is empirically valid

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath (+2 others)
2018 Proceedings of the National Academy of Sciences of the United States of America  
In our recent work on elastic weight consolidation (EWC) (1) we show that forgetting in neural networks can be alleviated by using a quadratic penalty whose derivation was inspired by Bayesian evidence  ...  In his letter (2), Dr. Huszár provides an alternative form for this penalty by following the standard work on expectation propagation using the Laplace approximation (3).  ...  Proc Natl Acad Sci USA 114:3521-3526. [2] Huszár F (2018) Note on the quadratic penalties in elastic weight consolidation.  ... 
doi:10.1073/pnas.1800157115 pmid:29463734 fatcat:mo2m2rtrznfkrhraavcfmmvh2u

Overcoming catastrophic forgetting in neural networks [article]

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell
2017 arXiv   pre-print
Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks.  ...  The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence.  ...  This can be enforced either with two separate penalties, or as one by noting that the sum of two quadratic penalties is itself a quadratic penalty.  ... 
arXiv:1612.00796v2 fatcat:urg6iqrtqnemzf2mfhvgqwtizq

Overcoming catastrophic forgetting in neural networks

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath (+2 others)
2017 Proceedings of the National Academy of Sciences of the United States of America  
We develop an algorithm analogous to synaptic consolidation for artificial neural networks, which we refer to as elastic weight consolidation (EWC).  ...  This algorithm slows down learning on certain weights based on how important they are to previously seen tasks.  ... 
doi:10.1073/pnas.1611835114 pmid:28292907 pmcid:PMC5380101 fatcat:ycc27dlo3bbvtei6rdylnjbo6a
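The objective the abstract describes is L(θ) = L_B(θ) + (λ/2) Σ_i F_i (θ_i − θ*_{A,i})², where F is the diagonal of the Fisher information at the task-A optimum. A hedged PyTorch sketch (helper names are ours; the paper estimates F with labels sampled from the model's predictive distribution, while this sketch reuses the given labels for brevity):

```python
import torch

def fisher_diagonal(model, data_loader, loss_fn):
    """Diagonal Fisher estimate: average squared gradients of the
    loss evaluated at the task-A optimum."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / n_batches for n, f in fisher.items()}

def ewc_loss(model, task_loss, fisher, theta_star, lam):
    """L(theta) = L_B(theta) + (lam/2) * sum_i F_i (theta_i - theta*_i)^2"""
    penalty = sum((fisher[n] * (p - theta_star[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    return task_loss + 0.5 * lam * penalty
```

Weights with large Fisher values are the ones the old task's likelihood is most sensitive to, so the penalty selectively slows learning on exactly those weights.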

Synaptic metaplasticity in binarized neural networks

Axel Laborieux, Maxence Ernoult, Tifenn Hirtzlin, Damien Querlioz
2021 Nature Communications  
Neuroscience studies, based on idealized tasks, suggest that in the brain, synapses overcome this issue by adjusting their plasticity depending on their past history.  ...  In this work, we interpret the hidden weights used by binarized neural networks, a low-precision version of deep neural networks, as metaplastic variables, and modify their training technique to alleviate  ...  Grollier for discussion and invaluable feedback on the manuscript.  ... 
doi:10.1038/s41467-021-22768-y pmid:33953183 fatcat:jcgypa7n2nekpjdd43lhcesft4
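As a rough illustration of the metaplasticity idea: the binary weight is the sign of a real-valued hidden weight, and updates that would push the hidden weight toward a sign flip are attenuated as a function of its magnitude. A sketch assuming a modulation f_meta(w) = 1 − tanh²(m·w), which matches the paper's spirit but may differ from its exact rule:

```python
import numpy as np

def metaplastic_update(w_hidden, grad, lr=0.01, m=1.0):
    """Sketch of a metaplastic update for the real-valued hidden
    weights behind a binarized layer (binary weight = sign(w_hidden)).
    Updates that would shrink |w_hidden| (risking a sign flip) are
    attenuated by f_meta(w) = 1 - tanh^2(m * w), so large-magnitude
    hidden weights are consolidated; magnitude-increasing updates
    pass through unchanged."""
    step = -lr * grad
    toward_zero = np.sign(step) != np.sign(w_hidden)
    f_meta = 1.0 - np.tanh(m * w_hidden) ** 2
    step = np.where(toward_zero, step * f_meta, step)
    return w_hidden + step
```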

Synaptic Metaplasticity in Binarized Neural Networks [article]

Axel Laborieux, Maxence Ernoult, Tifenn Hirtzlin, Damien Querlioz
2021 arXiv   pre-print
Neuroscience studies, based on idealized tasks, suggest that in the brain, synapses overcome this issue by adjusting their plasticity depending on their past history.  ...  In this work, we interpret the hidden weights used by binarized neural networks, a low-precision version of deep neural networks, as metaplastic variables, and modify their training technique to alleviate  ...  Grollier for discussion and invaluable feedback on the manuscript.  ... 
arXiv:2003.03533v2 fatcat:vb2u55vtqjefhoawzckmywg7ba

Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [article]

Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu
2020 arXiv   pre-print
Deep pretrained language models have achieved great success with the paradigm of pretraining first and then fine-tuning.  ...  Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark. Our method also enables BERT-base to achieve better performance than direct fine-tuning of BERT-large.  ...  Otherwise, both the quadratic penalty and annealing coefficient would be adapted by the gradient update rules, resulting in different magnitudes of the quadratic penalty among the model's weights.  ... 
arXiv:2004.12651v1 fatcat:zy3poh6lmbb6fksdlnncyds4fa
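The quadratic penalty and annealing coefficient the snippet mentions can be read as an objective that gradually shifts from recalling the pretrained weights to learning the target task. A simplified sketch (the paper integrates this into Adam's update rules; the schedule and coefficients here are illustrative):

```python
import math

def annealing_coefficient(t, k=0.1, t0=250):
    """Sigmoid annealing: lambda(t) rises from ~0 to ~1 over
    training, shifting the objective from 'recall' (stay near the
    pretrained weights) to 'learn' (fit the downstream task)."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

def recall_and_learn_loss(task_loss, theta, theta_pretrained, t, gamma=1.0):
    """Annealed objective:
    lambda(t) * L_task + (1 - lambda(t)) * (gamma/2) * ||theta - theta_pre||^2"""
    lam = annealing_coefficient(t)
    quad = 0.5 * gamma * sum((p - q) ** 2
                             for p, q in zip(theta, theta_pretrained))
    return lam * task_loss + (1.0 - lam) * quad
```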

Continual Learning Through Synaptic Intelligence [article]

Friedemann Zenke, Ben Poole, Surya Ganguli
2017 arXiv   pre-print
Each synapse accumulates task relevant information over time, and exploits this information to rapidly store new memories without forgetting old ones.  ...  We evaluate our approach on continual learning of classification tasks, and show that it dramatically reduces forgetting while maintaining computational efficiency.  ...  Recently, Kirkpatrick et al. (2017) proposed elastic weight consolidation (EWC), a quadratic penalty on the difference between the parameters for the new and the old task.  ... 
arXiv:1703.04200v3 fatcat:6icrj2ze5fhozn6cnn7hb2z5pu
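In contrast to EWC's post-hoc Fisher estimate, synaptic intelligence computes importances online, as a path integral of gradient times parameter change accumulated while each task is being learned. A NumPy sketch of the bookkeeping (ξ and c follow the paper's notation; the class interface is ours):

```python
import numpy as np

class SynapticIntelligence:
    """Sketch of per-synapse importance accumulation from
    Zenke, Poole & Ganguli (2017)."""

    def __init__(self, theta, xi=0.1, c=0.1):
        self.xi, self.c = xi, c
        self.omega_run = np.zeros_like(theta)   # running path integral
        self.Omega = np.zeros_like(theta)       # consolidated importance
        self.theta_ref = theta.copy()           # params at last task end

    def step(self, grad, delta_theta):
        # Loss decrease attributed to each parameter along its path.
        self.omega_run += -grad * delta_theta

    def consolidate(self, theta):
        # Normalize by total displacement over the task, then reset.
        drift = theta - self.theta_ref
        self.Omega += self.omega_run / (drift ** 2 + self.xi)
        self.omega_run[:] = 0.0
        self.theta_ref = theta.copy()

    def penalty(self, theta):
        # Quadratic penalty on moving important parameters.
        return self.c * np.sum(self.Omega * (theta - self.theta_ref) ** 2)
```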

Joint image reconstruction and nonrigid motion estimation with a simple penalty that encourages local invertibility

Se Young Chun, Jeffrey A. Fessler, Ehsan Samei, Jiang Hsieh
2009 Medical Imaging 2009: Physics of Medical Imaging  
The usual choice for deformation regularization has been penalty functions based on the assumption that tissues are elastic.  ...  However, fewer studies focus on deformation regularization in motion-compensated image reconstruction.  ...  In most of these methods the common choice for motion regularization is a simple quadratic roughness penalty or an elastic deformation penalty.  ... 
doi:10.1117/12.811067 fatcat:qusvejpdendrlhbtfoyabfgnt4
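The "simple quadratic roughness penalty" named in the snippet is just a sum of squared finite differences of the displacement field. A minimal sketch below; note the paper's own contribution is a penalty encouraging local invertibility, which this sketch does not implement:

```python
import numpy as np

def quadratic_roughness(deformation):
    """Quadratic roughness penalty on a 2-D deformation field
    of shape (H, W, 2): sum of squared finite differences of each
    displacement component, a common motion regularizer."""
    dy = np.diff(deformation, axis=0)  # vertical differences
    dx = np.diff(deformation, axis=1)  # horizontal differences
    return np.sum(dy ** 2) + np.sum(dx ** 2)
```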

Continual Learning for Multi-Dialect Acoustic Models

Brady Houston, Katrin Kirchhoff
2020 Interspeech 2020  
While such training can improve the performance of an acoustic model on a single dialect, it can also produce a model capable of good performance on multiple dialects.  ...  In contrast, sequential transfer learning (fine-tuning) does not require retraining using all data, but may result in catastrophic forgetting of previously-seen dialects.  ...  The two CL penalties investigated in this work are elastic weight consolidation (EWC) [8] and learning without forgetting (LWF) [7] .  ... 
doi:10.21437/interspeech.2020-1797 dblp:conf/interspeech/HoustonK20 fatcat:mqtvbqv2abgarna33nc2hbzgwm
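Of the two penalties compared, EWC is sketched above; learning without forgetting (LWF) instead distills the pre-update model's outputs on old tasks into the updated model. A hedged PyTorch sketch (argument names and the temperature are illustrative):

```python
import torch.nn.functional as F

def lwf_loss(new_logits, new_task_targets,
             old_task_logits_current, old_task_logits_snapshot,
             lam=1.0, T=2.0):
    """LWF objective: cross-entropy on the new task plus a
    temperature-scaled distillation term that keeps the updated
    model's old-task outputs close to a frozen pre-update snapshot."""
    ce = F.cross_entropy(new_logits, new_task_targets)
    kd = F.kl_div(
        F.log_softmax(old_task_logits_current / T, dim=-1),
        F.softmax(old_task_logits_snapshot / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return ce + lam * kd
```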

Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting [article]

Hippolyt Ritter, Aleksandar Botev, David Barber
2018 arXiv   pre-print
The method is grounded in a Bayesian online learning framework, where we recursively approximate the posterior after every task with a Gaussian, leading to a quadratic penalty on changes to the weights  ...  We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks.  ...  Table 1: Per-dataset test accuracy at the end of training on the suite of vision datasets. SI is Synaptic Intelligence [41] and EWC is Elastic Weight Consolidation [16].  ... 
arXiv:1805.07810v1 fatcat:azw6j7ky5ndzhfthecx56kqr6m
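The recursion the abstract describes keeps a Gaussian posterior over the weights: after each task the mode moves to the new MAP estimate and the precision accumulates a curvature estimate. The paper's curvature is Kronecker-factored; the sketch below uses a diagonal one to keep the idea visible:

```python
import numpy as np

class OnlineLaplace:
    """Sketch of a recursive Gaussian posterior approximation for
    continual learning. After each task, the mode moves to the new
    MAP estimate and the precision grows by a curvature estimate
    (Hessian/Fisher diagonal here; Kronecker-factored in the paper)."""

    def __init__(self, dim, prior_precision=1.0):
        self.mu = np.zeros(dim)
        self.precision = np.full(dim, prior_precision)

    def penalty(self, theta):
        # Quadratic penalty from the Gaussian posterior of all past tasks.
        return 0.5 * np.sum(self.precision * (theta - self.mu) ** 2)

    def update(self, theta_map, curvature_diag):
        # Recursive posterior update after finishing a task.
        self.mu = theta_map.copy()
        self.precision = self.precision + curvature_diag
```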

Performance Persistence

Stephen J. Brown, William N. Goetzmann
1995 Journal of Finance  
A previously optimal solution, or a slight variation of one, may still be nearly optimal in a new scenario and managerially preferable to a dramatically different solution that is mathematically optimal.  ...  When elastic penalties are used with a sequence of cumulant variables or constraints, they imply penalties on a weighted average of their noncumulant counterparts.  ... 
doi:10.2307/2329424 fatcat:x6xf36cbmvfytcrskbi7qbmx2e

A Novel Solution of an Elastic Net Regularization for Dementia Knowledge Discovery using Deep Learning [article]

Kshitiz Shrestha, Omar Hisham Alsadoon, Abeer Alsadoon, Tarik A. Rashid, Rasha S. Ali, P.W.C. Prasad, Oday D. Jerew
2021 arXiv   pre-print
In addition, the proposed method improves classification accuracy by 5% on average and reduces processing time by 30-40 seconds on average.  ...  This paper aims to increase the accuracy and reduce the processing time of classification through a deep learning architecture by using Elastic Net Regularization for feature selection.  ...  On the other hand, Elastic Net Regularization adds a quadratic part to the penalty, which when used alone is called Ridge regression, as discussed in the previous section.  ... 
arXiv:2109.00896v1 fatcat:m6o4bt5pjjhsbfayyjdyo522nm
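The quadratic part referred to is the ridge term of the elastic net, mixed with the L1 lasso term. A minimal sketch of the combined penalty (λ and the mixing weight α are illustrative):

```python
import numpy as np

def elastic_net_penalty(w, lam=1.0, alpha=0.5):
    """Elastic net: convex mix of the L1 (lasso) and quadratic L2
    (ridge) penalties. alpha=1 recovers the lasso; alpha=0 recovers
    the Ridge regression the abstract refers to."""
    l1 = np.sum(np.abs(w))
    l2 = 0.5 * np.sum(w ** 2)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)
```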

Optimal Fiscal and Monetary Policy, Debt Crisis and Management

Cristiano Cantore, Paul Levine, Giovanni Melina, Joseph Pearlman
2017 IMF Working Papers  
ψ is a scaling parameter that determines the relative weight of leisure in utility and ϑ is a preference parameter that determines the Frisch elasticity of labour supply.  ...  $X_t = \left[ \nu_x^{\frac{1}{\sigma_x}} C_t^{\frac{\sigma_x - 1}{\sigma_x}} + (1 - \nu_x)^{\frac{1}{\sigma_x}} G_t^{\frac{\sigma_x - 1}{\sigma_x}} \right]^{\frac{\sigma_x}{\sigma_x - 1}}$, (23) where $\nu_x$ is the weight of private goods in the aggregate and $\sigma_x$ is the elasticity of substitution between private and public consumption.  ...  Thus one must account for the forward-looking terms in $-\frac{1}{\beta} \lambda^{T} f_2 \tilde{y}_0$ when optimising policy.  ... 
doi:10.5089/9781475590180.001 fatcat:ptbsjnytergshdqj4lenxehpsy