A copy of this work was available on the public web and has been preserved in the Wayback Machine (captured 2022); the original URL can also be visited. File type: application/pdf.
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
[article]
2022
Recent research has shown the existence of significant redundancy in large Transformer models: one can prune the redundant parameters without significantly sacrificing generalization performance. However, we question whether these redundant parameters could have contributed more had they been trained properly. To answer this question, we propose a novel training strategy that encourages all parameters to be trained sufficiently. Specifically, we adaptively adjust the learning rate for each parameter according to its sensitivity, a gradient-based measure of that parameter's contribution to model performance. [...]
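The abstract stops before spelling out how the adjustment works. As a rough illustration of the stated idea (raise the learning rate for redundant parameters, lower it for well-trained ones), here is a minimal PyTorch sketch. It assumes sensitivity is approximated per weight as |theta * grad| and a simple linear scaling; the function name, the normalization, and the hyperparameters are illustrative assumptions, not the paper's method.

    import torch

    def sensitivity_scaled_step(params, base_lr=1e-3, eps=1e-12):
        # One SGD-like step with a per-parameter learning rate.
        # Sensitivity is approximated here as |theta * grad|, a common
        # gradient-based importance measure; the paper's actual estimator
        # (e.g., with smoothing or uncertainty terms) may differ.
        with torch.no_grad():
            for p in params:
                if p.grad is None:
                    continue
                # Per-weight sensitivity: first-order estimate of how much
                # the loss would change if this weight were zeroed out.
                s = (p * p.grad).abs()
                # Normalize within the tensor so the scaling is relative.
                s_norm = s / (s.max() + eps)
                # Low-sensitivity (redundant) weights take a larger step;
                # high-sensitivity (well-trained) weights take a smaller one.
                lr = base_lr * (1.0 - s_norm)
                p.add_(-lr * p.grad)

    # Hypothetical usage on a toy model:
    model = torch.nn.Linear(10, 1)
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    sensitivity_scaled_step(model.parameters())

Scaling the step by (1 - normalized sensitivity) is just one plausible mapping between sensitivity and learning rate; the paper's actual schedule may instead rely on smoothed sensitivity estimates accumulated over training.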
doi:10.48550/arxiv.2202.02664
fatcat:i7k27so6ijhhjhlqw5kxenhwkm