14 Hits in 5.8 sec

ReZero is All You Need: Fast Convergence at Large Depth [article]

Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao, Garrison W. Cottrell, Julian McAuley
2020 arXiv   pre-print
Although much simpler than its predecessors, this gate enables training thousands of fully connected layers with fast convergence and better test performance for ResNets trained on CIFAR-10.  ...  When applied to 12-layer Transformers, it converges 56% faster on enwiki8.  ...  large depths.  ... 
arXiv:2003.04887v2 fatcat:gfiztb7fdvbmpgelw4362x2xie
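The mechanism named in the title of this first hit is simple enough to sketch. In ReZero, each residual block gains a single learnable scalar α initialized to zero, so every layer starts out as the identity map. The following is a minimal plain-Python illustration of that update rule; the function and variable names are ours, not from the paper's released code:

```python
def rezero_block(x, f, alpha):
    """ReZero residual update: x + alpha * f(x).

    alpha is a learnable scalar initialized to 0, so at
    initialization the block is exactly the identity map.
    """
    return [xi + alpha * fi for xi, fi in zip(x, f(x))]

# At initialization (alpha = 0) the block passes inputs through
# unchanged, no matter how badly scaled the inner sub-layer f is.
x = [1.0, -2.0, 3.0]
f = lambda v: [100.0 * vi for vi in v]   # arbitrarily scaled sub-layer
print(rezero_block(x, f, alpha=0.0))     # -> [1.0, -2.0, 3.0]
print(rezero_block(x, f, alpha=0.01))    # -> [2.0, -4.0, 6.0]
```

Because α = 0 at initialization, signals and gradients initially flow through the skip connection alone, which is what lets very deep stacks train without normalization layers or delicate weight initialization.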

Do Transformer Modifications Transfer Across Implementations and Applications? [article]

Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li (+4 others)
2021 arXiv   pre-print
Rezero is all you need: Fast convergence at large depth. arXiv preprint arXiv:2003.04887. Alexei Baevski and Michael Auli. 2019. Adaptive input representations for neural language modeling.  ...  The depth of the kernel is determined by whether it is a depthwise convolution or a vanilla convolution, in which case its depth is d_model.  ... 
arXiv:2102.11972v2 fatcat:w6y6mkrw7vavnkkrcmzgz7roee

Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs [article]

Jiaao Chen, Diyi Yang
2021 arXiv   pre-print
We have publicly released our code at https://github.com/GT-SALT/Structure-Aware-BART.  ...  However, these generated summaries often suffer from insufficient, redundant, or incorrect content, largely due to the unstructured and complex characteristics of human-human interactions.  ...  This work is supported in part by grants from Google, Amazon and Salesforce.  ... 
arXiv:2104.08400v1 fatcat:htl57whsh5eaxmc2jvlblq6qyy

Reservoir Transformers [article]

Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela
2021 arXiv   pre-print
well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence  ...  Rezero is all you need: Fast convergence at large depth.  ...  We observe that the reservoir transformer outperforms normal RoBERTa at all depths in both tasks. At lower depth, the improvements are substantial.  ... 
arXiv:2012.15045v2 fatcat:bhsrep5puvcrxpbfthpgpp3usy

Value Iteration Networks with Double Estimator for Planetary Rover Path Planning

Xiang Jin, Wei Lan, Tianlin Wang, Pengyao Yu
2021 Sensors  
Path planning technology is significant for planetary rovers that perform exploration missions in unfamiliar environments.  ...  We show that our dVIN empirically outperforms the baseline methods and generalizes better to large-scale environments.  ...  ReZero is all you need: Fast convergence at large depth. In Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence, Online, 27–30 July 2021; pp. 1–10.  ... 
doi:10.3390/s21248418 pmid:34960508 pmcid:PMC8709000 fatcat:iyphuncbpzbcdpgordioqdtvui

A Robust Initialization of Residual Blocks for Effective ResNet Training without Batch Normalization [article]

Enrico Civitelli, Alessio Sortino, Matteo Lapucci, Francesco Bagattini, Giulio Galvan
2021 arXiv   pre-print
Batch Normalization is an essential component of state-of-the-art neural network architectures.  ...  In this paper, we show that weight initialization is key to training ResNet-like normalization-free networks.  ...  Rezero is all you need: Fast convergence at large depth. CoRR, abs/2003.04887, 2020. URL https://arxiv.org/abs/2003.04887. Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, and Bhiksha Raj.  ... 
arXiv:2112.12299v1 fatcat:q7ms67kuejduffprk3diuyvswm

A Survey of Transformers [article]

Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
2021 arXiv   pre-print
X-formers) have been proposed; however, a systematic and comprehensive literature review of these Transformer variants is still missing.  ...  Therefore, it naturally attracts great interest from academic and industry researchers. Up to the present, a great variety of Transformer variants (a.k.a.  ...  Replacing LN in the Transformer with the ReZero mechanism is verified to induce better dynamical isometry for input signals and leads to faster convergence.  ... 
arXiv:2106.04554v2 fatcat:pjctgoqeffhq7ntyw52jqwfzsy

A True Renaissance Person

Greta R Patzke, H Schattka, R Dobelli
2011
We presented them all and discussed them." They decided to develop a ballbot that could go fast in all directions, turn around and remain stable.  ...  At ZURICH.MINDS, Péter uses a Playstation joystick to direct where Rezero is going.  ... 
doi:10.5167/uzh-46577 fatcat:mfqqmej42vgmnoo62sulh3u2tm

Manual of Nerve Conduction Studies

G. HALL
2001 Journal of Neurology, Neurosurgery and Psychiatry  
Consequently, a near zero mean drift is likely to occur even though the magnitude of the zero drift in individual cases is large.  ...  The recommendation to change the catheter if a long monitoring period is expected to allow for rezeroing is not held up by the data shown in fig 3, which would suggest that there is more likely to be  ...  A great deal more is known than at the time of writing of the second edition (1992) , and this new edition reflects the update at all levels.  ... 
doi:10.1136/jnnp.70.1.138d fatcat:xlwwjpnhfja6nmciiaz2mn534m

Structure-Aware Abstractive Conversation Summarization via Discourse and Action Graphs

Jiaao Chen, Diyi Yang
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
We have publicly released our code at https://github.com/ GT-SALT/Structure-Aware-BART.  ...  However, these generated summaries often suffer from insufficient, redundant, or incorrect content, largely due to the unstructured and complex characteristics of human-human interactions.  ...  This work is supported in part by grants from Google, Amazon and Salesforce.  ... 
doi:10.18653/v1/2021.naacl-main.109 fatcat:4jr6kxkyxzd4hlgmkxh65jcera

Hashimoto's encephalopathy responding to plasmapheresis

P M BOERS
2001 Journal of Neurology, Neurosurgery and Psychiatry  
Consequently, a near zero mean drift is likely to occur even though the magnitude of the zero drift in individual cases is large.  ...  The recommendation to change the catheter if a long monitoring period is expected to allow for rezeroing is not held up by the data shown in fig 3, which would suggest that there is more likely to be  ...  A great deal more is known than at the time of writing of the second edition (1992) , and this new edition reflects the update at all levels.  ... 
doi:10.1136/jnnp.70.1.132 pmid:11118266 pmcid:PMC1763442 fatcat:tszcdiazl5atdelep3v7sjzumy

Give the Truth: Incorporate Semantic Slot into Abstractive Dialogue Summarization

Lulu Zhao, Weihao Zeng, Weiran Xu, Jun Guo
2021 Findings of the Association for Computational Linguistics: EMNLP 2021   unpublished
Rezero is all you need: Fast convergence at large depth.  ...  Association is all you need. In Proceedings of the 31st Interna- for Computing Machinery.  ... 
doi:10.18653/v1/2021.findings-emnlp.209 fatcat:wd5zsg7m3zdjrocfzdywibresm

VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation

Fuli Luo, Wei Wang, Jiahao Liu, Yijia Liu, Bin Bi, Songfang Huang, Fei Huang, Luo Si
2021 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)   unpublished
However, much of this work only relies on the shared vocabulary and bilingual contexts to encourage the correlation across languages, which is loose and implicit for aligning the contextual representations  ...  For cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on WMT14 English-to-German and English-to-French translation datasets,  ...  Rezero is all you need: Fast convergence at large depth. arXiv preprint arXiv:2003.04887. Xinlei Chen and Kaiming He. 2020. Exploring simple siamese representation learning.  ... 
doi:10.18653/v1/2021.acl-long.308 fatcat:w7twl3ujmjhunac7zrnfh5bjv4

An Investigation Into the Significance of Dissipation in Statistical Mechanics [article]

Charlotte Frances Petersen, University, The Australian National, University, The Australian National
2016
The statistics of the average are difficult to work with because the average's value depends strongly on rare events. It is often observed to converge with high accuracy to a value less than expected.  ...  While the derivation is straightforward, calculating this quantity is anything but.  ...  This happens when the trajectory duration over which the NPI is estimated becomes large enough.  ... 
doi:10.25911/5d76378a139a1 fatcat:mvzmx6capvdz3phswtuvb3cyzu