Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation [article]

Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
2021 arXiv   pre-print
Some of the latest non-autoregressive models have achieved impressive translation quality-speed tradeoffs compared to autoregressive baselines.  ...  Our results establish a new protocol for future research toward fast, accurate machine translation. Our code is available at  ...  This research was in part funded by the Funai Overseas Scholarship to Jungo Kasai.  ... 
arXiv:2006.10369v4 fatcat:ml3ghh7zf5aaxkjyk5mcy7jaeq

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference [article]

Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
2021 arXiv   pre-print
Sparse Mixture-of-Experts (MoE) has been a successful approach for scaling multilingual translation models to billions of parameters without a proportional increase in training computation.  ...  In this work, we investigate routing strategies at different granularity (token, sentence, task) in MoE models to bypass distillation.  ...  Deep encoder, shallow decoder: Reevaluating the speed-quality tradeoff in machine translation. arXiv preprint arXiv:2006.10369. Yoon Kim and Alexander M. Rush. 2016.  ... 
arXiv:2110.03742v1 fatcat:stp4wtshfjanncfo4axzwsm3ki

Bag of Tricks for Optimizing Transformer Efficiency

Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu
2021 Findings of the Association for Computational Linguistics: EMNLP 2021   unpublished
Deep encoder, shallow decoder: Reevaluating the speed-quality tradeoff in machine translation.  ...  self-attention: Specialized heads do the heavy lifting, the rest can be pruned.  ...  In Proceedings of the  ... 
doi:10.18653/v1/2021.findings-emnlp.357 fatcat:abneqepygnbupondrn4wk3qaam

Gender bias amplification during Speed-Quality optimization in Neural Machine Translation

Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab
2021 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)   unpublished
We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as greedy search, quantization, average attention networks (AANs) and shallow decoder models  ...  Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU?  ...  Deep encoder, shallow decoder: Reevaluating the speed-quality tradeoff in machine translation. arXiv preprint arXiv:2006.10369.  ... 
doi:10.18653/v1/2021.acl-short.15 fatcat:hsb2vb2rvfbwrm64gcs26755jy

Learning Rich Representations For Structured Visual Prediction Tasks [article]

Mohammadreza Mostajabi
2019 arXiv   pre-print
Applied to semantic segmentation and other structured prediction tasks, our approach exploits statistical structure in the image and in the label space without setting up explicit structured prediction  ...  Our innovation takes the form of a regularizer derived by learning an autoencoder over the set of annotations.  ...  Sparsity significantly speeds the segmentation training and diversity leads to high quality segmentation output.  ... 
arXiv:1908.11820v1 fatcat:n2utrggy5faszodf4noe5ayram

Sequence-to-Lattice Models for Fast Translation

Yuntian Deng, Alexander Rush
2021 Findings of the Association for Computational Linguistics: EMNLP 2021   unpublished
shallow decoder: Reevaluating the speed-quality tradeoff in machine translation. arXiv preprint  ...  Yuntian Deng and Alexander Rush. 2020.  ...  In order to  ...  the practical costs of deep decoder models.  ... 
doi:10.18653/v1/2021.findings-emnlp.318 fatcat:kvquowpu3jan7azy4c64gq5yha

Use of Frontal Lobe Hemodynamics as Reinforcement Signals to an Adaptive Controller

Marcello M. DiStasio, Joseph T. Francis, Thomas Boraud
2013 PLoS ONE  
The agent's adaptability allows its performance to exceed the limits of the NIRS classifier decoding accuracy.  ...  Decision-making ability in the frontal lobe (among other brain structures) relies on the assignment of value to states of the animal and its environment.  ...  Acknowledgments The authors would like to thank Harry Graber and Randall Barbour of the Optical Tomography Group at SUNY Downstate Medical Center for very helpful discussions on NIRS data acquisition and  ... 
doi:10.1371/journal.pone.0069541 pmid:23894500 pmcid:PMC3718814 fatcat:5ohhoisfxnebnibseu5wffrf7e

Data and Parameter Scaling Laws for Neural Machine Translation

Mitchell A Gordon, Kevin Duh, Jared Kaplan
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
Deep encoder, shallow decoder: Reevaluating the speed-quality tradeoff in  ...  rization of Text and Speech (CLSSTS2020), pages 7–13, Marseille, France.  ...  encoder and a shallower decoder can be more efficient, which  ...  We also increase the checkpoint frequency for earlier stopping.  ... 
doi:10.18653/v1/2021.emnlp-main.478 fatcat:vfaiugpaareltkkshjxkw6okz4

Reservoir Transformers

Sheng Shen, Alexei Baevski, Ari Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela
2021 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)   unpublished
time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.  ...  Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute  ...  SS and KK were supported by grants from Samsung, Facebook, and the Berkeley Deep Drive Consortium.  ... 
doi:10.18653/v1/2021.acl-long.331 fatcat:vkbgm74kz5bihj7oqcyk4m62lu

Software challenges in extreme scale systems

Vivek Sarkar, William Harrod, Allan E Snavely
2009 Journal of Physics, Conference Series  
Prior parallel machines included the IBM 3838 Array Processor which for a time was the fastest single precision floating point processor marketed by IBM, and the Space Shuttle Input/Output Processor which  ...  the fastest possible implementations of circuits such as adders with limited fan-in blocks (known as the Kogge-Stone adder).  ...  The key distinction between shallow and deep component interoperability frameworks is that shallow framework components manage their own parallelism and data structures and exchange data using external  ... 
doi:10.1088/1742-6596/180/1/012045 fatcat:iukutry2dvbitfdh6ng7kgz564

Lunar Impact: A History of Project Ranger

Loyd S. Swenson, R. Cargill Hall
1978 Technology and Culture  
The central computer and sequencer would receive its information from the command subsystem using decoding components in the spacecraft and encoding components on the ground.  ...  The telemetry-to-teletype encoders at the deep space stations in  ...  During the afternoon Burke, stunned, was on the phone with Space Flight Test Director Rygh at JPL (Figure 61).  ... 
doi:10.2307/3103798 fatcat:4kdonkmzxfd23bbcez226fjaoy

Relevance of time‐varying properties of the first formant frequency in vowel representation

Maria‐Gabriella Di Benedetto
1985 Journal of the Acoustical Society of America  
decoded signal. It is clarified that the ADPCM-AB system has the best speech quality among the conventional backward-type coding systems.  ...  Since the source level results compare favorably, at least in deep water, an average source spectrum level density as a function of frequency and surface wind speed is proposed for use in noise models.  ...  The translational temperature during the relaxation process is monitored by measuring the sound speed in the tube.  ... 
doi:10.1121/1.2023018 fatcat:3lbckbce2rglzj7zx3fp4nhiui

Praise for Object-Oriented Reengineering Patterns [chapter]

2003 Object-Oriented Reengineering Patterns  
as well (pp. 723-725). • Class interfaces descriptions are generated; shallow but verify on line. • Documentation for database schema is generated; shallow but verify on line. • Finite state-machines  ...  up to speed.  ...  In this chapter we have listed only those patterns that are specifically referred to at some point in this book. We have grouped them into the following three categories: • Testing patterns.  ... 
doi:10.1016/b978-155860639-5/50000-6 fatcat:yfchb4mlyvfj3l5w422zp3y47q

XXXII SCAR Open Science Conference 'Antarctic Science and Policy Advice in a Changing World' - Conference Abstracts [article]

Eoghan Griffin, Renuka Badhe
2016 Zenodo  
Abstracts from the 2012 Open Science Conference of the Scientific Committee on Antarctic Research (SCAR), Antarctic Science and Policy Advice in a Changing World, held in Portland, Oregon, USA  ...  In particular, improved modeling of geodynamic evolution of the Antarctic lithosphere that was a key  ...  These new magnetic data together with survey data that were not previously in the public domain can significantly upgrade the ADMAP compilation for crustal studies of the Antarctic.  ... 
doi:10.5281/zenodo.53122 fatcat:o56w4vr3n5bsvkyj6vnetfem3e

Logical partitioning of parallel system simulations

Hari Angepat, The University of Texas at Austin, Derek Chiou, Mattan Erez
The work embodied in this dissertation explores how to leverage novel ideas in simulator partitioning to improve simulator speed and flexibility for simulating these new types of systems.  ...  By leveraging partitioning in a structured manner, it is possible to design simulators that better address the open challenges of parallel and heterogeneous systems design.  ...  To balance encoding compute complexity and decoding hardware complexity, we encode a trace as a series of 64b words, reserving three high-order bits to select between different encoding formats.  ... 
doi:10.26153/tsw/3268 fatcat:wkotdvpeyrahpatsfwcv4aogti