10 Hits in 3.4 sec

Scalable and Efficient MoE Training for Multitask Multilingual Models [article]

Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla
2021 arXiv   pre-print
By combining the efficient system and training methods, we are able to significantly scale up large multitask multilingual models for language generation which results in a great improvement in model accuracy  ...  The system support of efficient MoE training has been implemented and open-sourced with the DeepSpeed library.  ...  We utilize the proposed DeepSpeed MoE system and effective training methods and recipes to train a family of highly efficient large scale language models called Z-code M3 (Multilingual Multitask MoE).  ... 
arXiv:2109.10465v1 fatcat:k45qxinuqzcg3gdyjh6rskvsqm
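
A minimal sketch of the sparsely-gated mixture-of-experts feed-forward layer that the record above scales up with the DeepSpeed MoE system, assuming plain PyTorch and top-2 token-level gating. It is not the DeepSpeed or Z-code M3 implementation; the class name, layer sizes, and the simple per-expert loop are illustrative assumptions.

# Sparsely-gated MoE feed-forward layer: plain-PyTorch sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # Router: one logit per expert for every token.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary position-wise feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                    # (tokens, d_model)
        probs = F.softmax(self.gate(tokens), dim=-1)           # (tokens, num_experts)
        topk_p, topk_i = probs.topk(self.k, dim=-1)            # keep k experts per token
        out = torch.zeros_like(tokens)
        for slot in range(self.k):
            idx, w = topk_i[:, slot], topk_p[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(tokens[mask])
        return out.reshape_as(x)


# Usage: drop-in replacement for the dense FFN inside a Transformer block.
layer = TopKMoE(d_model=512, d_ff=2048, num_experts=8, k=2)
y = layer(torch.randn(4, 16, 512))

In frameworks such as DeepSpeed, the per-expert Python loop above is replaced by batched dispatch of tokens to experts spread across devices (expert parallelism), which is what makes training at the scales described above practical.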

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference [article]

Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
2021 arXiv   pre-print
Sparse Mixture-of-Experts (MoE) has been a successful approach for scaling multilingual translation models to billions of parameters without a proportional increase in training computation.  ...  However, MoE models are prohibitively large and practitioners often resort to methods such as distillation for serving.  ...  efficiency for large models.  ... 
arXiv:2110.03742v1 fatcat:stp4wtshfjanncfo4axzwsm3ki
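
The record above replaces per-token gating with routing at the task level, so that only the experts needed for a given task have to be loaded at serving time. Below is a minimal sketch of that idea, assuming plain PyTorch and an integer task id (e.g. a target-language id) per batch; this is not the authors' code, and the module name is hypothetical.

import torch
import torch.nn as nn


class TaskRoutedFFN(nn.Module):
    """Hypothetical task-level MoE block: routing is a lookup, not a learned per-token gate."""

    def __init__(self, d_model: int, d_ff: int, num_tasks: int):
        super().__init__()
        # One expert FFN per task (or per group of related tasks).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_tasks)
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Every token in the batch uses the same expert, so a deployment can
        # keep only self.experts[task_id] resident in memory.
        return self.experts[task_id](x)


# Usage: decode a batch of sentences that all share target language 3.
ffn = TaskRoutedFFN(d_model=512, d_ff=2048, num_tasks=8)
out = ffn(torch.randn(4, 16, 512), task_id=3)

Because the routing decision is known before inference starts, serving cost scales with a single expert rather than with the full model, which is the inference-efficiency argument the abstract makes as an alternative to distillation.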

Tricks for Training Sparse Translation Models [article]

Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan
2021 arXiv   pre-print
and dense pre-training.  ...  Overall, these methods improve performance on two multilingual translation benchmarks compared to standard BASELayers and Dense scaling baselines, and in combination, more than 2x model convergence speed  ...  Sparse Scaling Sparsely-gated MoE models were introduced to increase model capacity in a flexible and scalable manner via model parallelism.  ... 
arXiv:2110.08246v1 fatcat:cbtnsy5g7fcjve3jnb2jjvb5au
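
The record above collects training recipes for such sparsely-gated models. As generic background rather than one of the paper's specific tricks, the sketch below shows the Switch/GShard-style load-balancing auxiliary loss that is commonly added so the router spreads tokens evenly across experts, in plain PyTorch.

import torch
import torch.nn.functional as F


def load_balancing_loss(router_logits: torch.Tensor,
                        expert_index: torch.Tensor,
                        num_experts: int) -> torch.Tensor:
    """router_logits: (tokens, E) raw gate scores; expert_index: (tokens,) chosen expert per token."""
    probs = F.softmax(router_logits, dim=-1)                          # (tokens, E)
    # Fraction of tokens dispatched to each expert (hard assignment).
    dispatch = F.one_hot(expert_index, num_experts).float().mean(dim=0)
    # Mean router probability assigned to each expert (soft assignment).
    importance = probs.mean(dim=0)
    # Minimised when both distributions are uniform over the experts.
    return num_experts * torch.sum(dispatch * importance)


# Usage with a top-1 router; the result is added to the task loss with a small weight.
logits = torch.randn(1024, 8)
aux = load_balancing_loss(logits, logits.argmax(dim=-1), num_experts=8)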

No Language Left Behind: Scaling Human-Centered Machine Translation [article]

NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun (+27 others)
2022 arXiv   pre-print
More specifically, we developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource  ...  We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.  ...  We thank Kyle Johnson for his help with UXR studies and model evaluation. We thank Antoine Bordes, Marina Zannoli, and Chris Moghbel for supporting this project.  ... 
arXiv:2207.04672v2 fatcat:gsbt3imt4bgodpmubpaq53onnm

Multi-Spectral Widefield Microscopy of the Beating Heart Through Post-Acquisition Synchronization and Unmixing

Christian Jaques, Linda Bapst-Wicht, Daniel F. Schorderet, Michael Liebling
2019 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)  
doi:10.1109/isbi.2019.8759472 dblp:conf/isbi/JaquesBSL19 fatcat:flypznnglbfrzm3ayf6tsfof34

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale [article]

Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He
2022 arXiv   pre-print
DeepSpeed-MoE offers an unprecedented scale and efficiency to serve massive MoE models with up to 4.5x faster and 9x cheaper inference compared to quality-equivalent dense models.  ...  Its training cost saving is demonstrated from encoder-decoder models (prior works) to a 5x saving for auto-regressive language models (this work along with parallel explorations).  ...  Acknowledgment We thank Olatunji Ruwase from the Microsoft DeepSpeed Team for his contributions on developing, debugging, testing, and releasing the DeepSpeed-MoE software.  ...
arXiv:2201.05596v2 fatcat:y5v2jx7y4fdxnlcl5bgpeuytei
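
As a rough illustration of the serving claim in the record above: with top-1 routing each token activates a single expert plus a small gate, so per-token compute stays near that of one dense FFN even though the parameter count grows with the number of experts, whereas a quality-equivalent dense model has to grow its width. The figures below are illustrative assumptions, not numbers from the DeepSpeed-MoE paper.

# Back-of-the-envelope FFN FLOPs per token: dense layer vs. top-1 MoE layer.

def ffn_flops_per_token(d_model: int, d_ff: int) -> int:
    # Two matmuls per position-wise FFN (d_model x d_ff and d_ff x d_model),
    # counting a multiply-add as 2 FLOPs.
    return 2 * 2 * d_model * d_ff

d_model, d_ff, num_experts = 4096, 16384, 64

dense_flops = ffn_flops_per_token(d_model, d_ff)
# Top-1 MoE: one expert of the same shape per token, plus the gate projection.
moe_flops = ffn_flops_per_token(d_model, d_ff) + 2 * d_model * num_experts

print(f"dense FFN: {dense_flops:,} FLOPs/token")
print(f"top-1 MoE ({num_experts} experts): {moe_flops:,} FLOPs/token "
      f"({moe_flops / dense_flops:.3f}x dense, with ~{num_experts}x the FFN parameters)")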

Multilingual Training and Adaptation in Speech Recognition

Sibo Tong
2020
State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech  ...  The goal of this thesis is to improve current state-of-the-art acoustic modeling techniques in general for ASR, with a particular focus on multilingual ASR and cross-lingual adaptation.  ...  Yi et al. [2016] used phoneme labels for training a multi-accent CTC-based ASR system in a multitask setting.  ... 
doi:10.5075/epfl-thesis-7896 fatcat:xjknfsb63fho5drspzdxxpcaqq
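
The snippet above mentions using phoneme labels to train a multi-accent CTC system in a multitask setting. Below is a minimal sketch of that kind of setup, assuming plain PyTorch, a shared bidirectional LSTM encoder, and grapheme plus phoneme CTC heads; the vocabulary sizes and the 0.3 auxiliary weight are illustrative assumptions, not values from the thesis.

import torch
import torch.nn as nn


class MultitaskCTC(nn.Module):
    def __init__(self, n_feats: int, d_hidden: int, n_graphemes: int, n_phonemes: int):
        super().__init__()
        # Shared acoustic encoder; both tasks read the same representation.
        self.encoder = nn.LSTM(n_feats, d_hidden, num_layers=2, batch_first=True,
                               bidirectional=True)
        self.grapheme_head = nn.Linear(2 * d_hidden, n_graphemes)
        self.phoneme_head = nn.Linear(2 * d_hidden, n_phonemes)

    def forward(self, feats: torch.Tensor):
        enc, _ = self.encoder(feats)                        # (batch, T, 2*d_hidden)
        # CTC expects (T, batch, classes) log-probabilities.
        g = self.grapheme_head(enc).log_softmax(-1).transpose(0, 1)
        p = self.phoneme_head(enc).log_softmax(-1).transpose(0, 1)
        return g, p


ctc = nn.CTCLoss(blank=0, zero_infinity=True)
model = MultitaskCTC(n_feats=80, d_hidden=256, n_graphemes=32, n_phonemes=48)

feats = torch.randn(2, 100, 80)                             # 2 utterances, 100 frames
g_logp, p_logp = model(feats)
in_lens = torch.full((2,), 100, dtype=torch.long)
g_tgt, p_tgt = torch.randint(1, 32, (2, 20)), torch.randint(1, 48, (2, 25))
g_lens = torch.full((2,), 20, dtype=torch.long)
p_lens = torch.full((2,), 25, dtype=torch.long)

# Weighted sum of the two CTC objectives (phoneme recognition as auxiliary task).
loss = ctc(g_logp, g_tgt, in_lens, g_lens) + 0.3 * ctc(p_logp, p_tgt, in_lens, p_lens)
loss.backward()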

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale [article]

Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He
2022
DeepSpeed-MoE offers an unprecedented scale and efficiency to serve massive MoE models with up to 4.5x faster and 9x cheaper inference compared to quality-equivalent dense models.  ...  Its training cost saving is demonstrated from encoder-decoder models (prior works) to a 5x saving for auto-regressive language models (this work along with parallel explorations).  ...  Acknowledgment We thank Olatunji Ruwase from the Microsoft DeepSpeed Team for his contributions on developing, debugging, testing, and releasing the DeepSpeed-MoE software.  ...
doi:10.48550/arxiv.2201.05596 fatcat:zipaeyqrkravxlexerkrxpjn4u

Building Digital Skills at Primary School – Normative Regulation and Pedagogical Practice in Bulgaria
Formation of Digital Skills at Primary School Age – Normative Framework and Practice in the Educational System in Bulgaria

Rumyana Papancheva (University "Prof. Dr Asen Zlatarov", Burgas), Rumyana Karadimitrova, Kosta Garov (Plovdiv University "Paisii Hilendarski", Plovdiv)
2018 Education and Technologies Journal  
all, to the principals and teachers of the primary schools we have been working with for their excellent innovative work with and for their pupils -and also to the pupils themselves for the magnificent  ...  for excellent continuous support and cooperation.  ...  For example, the Ministry of Education (MOE) of Singapore has launched various prevention and intervention programs including training teachers, training student ambassadors, and involving parents (MOE  ... 
doi:10.26883/2010.181.834 fatcat:i7knugtx7zdxrmgr4llgh6t4iq

TeaP 2020 - Abstracts of the 62nd Conference of Experimental Psychologists

Leibniz-Institut für Psychologie (ZPID), Christian Dobel, Carina Giesen, Laura Anne Grigutsch, Jürgen M. Kaufmann, Gyula Kovács, Franziska Meissner, Klaus Rothermund, Stefan R. Schweinberger
2021
Contains Keynote Lectures, Contributions and Author Index of the 62nd Conference of Experimental Psychologists  ...  However, traditional (automotive and office) user interfaces were not optimized for this purpose, and thus, may be preventing effective productivity and even present a safety risk in conditional automation  ...  A special opportunity for automated individual mobility lies in the possibility to perform office work during traveling and commuting.  ... 
doi:10.23668/psycharchives.5176 fatcat:67c63hw2bnal5bnfydwg5nvuta