
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models [article]

Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu (+1 others)
2020 arXiv   pre-print
We propose DAPPLE, a synchronous training framework which combines data parallelism and pipeline parallelism for large DNN models.  ...  Recently, pipelined training has been proposed as an effective approach for improving device utilization.  ...  Data parallelism, model parallelism and pipeline parallelism are common approaches for distributed training of DNN models. Data Parallelism [43].  ... 
arXiv:2007.01045v1 fatcat:w7qezkffyvcv7np26fft56hld4
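As context for the parallelism taxonomy in the abstract above, here is a minimal sketch of a synchronous, GPipe-style pipeline schedule (not DAPPLE's actual scheduler, and the function name is mine): with p stages and m micro-batches, stage s processes micro-batch b at time step s + b, so the pipeline fills, runs fully occupied, then drains.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Build a GPipe-style forward schedule: for each time step,
    list the (stage, microbatch) pairs that run in parallel."""
    steps = []
    for t in range(num_stages + num_microbatches - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        steps.append(active)
    return steps

# With 3 stages and 4 micro-batches the schedule has 6 steps;
# step 2 is the first fully occupied one:
for t, active in enumerate(pipeline_schedule(3, 4)):
    print(t, active)
```

Data parallelism, by contrast, would replicate all stages on every worker and average gradients; the schedule above is what distinguishes the pipelined approach.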

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines [article]

Shigang Li, Torsten Hoefler
2021 arXiv   pre-print
This paper proposes Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for efficiently training large-scale models.  ...  Training large deep learning models at scale is very challenging.  ...  We also thank the Swiss National Supercomputing Center for providing the computing resources and excellent technical support.  ... 
arXiv:2107.06925v1 fatcat:nu5gu627lfgcrbrxsylztnboz4

Varuna: Scalable, Low-cost Training of Massive Deep Learning Models [article]

Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra
2021 arXiv   pre-print
Varuna improves end-to-end training time by up to 18x compared to other model-parallel approaches and up to 26% compared to other pipeline-parallel approaches.  ...  We demonstrate the efficacy of Varuna by training massive models, including a 200 billion parameter model, on 5x cheaper "spot VMs", while maintaining high training throughput.  ...  This is infeasible for large models that require long pipelines. The largest model that DAPPLE shows a performance speedup for is Bert-48 with 600M parameters. Intra-layer parallelism.  ... 
arXiv:2111.04007v2 fatcat:mmmm5shpq5ey5fgfxklnrouoqe

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [article]

Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
2021 arXiv   pre-print
Model parallelism has become a necessity for training modern large-scale deep language models.  ...  In this work, we identify a new and orthogonal dimension from existing model parallel approaches: it is possible to perform pipeline parallelism within a single training sequence for Transformer-based  ...  Acknowledgement We thank our anonymous reviewers for their insightful feedback. We also thank Lianmin Zheng and many others at the UC Berkeley RISELab for their helpful discussion and comments.  ... 
arXiv:2102.07988v2 fatcat:tfzfivgpwnhpdhxfq5r45aiiya
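TeraPipe's key idea, slicing the sequence (token) dimension within a single training sequence rather than the batch dimension, can be sketched as follows. Uniform chunk sizes are assumed here for simplicity, whereas the paper chooses non-uniform sizes to balance the causal-attention cost across chunks; the function name is illustrative.

```python
def split_sequence(seq_len, num_chunks):
    """Split one training sequence into contiguous token chunks
    that can be pipelined through model stages (uniform sizes here;
    TeraPipe selects non-uniform sizes for load balance)."""
    base, rem = divmod(seq_len, num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < rem else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

print(split_sequence(10, 3))  # → [(0, 4), (4, 7), (7, 10)]
```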

Scheduling Optimization Techniques for Neural Network Training [article]

Hyungjun Oh, HyeongJu Kim, Jiwon Seo
2021 arXiv   pre-print
Neural network training requires a large amount of computation and thus GPUs are often used for acceleration.  ...  systems, the throughput is substantially improved for single-GPU, data-parallel, and pipeline-parallel training.  ...  For a subset of the experiments, we evaluated DAPPLE [14], a state-of-the-art data- and pipeline-parallel training system.  ... 
arXiv:2110.00929v1 fatcat:2ojird3n45fznbitu36fsaxydy

Out-of-order backprop

Hyungjun Oh, Junyeol Lee, Hyeongju Kim, Jiwon Seo
2022 Proceedings of the Seventeenth European Conference on Computer Systems  
-1.99× for pipeline-parallel training.  ...  Neural network training requires a large amount of computation and thus GPUs are often used for acceleration. While they improve performance, GPUs are underutilized during training.  ...  Acknowledgement We thank Gunjoo Ahn for the preliminary experiments and anonymous reviewers for their feedback.  ... 
doi:10.1145/3492321.3519563 fatcat:lcuuskkkorg43a2van4od7c6sq

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM [article]

Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
2021 arXiv   pre-print
We quantitatively study the trade-offs between tensor, pipeline, and data parallelism, and provide intuition as to how to configure distributed training of a large model.  ...  We survey techniques for pipeline parallelism and propose a novel interleaved pipeline parallelism schedule that can improve throughput by 10+% with memory footprint comparable to existing approaches.  ...  ACKNOWLEDGEMENTS We thank the anonymous reviewers, Seonmyeong Bak, Keshav Santhanam, Trevor Gale, Dimitrios Vytiniotis, and Siddharth Karamcheti for their help and feedback that improved this work.  ... 
arXiv:2104.04473v5 fatcat:copfgbd5zfao5po4ujiea6okxi
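The tensor/pipeline/data trade-off this paper studies quantitatively ranges over configurations whose three parallelism degrees multiply to the GPU count. A hypothetical helper (name and structure are mine, not Megatron-LM's API) sketching that search space:

```python
def parallel_configs(num_gpus):
    """Enumerate (tensor, pipeline, data) parallel degrees whose
    product equals the number of GPUs -- the configuration space
    behind tensor/pipeline/data trade-off studies."""
    configs = []
    for t in range(1, num_gpus + 1):
        if num_gpus % t:
            continue
        for p in range(1, num_gpus // t + 1):
            if (num_gpus // t) % p:
                continue
            configs.append((t, p, num_gpus // (t * p)))
    return configs

# e.g. 8 GPUs admit configurations such as (1, 1, 8), (2, 2, 2), (8, 1, 1);
# picking among them is the tuning problem the paper provides intuition for.
```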

Memory-Efficient Pipeline-Parallel DNN Training [article]

Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia
2021 arXiv   pre-print
However, parameters and activations for such large models often do not fit in the memory of a single accelerator device; this means that it is necessary to distribute training of large models over multiple  ...  similar to data parallelism.  ...  We thank MSR for their generous support of Deepak's internship, and for resources to develop and evaluate PipeDream-2BW.  ... 
arXiv:2006.09503v3 fatcat:xtvlmojpzvfuleillhyb4khvpy

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning [article]

Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica
2022 arXiv   pre-print
Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism.  ...  Based on it, Alpa constructs a new hierarchical space for massive model-parallel execution plans.  ...  More generally, efficient large-scale model training requires tuning a complex combination of data, operator, and pipeline parallelization approaches at the granularity of the individual operators in the  ... 
arXiv:2201.12023v1 fatcat:2fmvip46uzcktg6727taircmya

FTPipeHD: A Fault-Tolerant Pipeline-Parallel Distributed Training Framework for Heterogeneous Edge Devices [article]

Yuhao Chen, Qianqian Yang, Shibo He, Zhiguo Shi, Jiming Chen
2021 arXiv   pre-print
In this paper, we propose FTPipeHD, a novel DNN training framework that trains DNN models across distributed heterogeneous devices with a fault-tolerance mechanism.  ...  We also propose a novel weight redistribution approach that periodically replicates the weights to both the neighboring nodes and the central node, which combats the failure of multiple devices during  ...  Pipeline parallelism combines the idea of data parallelism with model parallelism. It further speeds up training through a pipelining mechanism that reduces each worker's idle time.  ... 
arXiv:2110.02781v1 fatcat:km5jazyu3rde7f7sx6uwjgzcey
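The idle time ("bubble") that pipelining tries to minimize can be quantified for a simple synchronous schedule: with p stages and m micro-batches, each stage is busy for m of the m + p - 1 time steps, so the idle fraction is (p - 1) / (m + p - 1). A sketch (the function name is mine):

```python
def bubble_fraction(num_stages, num_microbatches):
    """Fraction of a synchronous pipeline's time steps a stage spends
    idle during fill and drain: (p - 1) / (m + p - 1)."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# More micro-batches shrink the bubble:
print(bubble_fraction(4, 4))   # 3/7 ≈ 0.4286
print(bubble_fraction(4, 32))  # 3/35 ≈ 0.0857
```

This is why long pipelines on few micro-batches are inefficient, and why schemes like bidirectional or interleaved pipelines attack the bubble directly.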

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments [article]

Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou
2022 arXiv   pre-print
The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive.  ...  To efficiently train a DNN model using the heterogeneous computing resources, we propose a distributed framework, i.e., Paddle-Heterogeneous Parameter Server (Paddle-HeterPS), composed of a distributed  ...  Dapple: a pipelined data parallel approach for training large models. In ACM SIGPLAN Symposium on Principles and Practice  ... 
arXiv:2111.10635v2 fatcat:bhfnem6gqzetnffma4sihfqbsa

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud [article]

Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, Xin Jin
2022 arXiv   pre-print
Existing general-purpose frameworks for training gigantic models, i.e., models with billions to trillions of parameters, cannot scale efficiently in public cloud environments due to large communication  ...  Our evaluation on AWS shows that the system throughput of MiCS is up to 2.89× that of the state-of-the-art large model training systems.  ...  ACKNOWLEDGMENTS We thank the Amazon Search M5 team for providing large clusters for the experiments.  ... 
arXiv:2205.00119v3 fatcat:7cpzhzjbjraaxe46zvpmrc4nyi

A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data

Jennifer D. Hintzsche, William A. Robinson, Aik Choon Tan
2016 International Journal of Genomics  
Whole Exome Sequencing (WES) is the application of the next-generation technology to determine the variations in the exome and is becoming a standard approach in studying genetic variants in diseases.  ...  Strengths and weaknesses of each tool are discussed for the purpose of helping researchers make more informative decisions on selecting the best tools to analyze their WES data.  ...  Acknowledgments The authors would like to thank the Translational Bioinformatics and Cancer Systems Biology Lab members for their constructive comments on this manuscript.  ... 
doi:10.1155/2016/7983236 pmid:28070503 pmcid:PMC5192301 fatcat:qtlmgypwxjbv7ewg5wskhqxs4a

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.  ...  We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability  ...  DAPPLE [264] combines pipeline parallelism with data parallelism in a flexible way to scale pipeline parallelism to more nodes. Pipeline parallelism can also be applied at different scales.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

Computational solutions for omics data

Bonnie Berger, Jian Peng, Mona Singh
2013 Nature reviews genetics  
Finally, efficient means for storing, searching and retrieving data are of foremost concern as they are necessary for any analysis to proceed.  ...  This trend towards the democratization of genome-scale technologies means that large data sets are being generated and used by individual bench biologists.  ...  Cowen for valuable feedback. B.B. thanks the US National Institutes of Health (NIH) for grant GM081871.  ... 
doi:10.1038/nrg3433 pmid:23594911 pmcid:PMC3966295 fatcat:b7n6xwzyc5gqzo7plgyoe257iq
Showing results 1 — 15 out of 50 results