
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models [article]

Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu (+1 others)
2020 arXiv   pre-print
We propose DAPPLE, a synchronous training framework which combines data parallelism and pipeline parallelism for large DNN models.  ...  Recently, pipelined training has been proposed as an effective approach for improving device utilization.  ...  Data parallelism, model parallelism and pipeline parallelism are common approaches for distributed training of DNN models. Data Parallelism [43].  ... 
arXiv:2007.01045v1 fatcat:w7qezkffyvcv7np26fft56hld4
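As context for the parallelism taxonomy in the abstract above, here is a minimal sketch of a synchronous, GPipe-style pipeline schedule (not DAPPLE's actual scheduler, and the function name is mine): with p stages and m micro-batches, stage s processes micro-batch b at time step s + b, so the pipeline fills, runs fully occupied, then drains.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Build a GPipe-style forward schedule: for each time step,
    list the (stage, microbatch) pairs that run in parallel."""
    steps = []
    for t in range(num_stages + num_microbatches - 1):
        active = [(s, t - s) for s in range(num_stages)
                  if 0 <= t - s < num_microbatches]
        steps.append(active)
    return steps

# With 3 stages and 4 micro-batches the schedule has 6 steps;
# step 2 is the first fully occupied one:
for t, active in enumerate(pipeline_schedule(3, 4)):
    print(t, active)
```

Data parallelism, by contrast, would replicate all stages on every worker and average gradients; the schedule above is what distinguishes the pipelined approach.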

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines [article]

Shigang Li, Torsten Hoefler
2021 arXiv   pre-print
This paper proposes Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for efficiently training large-scale models.  ...  Training large deep learning models at scale is very challenging.  ...  We also thank the Swiss National Supercomputing Center for providing the computing resources and excellent technical support.  ... 
arXiv:2107.06925v1 fatcat:nu5gu627lfgcrbrxsylztnboz4

Varuna: Scalable, Low-cost Training of Massive Deep Learning Models [article]

Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra
2021 arXiv   pre-print
Varuna improves end-to-end training time by up to 18x compared to other model-parallel approaches and up to 26% compared to other pipeline-parallel approaches.  ...  We demonstrate the efficacy of Varuna by training massive models, including a 200 billion parameter model, on 5x cheaper "spot VMs", while maintaining high training throughput.  ...  This is infeasible for large models that require long pipelines. The largest model that DAPPLE shows a performance speedup for is Bert-48 with 600M parameters. Intra-layer parallelism.  ... 
arXiv:2111.04007v2 fatcat:mmmm5shpq5ey5fgfxklnrouoqe

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models [article]

Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica
2021 arXiv   pre-print
Model parallelism has become a necessity for training modern large-scale deep language models.  ...  In this work, we identify a new and orthogonal dimension from existing model parallel approaches: it is possible to perform pipeline parallelism within a single training sequence for Transformer-based  ...  Acknowledgement We thank our anonymous reviewers for their insightful feedback. We also thank Lianmin Zheng and many others at the UC Berkeley RISELab for their helpful discussion and comments.  ... 
arXiv:2102.07988v2 fatcat:tfzfivgpwnhpdhxfq5r45aiiya
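TeraPipe's key idea, slicing the sequence (token) dimension within a single training sequence rather than the batch dimension, can be sketched as follows. Uniform chunk sizes are assumed here for simplicity, whereas the paper chooses non-uniform sizes to balance the causal-attention cost across chunks; the function name is illustrative.

```python
def split_sequence(seq_len, num_chunks):
    """Split one training sequence into contiguous token chunks
    that can be pipelined through model stages (uniform sizes here;
    TeraPipe selects non-uniform sizes for load balance)."""
    base, rem = divmod(seq_len, num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < rem else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

print(split_sequence(10, 3))  # → [(0, 4), (4, 7), (7, 10)]
```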

Scheduling Optimization Techniques for Neural Network Training [article]

Hyungjun Oh, HyeongJu Kim, Jiwon Seo
2021 arXiv   pre-print
Neural network training requires a large amount of computation and thus GPUs are often used for acceleration.  ...  systems, the throughput is substantially improved for single-GPU, data-parallel, and pipeline-parallel training.  ...  For a subset of the experiments, we evaluated DAPPLE [14], a state-of-the-art data- and pipeline-parallel training system.  ... 
arXiv:2110.00929v1 fatcat:2ojird3n45fznbitu36fsaxydy

Out-of-order backprop

Hyungjun Oh, Junyeol Lee, Hyeongju Kim, Jiwon Seo
2022 Proceedings of the Seventeenth European Conference on Computer Systems  
-1.99× for pipeline-parallel training.  ...  Neural network training requires a large amount of computation and thus GPUs are often used for acceleration. While they improve performance, GPUs are underutilized during training.  ...  Acknowledgement We thank Gunjoo Ahn for the preliminary experiments and anonymous reviewers for their feedback.  ... 
doi:10.1145/3492321.3519563 fatcat:lcuuskkkorg43a2van4od7c6sq

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM [article]

Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia
2021 arXiv   pre-print
We quantitatively study the trade-offs between tensor, pipeline, and data parallelism, and provide intuition as to how to configure distributed training of a large model.  ...  We survey techniques for pipeline parallelism and propose a novel interleaved pipeline parallelism schedule that can improve throughput by 10+% with memory footprint comparable to existing approaches.  ...  ACKNOWLEDGEMENTS We thank the anonymous reviewers, Seonmyeong Bak, Keshav Santhanam, Trevor Gale, Dimitrios Vytiniotis, and Siddharth Karamcheti for their help and feedback that improved this work.  ... 
arXiv:2104.04473v5 fatcat:copfgbd5zfao5po4ujiea6okxi
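The tensor/pipeline/data trade-off this paper studies quantitatively ranges over configurations whose three parallelism degrees multiply to the GPU count. A hypothetical helper (name and structure are mine, not Megatron-LM's API) sketching that search space:

```python
def parallel_configs(num_gpus):
    """Enumerate (tensor, pipeline, data) parallel degrees whose
    product equals the number of GPUs -- the configuration space
    behind tensor/pipeline/data trade-off studies."""
    configs = []
    for t in range(1, num_gpus + 1):
        if num_gpus % t:
            continue
        for p in range(1, num_gpus // t + 1):
            if (num_gpus // t) % p:
                continue
            configs.append((t, p, num_gpus // (t * p)))
    return configs

# e.g. 8 GPUs admit configurations such as (1, 1, 8), (2, 2, 2), (8, 1, 1);
# picking among them is the tuning problem the paper provides intuition for.
```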

Memory-Efficient Pipeline-Parallel DNN Training [article]

Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia
2021 arXiv   pre-print
However, parameters and activations for such large models often do not fit in the memory of a single accelerator device; this means that it is necessary to distribute training of large models over multiple  ...  similar to data parallelism.  ...  We thank MSR for their generous support of Deepak's internship, and for resources to develop and evaluate PipeDream-2BW.  ... 
arXiv:2006.09503v3 fatcat:xtvlmojpzvfuleillhyb4khvpy

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning [article]

Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica
2022 arXiv   pre-print
Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism.  ...  Based on it, Alpa constructs a new hierarchical space for massive model-parallel execution plans.  ...  More generally, efficient large-scale model training requires tuning a complex combination of data, operator, and pipeline parallelization approaches at the granularity of the individual operators in the  ... 
arXiv:2201.12023v1 fatcat:2fmvip46uzcktg6727taircmya

FTPipeHD: A Fault-Tolerant Pipeline-Parallel Distributed Training Framework for Heterogeneous Edge Devices [article]

Yuhao Chen, Qianqian Yang, Shibo He, Zhiguo Shi, Jiming Chen
2021 arXiv   pre-print
In this paper, we propose FTPipeHD, a novel DNN training framework that trains DNN models across distributed heterogeneous devices with a fault-tolerance mechanism.  ...  We also propose a novel weight redistribution approach that periodically replicates the weights to both the neighboring nodes and the central node, which combats the failure of multiple devices during  ...  Pipeline parallelism combines the idea of data parallelism with model parallelism. It further speeds up training through a pipelining mechanism that reduces each worker's idle time.  ... 
arXiv:2110.02781v1 fatcat:km5jazyu3rde7f7sx6uwjgzcey
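The idle time ("bubble") that pipelining tries to minimize can be quantified for a simple synchronous schedule: with p stages and m micro-batches, each stage is busy for m of the m + p - 1 time steps, so the idle fraction is (p - 1) / (m + p - 1). A sketch (the function name is mine):

```python
def bubble_fraction(num_stages, num_microbatches):
    """Fraction of a synchronous pipeline's time steps a stage spends
    idle during fill and drain: (p - 1) / (m + p - 1)."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# More micro-batches shrink the bubble:
print(bubble_fraction(4, 4))   # 3/7 ≈ 0.4286
print(bubble_fraction(4, 32))  # 3/35 ≈ 0.0857
```

This is why long pipelines on few micro-batches are inefficient, and why schemes like bidirectional or interleaved pipelines attack the bubble directly.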

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments [article]

Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou
2022 arXiv   pre-print
The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive.  ...  To efficiently train a DNN model using the heterogeneous computing resources, we propose a distributed framework, i.e., Paddle-Heterogeneous Parameter Server (Paddle-HeterPS), composed of a distributed  ...  Dapple: a pipelined data parallel approach for training large models. In ACM SIGPLAN Symposium on Principles and Practice  ... 
arXiv:2111.10635v2 fatcat:bhfnem6gqzetnffma4sihfqbsa

MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud [article]

Zhen Zhang, Shuai Zheng, Yida Wang, Justin Chiu, George Karypis, Trishul Chilimbi, Mu Li, Xin Jin
2022 arXiv   pre-print
Existing general-purpose frameworks for training gigantic models, i.e., models with billions to trillions of parameters, cannot scale efficiently in public cloud environments due to large communication  ...  Our evaluation on AWS shows that the system throughput of MiCS is up to 2.89× that of the state-of-the-art large model training systems.  ...  ACKNOWLEDGMENTS We thank the Amazon Search M5 team for providing large clusters for the experiments.  ... 
arXiv:2205.00119v3 fatcat:7cpzhzjbjraaxe46zvpmrc4nyi

A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data

Jennifer D. Hintzsche, William A. Robinson, Aik Choon Tan
2016 International Journal of Genomics  
Whole Exome Sequencing (WES) is the application of the next-generation technology to determine the variations in the exome and is becoming a standard approach in studying genetic variants in diseases.  ...  Strengths and weaknesses of each tool are discussed for the purpose of helping researchers make more informative decisions on selecting the best tools to analyze their WES data.  ...  Acknowledgments The authors would like to thank the Translational Bioinformatics and Cancer Systems Biology Lab members for their constructive comments on this manuscript.  ... 
doi:10.1155/2016/7983236 pmid:28070503 pmcid:PMC5192301 fatcat:qtlmgypwxjbv7ewg5wskhqxs4a

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.  ...  We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability  ...  DAPPLE [264] combines pipeline parallelism with data parallelism in a flexible way to scale pipeline parallelism to more nodes. Pipeline parallelism can also be applied at different scales.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

Computational solutions for omics data

Bonnie Berger, Jian Peng, Mona Singh
2013 Nature reviews genetics  
Finally, efficient means for storing, searching and retrieving data are of foremost concern as they are necessary for any analysis to proceed.  ...  This trend towards the democratization of genome-scale technologies means that large data sets are being generated and used by individual bench biologists.  ...  Cowen for valuable feedback. B.B. thanks the US National Institutes of Health (NIH) for grant GM081871.  ... 
doi:10.1038/nrg3433 pmid:23594911 pmcid:PMC3966295 fatcat:b7n6xwzyc5gqzo7plgyoe257iq
Showing results 1 — 15 out of 50 results