251,530 Hits in 2.4 sec

Efficient model partitioning for distributed model transformations

Amine Benelallam, Massimo Tisi, Jesús Sánchez Cuadrado, Juan de Lara, Jordi Cabot
2016 Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering - SLE 2016  
Moreover, we propose a data distribution algorithm for declarative model transformation based on static analysis of relational transformation rules.  ...  As the models that need to be handled in model-driven engineering grow in scale, scalable algorithms for model transformation (MT) are becoming necessary.  ...  A Greedy Model-Partitioning Algorithm for Distributed Transformations: Although the footprints of the transformation help us approximate a transformation dependency graph, data distribution necessitates  ... 
doi:10.1145/2997364.2997385 fatcat:etriptln2fgv3nkjkfwq6jebm4
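
The snippet above alludes to a greedy partitioning heuristic driven by a transformation dependency graph. As a hedged illustration only (the function name, tie-breaking rule, and locality objective are assumptions, not the paper's algorithm), such a heuristic might assign each model element to the machine that already holds most of its dependencies, breaking ties by load:

```python
# Toy greedy balanced partitioning sketch (illustrative, not the paper's algorithm).
def greedy_partition(elements, deps, num_machines):
    """deps: {element: set of elements it depends on}. Returns {element: machine}."""
    assign, load = {}, [0] * num_machines
    for e in elements:
        # count how many of e's dependencies each machine already owns
        gain = [0] * num_machines
        for d in deps.get(e, ()):
            if d in assign:
                gain[assign[d]] += 1
        # prefer maximum dependency locality, then minimum current load
        best = max(range(num_machines), key=lambda m: (gain[m], -load[m]))
        assign[e] = best
        load[best] += 1
    return assign
```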

BigDL: A Distributed Deep Learning Framework for Big Data [article]

Jason Dai, Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Cherry Zhang, Yan Wan, Zhichao Li, Jiao Wang, Shengsheng Huang, Zhongyuan Wu, Yang Wang (+6 others)
2018 arXiv   pre-print
In this paper, we present BigDL, a distributed deep learning framework for Big Data platforms and workflows.  ...  as to achieve highly scalable, data-parallel distributed training.  ...  Model training: The transformed input data (an RDD of Samples) and the constructed model can then be passed to the Optimizer in BigDL, which automatically performs distributed model training across the  ... 
arXiv:1804.05839v3 fatcat:u5afdn37l5c7lalqxqmlj5se6e

Machine-Learning Based Memory Prediction Model for Data Parallel Workloads in Apache Spark

Rohyoung Myung, Sukyong Choi
2021 Symmetry  
Then, we propose a machine-learning-based prediction model that determines the efficient amount of memory for a given workload and data.  ...  The proposed model can improve memory efficiency up to 1.89 times compared with the vanilla Spark setting.  ...  environments in the general-purpose distributed-processing Spark platform. • Based on the memory usage model, we propose a memory prediction model for estimating efficient amounts of memory for data-parallel  ... 
doi:10.3390/sym13040697 fatcat:wg75rx55jjhyzef3pnqhxqlyfa

An Efficient 2D Method for Training Super-Large Deep Learning Models [article]

Qifan Xu and Shenggui Li and Chaoyu Gong and Yang You
2021 arXiv   pre-print
In this work, we propose Optimus, a highly efficient and scalable 2D-partition paradigm of model parallelism that would facilitate the training of infinitely large language models.  ...  In Optimus, activations are partitioned and distributed among devices, further reducing redundancy. In terms of isoefficiency, Optimus significantly outperforms Megatron.  ...  Assume there are Transformer layers in the language model, each device has to accommodate a memory buffer of size ℎ/ for distributed activation checkpointing.  ... 
arXiv:2104.05343v1 fatcat:7vhetmiggjbb5ngc2yjsdtn2nm
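
The 2D-partition paradigm described in this entry builds on SUMMA-style block decomposition of matrix multiplication. A minimal sequential simulation of that idea (pure Python, illustrative block layout; a sketch of the scheme, not Optimus itself) looks like:

```python
# Sequential simulation of SUMMA-style 2D partitioning on a p x p device grid.
def split_blocks(M, p):
    """Split an n x n matrix into a p x p grid of (n/p) x (n/p) blocks."""
    n = len(M)
    b = n // p
    return [[[row[j*b:(j+1)*b] for row in M[i*b:(i+1)*b]]
             for j in range(p)] for i in range(p)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def summa(A, B, p):
    """Device (i, j) owns blocks A[i][j], B[i][j] and accumulates C[i][j]
    over p steps: step k broadcasts A[i][k] along row i and B[k][j] along
    column j of the grid (broadcasts are implicit in this simulation)."""
    Ab, Bb = split_blocks(A, p), split_blocks(B, p)
    b = len(A) // p
    C = [[[[0] * b for _ in range(b)] for _ in range(p)] for _ in range(p)]
    for k in range(p):
        for i in range(p):
            for j in range(p):
                C[i][j] = madd(C[i][j], matmul(Ab[i][k], Bb[k][j]))
    return C
```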

Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks [article]

Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun
2019 arXiv   pre-print
Although current DL frameworks scale well for image classification models, there remain opportunities for scalable distributed training on natural language processing (NLP) models.  ...  DL frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to assist DL researchers to train their models in a distributed manner.  ...  Sparse Variable Partitioning: This section presents the efficiency of the sparse variable partitioning method of Parallax for LM and NMT.  ... 
arXiv:1808.02621v3 fatcat:flymv2t6lnh23pkzivelnjby44
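
The sparse variable partitioning this entry mentions can be sketched as range-partitioning a large embedding table across servers, so that a sparse gradient touches only the shards owning its active rows. This is a hedged toy model under assumed names, not Parallax's implementation:

```python
# Toy range-partitioned embedding table with sparse gradient updates.
def make_shards(vocab_size, dim, num_shards):
    """Range-partition rows [0, vocab_size) across num_shards servers."""
    bounds = [vocab_size * s // num_shards for s in range(num_shards + 1)]
    shards = [{r: [0.0] * dim for r in range(bounds[s], bounds[s + 1])}
              for s in range(num_shards)]
    return shards, bounds

def owner(row, bounds):
    """Find the shard whose range contains this row index."""
    for s in range(len(bounds) - 1):
        if bounds[s] <= row < bounds[s + 1]:
            return s

def apply_sparse_grad(shards, bounds, sparse_grad, lr=0.1):
    """sparse_grad: {row_index: gradient vector}. Only owning shards are
    touched, so communication scales with the number of active rows."""
    touched = set()
    for row, g in sparse_grad.items():
        s = owner(row, bounds)
        touched.add(s)
        shards[s][row] = [w - lr * gi for w, gi in zip(shards[s][row], g)]
    return touched
```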

Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns

Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Arvind K. Sujeeth, Christopher De Sa, Christopher Aberger, Kunle Olukotun
2016 Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016  
To optimize distributed applications both for modern hardware and for modern programmers, we need a programming model that is sufficiently expressive to support a variety of parallel applications, sufficiently  ...  We present experimental results for a range of applications spanning multiple domains and demonstrate highly efficient execution compared to manually optimized counterparts in multiple distributed programming  ...  Acknowledgments: We are grateful to the anonymous reviewers for their comments and suggestions.  ... 
doi:10.1145/2854038.2854042 dblp:conf/cgo/BrownLRSSAO16 fatcat:cye5j5gi3vfgzh7xyku5cfttq4

Data-Reuse and Parallel Embedded Architectures for Low-Power, Real-Time Multimedia Applications [chapter]

D. Soudris, N. D. Zervas, A. Argyriou, M. Dasygenis, K. Tatas, C. E. Goutis, A. Thanailakis
2000 Lecture Notes in Computer Science  
Experimental results prove that improvements in both power and performance can be acquired when the right combination of data-memory architecture model and data-reuse transformation is selected.  ...  Exploitation of data re-use, in combination with a custom memory hierarchy that exploits the temporal locality of data accesses, may introduce significant power savings, especially for data-intensive  ...  Fig. 1 and Fig. 2: the distributed data-memory architecture model.  ... 
doi:10.1007/3-540-45373-3_26 fatcat:zejztgh2lvcr3pybnieo74iqoi

A Parallel Distributed Weka Framework for Big Data Mining Using Spark

Aris-Kyriakos Koliopoulos, Paraskevas Yiapanis, Firat Tekiner, Goran Nenadic, John Keane
2015 2015 IEEE International Congress on Big Data  
This work discusses DistributedWekaSpark, a distributed framework for Weka which maintains its existing user interface.  ...  The framework is implemented on top of Spark, a Hadoop-related distributed framework with fast in-memory processing capabilities and support for iterative computations.  ...  The authors wish to thank Dr Mark Hall at the University of Waikato for his advice and encouragement.  ... 
doi:10.1109/bigdatacongress.2015.12 dblp:conf/bigdata/KoliopoulosYTNK15 fatcat:cxo2lme5yresld6hpy4n5otyhq

Scaling Data-Intensive Applications on Heterogeneous Platforms with Accelerators

Ana Balevic, Bart Kienhuis
2012 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum  
Tiling + Streaming = TStream.  Stage I: compiler transforms for data partitioning (tiling in the polyhedral model; I/O tile bounds and footprint computation).  Stage II: support for tile streaming (communication and mapping for execution).  ...  for ( j = 0; j<N; j++ )  ... 
doi:10.1109/ipdpsw.2012.230 dblp:conf/ipps/BalevicK12 fatcat:w46lyu4cf5gpfj6eeym5xhm57y
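
The snippet lists two TStream stages: I/O tile-bound computation and footprint computation. A toy version for a one-dimensional loop `for (j = 0; j < N; j++)` reading `a[j]`, with a fixed tile size rather than polyhedrally derived bounds, could look like (names and element size are illustrative assumptions):

```python
# Toy tile-bound and footprint computation for a 1-D loop nest.
def tile_bounds(N, T):
    """Half-open iteration bounds of each tile of size T covering [0, N)."""
    return [(t, min(t + T, N)) for t in range((0), N, T)]

def footprint(bounds, elem_bytes=4):
    """Bytes of input touched by a tile that reads a[j] for j in [lo, hi)."""
    lo, hi = bounds
    return (hi - lo) * elem_bytes
```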

Graph Partitioning Algorithm for Social Network Model Transformation Frameworks

Gergely Mezei, László Deák, Krisztian Fekete, Tamás Vajk
2013 Proceedings of the 8th International Joint Conference on Software Technologies  
Based on the algorithm, models can be mapped onto several computational instances and processed efficiently in a distributed fashion.  ...  We focus on creating an algorithm to partition graphs representing models.  ...  For extra-large models such as social networks, several instances of computers can be used and transformations can be applied efficiently in a distributed fashion.  ... 
doi:10.5220/0004475104800487 dblp:conf/icsoft/MezeiDFV13 fatcat:ylsfp5mpjnfjrmdajgjjzdzdj4

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding [article]

Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen
2020 arXiv   pre-print
Although this trend of scaling is affirmed to be a sure-fire approach for better model quality, there are challenges on the path, such as computation cost, ease of programming, and efficient implementation  ...  We demonstrate that such a giant model can be trained efficiently on 2048 TPU v3 accelerators in 4 days to achieve far superior quality for translation from 100 languages to English compared to the prior  ...  Acknowledgements: We would like to thank the Google Brain and Translate teams for their useful input and insightful discussions, and the entire XLA and Lingvo development teams for their foundational contributions  ... 
arXiv:2006.16668v1 fatcat:tucpisgorneq3gbikveukhqxri

EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks [article]

Shengwei Li, Zhiquan Lai, Dongsheng Li, Xiangyu Ye, Yabo Duan
2021 arXiv   pre-print
Distributed data-parallel training has been widely used for natural language processing (NLP) neural network models.  ...  In this paper, we propose EmbRace, an efficient communication framework designed to accelerate sparse communication of distributed NLP model training.  ...  Conclusion In this paper, we present EmbRace, an efficient distributed sparse communication framework for NLP model training.  ... 
arXiv:2110.09132v1 fatcat:lc3fl6gq4zbwtpirnbyrw6y3du

Symbolic Important Point Perceptually and Hidden Markov Model Based Hydraulic Pump Fault Diagnosis Method

Yunzhao Jia, Minqiang Xu, Rixin Wang
2018 Sensors  
The PIP series is transformed into a symbolic series that serves as the feature series for the HMM; a Genetic Algorithm is used to optimize the symbolic-space partition scheme.  ...  The Hidden Markov Model is then employed for fault classification. An experiment involving four operating conditions is applied to validate the proposed method.  ...  Transforming the PIP series into a symbolic series, a Genetic Algorithm is applied to optimize the partition scheme of the symbolic space.  ... 
doi:10.3390/s18124460 fatcat:wvnazwsdzfbknimgkdsojt3yom
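
The symbolization step this entry describes (mapping a PIP series into symbols via a partition of the value space) can be sketched as below; the boundaries here are hand-picked for illustration, whereas the paper optimizes them with a Genetic Algorithm:

```python
import bisect

def symbolize(series, boundaries, alphabet="abcd"):
    """Map each value to the symbol of the partition interval it falls into.
    boundaries must be sorted; len(alphabet) >= len(boundaries) + 1."""
    return "".join(alphabet[bisect.bisect_right(boundaries, x)] for x in series)
```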

2.5-dimensional distributed model training [article]

Boxiang Wang, Qifan Xu, Zhengda Bian, Yang You
2021 arXiv   pre-print
Optimus is a 2D solution for distributed tensor parallelism. However, these methods have a high communication overhead and a low scaling efficiency on large-scale computing clusters.  ...  Compared to previous 1D and 2D model parallelization of language models, our SUMMA2.5-LM managed to reduce the transmission cost on each layer, which could yield 1.45x efficiency according to our weak  ...  to C; return C.  SUMMA2.5 on language model: In our work, we applied SUMMA2.5 to the Transformer.  ... 
arXiv:2105.14500v1 fatcat:rdg4d74pnbgcrnhhwdxdadz3we

Affine-transformation invariant clustering models

Hsin-Hsiung Huang, Jie Yang
2020 Journal of Statistical Distributions and Applications  
The proposed Metropolis-Hastings algorithm leads to an irreducible and aperiodic Markov chain, which is also efficient at identifying clusters reasonably well for various applications.  ...  corresponding to models I, II, and III, respectively, and represent clusters with the proposed heatmap of the similarity matrix.  ...  In contrast, model II is less computationally expensive than model III, and model I is the most efficient one.  Markov chain Monte Carlo algorithm for sampling partitions: We use  ... 
doi:10.1186/s40488-020-00111-y fatcat:ibk55tmmi5gwvldex5ygmyjvjy
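
This entry describes a Metropolis-Hastings sampler over partitions. A toy sampler in that spirit, using a simple within-cluster sum-of-squares target on 1-D data rather than the paper's affine-invariant model (all names and the target are illustrative assumptions), might look like:

```python
import math
import random

def neg_within_ss(data, labels, k):
    """Score a partition: negative within-cluster sum of squares (higher is better)."""
    total = 0.0
    for c in range(k):
        pts = [x for x, l in zip(data, labels) if l == c]
        if pts:
            m = sum(pts) / len(pts)
            total += sum((x - m) ** 2 for x in pts)
    return -total

def mh_partition(data, k, iters=2000, seed=0):
    """Metropolis-Hastings over cluster labels. The proposal (relabel one
    random point uniformly) is symmetric, so the acceptance probability
    reduces to min(1, exp(new_score - old_score))."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    score = neg_within_ss(data, labels, k)
    for _ in range(iters):
        prop = labels[:]
        prop[rng.randrange(len(data))] = rng.randrange(k)
        s = neg_within_ss(data, prop, k)
        if s >= score or rng.random() < math.exp(s - score):
            labels, score = prop, s
    return labels
```

On well-separated data this chain concentrates on the correct two-way split, up to relabeling of the clusters.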