25,648 Hits in 14.5 sec

High-Performance Distributed ML at Scale through Parameter Server Consistency Models [article]

Wei Dai, Abhimanu Kumar, Jinliang Wei, Qirong Ho, Garth Gibson, Eric P. Xing
<span title="2014-10-29">2014</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
through relaxed "consistency models" that allow inconsistent parameter reads.  ...  The recent Parameter Server (PS) paradigm is a middle ground between these extremes, allowing easy conversion of single-machine parallel ML applications into distributed ones, while maintaining high throughput  ...  Consistency Models for Parameter Servers A key idea for large-scale distributed ML is to carefully trade off parameter consistency for increased parameter read throughput (and thus faster algorithm execution  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1410.8043v1">arXiv:1410.8043v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/gymqox7auzewxezrn7e64vdcsa">fatcat:gymqox7auzewxezrn7e64vdcsa</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200928193948/https://arxiv.org/pdf/1410.8043v1.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

From Federated to Fog Learning: Distributed Machine Learning over Heterogeneous Wireless Networks [article]

Seyyedali Hosseinalipour and Christopher G. Brinton and Vaneet Aggarwal and Huaiyu Dai and Mung Chiang
<span title="2020-10-23">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
This migrates from star network topologies used for parameter transfers in federated learning to more distributed topologies at scale.  ...  To address this, we advocate a new learning paradigm called fog learning which will intelligently distribute ML model training across the continuum of nodes from edge devices to cloud servers.  ...  CONCLUSION We introduced fog learning, a new paradigm for distributing ML model training through large-scale networks of heterogeneous devices.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2006.03594v3">arXiv:2006.03594v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mpcav4qexvgwdmnvvr4qzuiblm">fatcat:mpcav4qexvgwdmnvvr4qzuiblm</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201029223427/https://arxiv.org/pdf/2006.03594v3.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Performance Analysis and Comparison of Distributed Machine Learning Systems [article]

Salem Alqahtani, Murat Demirbas
<span title="2019-09-04">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In contrast, 1PS systems suffer from low performance due to network congestion at the parameter server side.  ...  three different system architectures: Parameter Server (PS), peer-to-peer (P2P), and Ring allreduce (RA).  ...  The performance limits in Apache Spark for distributed ML applications are scalability and compares with high performance computing MPI framework [35].  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1909.02061v1">arXiv:1909.02061v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/pmdz2nmufnaxbdtorhozsewbhi">fatcat:pmdz2nmufnaxbdtorhozsewbhi</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200823085800/https://arxiv.org/pdf/1909.02061v1.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Managed communication and consistency for fast data-parallel iterative analytics

Jinliang Wei, Wei Dai, Aurick Qiao, Qirong Ho, Henggang Cui, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, Eric P. Xing
<span title="">2015</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/eitdfnn7k5fohgz7jhhim3f4bm" style="color: black;">Proceedings of the Sixth ACM Symposium on Cloud Computing - SoCC &#39;15</a> </i> &nbsp;
While data-parallel ML applications often employ a loose consistency model when updating shared model parameters to maximize parallelism, the accumulated error may seriously impact the quality of refinements  ...  At the core of Machine Learning (ML) analytics is often an expert-suggested model, whose parameters are refined by iteratively processing a training dataset until convergence.  ...  At the core of many important ML analytics is an expert-suggested model, whose parameters must be refined starting from an initial guess.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2806777.2806778">doi:10.1145/2806777.2806778</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/cloud/WeiDQHCGGGX15.html">dblp:conf/cloud/WeiDQHCGGGX15</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mgqx7iwlare3tciivbyszgk5oq">fatcat:mgqx7iwlare3tciivbyszgk5oq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190603224425/http://www.cs.cmu.edu/~jinlianw/papers/socc15_wei.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

A Survey on Distributed Machine Learning

Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, Jan S. Rellermeyer
<span title="2020-03-13">2020</span> <i title="Association for Computing Machinery (ACM)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/eiea26iqqjcatatlgxdpzt637y" style="color: black;">ACM Computing Surveys</a> </i> &nbsp;
Coates et al. [38] were able to train a 1B parameter network on their Commodity Off-The-Shelf High Performance Computing (COTS HPC) system in just three days.  ...  TPUs are attached to the server system through the PCI Express bus. This provides them with a direct connection with the CPU, which allows for a high aggregated bandwidth of 63 GB/s (PCI-e5x16).  ...  A major challenge of scaling out is that not all ML algorithms lend themselves to a distributed computing model, which can thus only be used for algorithms that can achieve a high degree of parallelism  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/3377454">doi:10.1145/3377454</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/apwpdtza4zc2tcn37hnxxrb74u">fatcat:apwpdtza4zc2tcn37hnxxrb74u</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210427084620/https://repository.tudelft.nl/islandora/object/uuid%3A64ca2c9c-72a3-4b8d-b060-727d6fd60163/datastream/OBJ/download" title="fulltext PDF download">Web Archive [PDF]</a>

A Survey on Distributed Machine Learning [article]

Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, Jan S. Rellermeyer
<span title="2019-12-20">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters.  ...  These distributed systems present new challenges, first and foremost the efficient parallelization of the training process and the creation of a coherent model.  ...  [Figure: (b) Decentralized (Tree); (c) Decentralized (Parameter Server)]  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1912.09789v1">arXiv:1912.09789v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/kbjkeznysjaqtndgdubm52fxay">fatcat:kbjkeznysjaqtndgdubm52fxay</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200321010834/https://arxiv.org/ftp/arxiv/papers/1912/1912.09789.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications [article]

S. Hu, X. Chen, W. Ni, E. Hossain, X. Wang
<span title="2020-12-02">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Distributed machine learning (DML) techniques, such as federated learning, partitioned learning, and distributed reinforcement learning, have been increasingly applied to wireless communications.  ...  The unique features of wireless systems, such as large scale, geographically dispersed deployment, user mobility, and massive amount of data, give rise to new challenges in the design of DML techniques  ...  The central system follows the parameter server architecture, allowing developers to access the global ML model from every node through a simple interface to the distributed and shared memory of the system  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2012.01489v1">arXiv:2012.01489v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/pdauhq4xbbepvf26clhpqnc2ci">fatcat:pdauhq4xbbepvf26clhpqnc2ci</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201208055121/https://arxiv.org/pdf/2012.01489v1.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

MLitB: Machine Learning in the Browser [article]

Edward Meeds and Remco Hendriks and Said Al Faraby and Magiel Bruntink and Max Welling
<span title="2015-06-17">2015</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
This paper introduces MLitB, a prototype ML framework written entirely in JavaScript, capable of performing large-scale distributed computing with heterogeneous classes of devices.  ...  learning and prediction to the public at large.  ...  Other distributed ML algorithm research includes the parameter server model [37], parallelized SGD [38], and distributed SGD [39].  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1412.2432v2">arXiv:1412.2432v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/jkfuucxdurhjlfvoizrjnstsdq">fatcat:jkfuucxdurhjlfvoizrjnstsdq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200906023104/https://arxiv.org/pdf/1412.2432v2.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Democratizing Production-Scale Distributed Deep Learning [article]

Minghuang Ma, Hadi Pouransari, Daniel Chao, Saurabh Adya, Santiago Akle Serrano, Yi Qin, Dan Gimnicher, Dominic Walsh
<span title="2018-11-03">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
However, training them distributed and at scale remains difficult due to the complex ecosystem of tools and hardware involved.  ...  To address these restrictions, we introduce Alchemist - an internal service built at Apple from the ground up for easy, fast, and scalable distributed training.  ...  In most cases, Horovod scales better than parameter server and it scales consistently well across different number of GPUs and different models.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.00143v2">arXiv:1811.00143v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/yc3kfouerbamtbhqtlzi2uf2im">fatcat:yc3kfouerbamtbhqtlzi2uf2im</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200907020920/https://arxiv.org/pdf/1811.00143v2.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Angel: a new large-scale machine learning system

Jie Jiang, Lele Yu, Jiawei Jiang, Yuhong Liu, Bin Cui
<span title="2017-02-24">2017</span> <i title="Oxford University Press (OUP)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/u4bdhpukkbe3ncikyibr6q4x7e" style="color: black;">National Science Review</a> </i> &nbsp;
With the increasing volume of data, large-scale ML applications require an efficient implementation to accelerate the performance.  ...  Existing systems parallelize algorithms through either data parallelism or model parallelism.  ...  CONCLUSION In this paper, we proposed a new general-purpose distributed ML system, named Angel, which aimed at solving large-scale ML problems faced by big data analytic applications.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1093/nsr/nwx018">doi:10.1093/nsr/nwx018</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/qtp4yspgizbxbo36p2milrhxoe">fatcat:qtp4yspgizbxbo36p2milrhxoe</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20180729180136/https://watermark.silverchair.com/nwx018.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAZswggGXBgkqhkiG9w0BBwagggGIMIIBhAIBADCCAX0GCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQM7IGmlyqhdmTlMr4iAgEQgIIBTvkt8el8-GFUGcDOEkAYhRKHOp0YlX4FEYXvCSKAQwekwH0lB2fbfBDryOv8kqRKD7seiGso6LT_US_qFBEUc_926XFBp3D23t0Cl9pbJ8bMPa1RgqQcrdKbaxHnI78udPF8VOTkm8gwVhjPCOogLzbTDPYbpfv1tqHdwEK76TLvqEhM5tK0k7GLfJWSfLEuq59voE-YhbuUacUKP2CyzODrmuy7V-mZrhiX60YZYJsid37sYnls-tHVtrmx8uBuei6GwSqfFzMbQKCIgh_qkNcg4MVygPD7gjdWti7ic8PiBVtyzGN-7xnG5jU-8MQMkVPbCWjEyWgm7NsLd6rLNXVwAhqWpAEN7eJZFLotjsqdHnh5CsnVNKo4RLWpDm3bEeTy1WpSMQJk2yrY3hLTK845YV3jMlgUM-0cQ3kHOYiJ-eUcm2g5_Lu7pFCdzTQ" title="fulltext PDF download">Web Archive [PDF]</a>

MLitB: machine learning in the browser

Edward Meeds, Remco Hendriks, Said Al Faraby, Magiel Bruntink, Max Welling
<span title="2015-07-29">2015</span> <i title="PeerJ"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/zs2czkfyybggbpvr26rbyxpsjy" style="color: black;">PeerJ Computer Science</a> </i> &nbsp;
This paper introduces MLitB, a prototype ML framework written entirely in JavaScript, capable of performing large-scale distributed computing with heterogeneous classes of devices.  ...  learning and prediction to the public at large.  ...  Other distributed ML algorithm research includes the parameter server model, parallelized SGD (Zinkevich et al., 2010), and distributed SGD (Ahn, Shahbaba & Welling, 2014).  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.7717/peerj-cs.11">doi:10.7717/peerj-cs.11</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/73ajvrrgxfg35d3zi6of5wprj4">fatcat:73ajvrrgxfg35d3zi6of5wprj4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20171202055539/https://peerj.com/articles/cs-11.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Petuum: A New Platform for Distributed Machine Learning on Big Data

Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu
<span title="2015-06-01">2015</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/zfau6furxjhwdi5ecdqcdfbrbq" style="color: black;">IEEE Transactions on Big Data</a> </i> &nbsp;
What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes  ...  We propose a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric  ...  ESSP consistency model, used by the Parameter Server. Workers are allowed to run at different speeds, but are prevented from being more than s iterations apart.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tbdata.2015.2472014">doi:10.1109/tbdata.2015.2472014</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/lncatglg6nemdlgm34aktpr3om">fatcat:lncatglg6nemdlgm34aktpr3om</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20161020021944/http://www.cs.cmu.edu:80/~yaoliang/mypapers/petuum.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Petuum

Eric P. Xing, Yaoliang Yu, Qirong Ho, Wei Dai, Jin-Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar
<span title="">2015</span> <i title="ACM Press"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/fqqihtxlu5bvfaqxjyvqcob35a" style="color: black;">Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD &#39;15</a> </i> &nbsp;
What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes  ...  We propose a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric  ...  ESSP consistency model, used by the Parameter Server. Workers are allowed to run at different speeds, but are prevented from being more than s iterations apart.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2783258.2783323">doi:10.1145/2783258.2783323</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/kdd/XingHDKWLZXKY15.html">dblp:conf/kdd/XingHDKWLZXKY15</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/yjs35ofs6ne6pcfkt3bgnnb7h4">fatcat:yjs35ofs6ne6pcfkt3bgnnb7h4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20161020021944/http://www.cs.cmu.edu:80/~yaoliang/mypapers/petuum.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Dynamic Parameter Allocation in Parameter Servers [article]

Alexander Renz-Wieland, Rainer Gemulla, Steffen Zeuch, Volker Markl
<span title="2020-05-12">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We found that Lapse provides near linear scaling and can be orders of magnitude faster than existing parameter servers.  ...  We propose to integrate dynamic parameter allocation into parameter servers, describe an efficient implementation of such a parameter server called Lapse, and experimentally compare its performance to  ...  In distributed ML, both training data and model parameters are partitioned across a compute cluster.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2002.00655v2">arXiv:2002.00655v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/i537kujvmbhv5paucqa2oks5p4">fatcat:i537kujvmbhv5paucqa2oks5p4</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200527203106/https://arxiv.org/pdf/2002.00655v2.pdf" title="fulltext PDF download">Web Archive [PDF]</a>

Consistent Bounded-Asynchronous Parameter Servers for Distributed ML [article]

Jinliang Wei, Wei Dai, Abhimanu Kumar, Xun Zheng, Qirong Ho, Eric P. Xing
<span title="2013-12-31">2013</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
The proposed consistency models are implemented in a distributed parameter server and evaluated in the context of a popular ML application: topic modeling.  ...  This property allows distributed ML to relax strict consistency models to improve system performance while theoretically guarantees algorithmic correctness.  ...  Consistency models employed in modern distributed ML system tend to fall into two extremes: either sequential consistency or no consistency guarantee at all.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1312.7869v2">arXiv:1312.7869v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2af2ztw4mneibbud2wk2u6pdca">fatcat:2af2ztw4mneibbud2wk2u6pdca</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191019202537/https://arxiv.org/pdf/1312.7869v2.pdf" title="fulltext PDF download">Web Archive [PDF]</a>
Showing results 1-15 out of 25,648 results