12,456 Hits in 4.7 sec

A Classification of Weak Asynchronous Models of Distributed Computing [article]

Javier Esparza, Fabian Reiter
2020 arXiv   pre-print
We conduct a systematic study of asynchronous models of distributed computing consisting of identical finite-state devices that cooperate in a network to decide if the network satisfies a given graph-theoretical  ...  The classification is the consequence of several equi-expressivity results with a clear interpretation.  ...  Conclusions We have conducted an extensive comparative analysis of the expressive power of weak asynchronous models of distributed computing.  ... 
arXiv:2007.03291v1 fatcat:asvs45ozlbdpvgwt55nahu4dpa

A Classification of Weak Asynchronous Models of Distributed Computing

Javier Esparza, Fabian Reiter, Laura Kovács, Igor Konnov
2020 International Conference on Concurrency Theory  
We conduct a systematic study of asynchronous models of distributed computing consisting of identical finite-state devices that cooperate in a network to decide if the network satisfies a given graph-theoretical  ...  The classification is the consequence of several equi-expressivity results with a clear interpretation.  ...  Weak Asynchronous Models of Distributed Computing even number of leaves.  ... 
doi:10.4230/lipics.concur.2020.10 dblp:conf/concur/EsparzaR20 fatcat:i7nsnz6ew5dllfqtr5rsj4lc3e

Combining Aspects of Reactive Systems [chapter]

Leonid Kof, Bernhard Schätz
2004 Lecture Notes in Computer Science  
For reactive systems, a large collection of formal models has been developed.  ...  We classify and compare different specification methods for distributed systems concerning communication, behavior, and causality.  ...  The ordering within the domains helps to support the relation between different models arranged in a development process.  ... 
doi:10.1007/978-3-540-39866-0_34 fatcat:i4y2qepfubggtmf2ihrdw2gw3a

Page 5367 of Mathematical Reviews, Issue 94i [page]

1994 Mathematical Reviews  
Summary: “Event structures are a poset-based model for describing the behaviour of distributed systems. They give rise to a well-understood class of Scott domains.  ...  As an example, the convergence of parallel asynchronous iterations of program flow analysis problems is shown.” 94i:68087 68Q10 68Q05 Wiedermann, Juraj [Wiedermann, Jiri] Weak parallel machines: a new  ... 

Byzantine Fault Tolerance in Distributed Machine Learning : a Survey [article]

Djamila Bouhata, Hamouma Moumen
2022 arXiv   pre-print
We offer an illustrative description of techniques used in BFT in DML, with a proposed classification of BFT approaches in the context of their basic techniques.  ...  Byzantine Fault Tolerance (BFT) is among the most challenging problems in Distributed Machine Learning (DML).  ...  Gradient transfer and model updates occur synchronously in synchronous distributed SGD (right) and asynchronously in asynchronous distributed SGD (left).  ... 
arXiv:2205.02572v1 fatcat:h2hkcgz3w5cvrnro6whl2rpvby

Querying Asynchronously Updated Sensor Data Sets under Quantified Constraints [chapter]

Lutz Schlesinger, Wolfgang Lehner
2004 GeoSensor Networks  
Grid Classification Grid Data Distribution Data Joining - Pull together all these "distributed-ly owned" resources to act like a genie = grid. The network is the computer.  ...  • Every organization has lots of computing power and storage • A personal computer nowadays would be a supercomputer in 1970.  ... 
doi:10.1201/9780203356869.ch2 fatcat:6t6ivxrq6ze7zjoqy5bybesy7a

GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent [article]

Jeff Daily, Abhinav Vishnu, Charles Siegel, Thomas Warfel, Vinay Amatya
2018 arXiv   pre-print
The salient features of GossipGraD are: 1) reduction in overall communication complexity from Θ(log(p)) for p compute nodes in well-studied SGD to O(1), 2) model diffusion such that compute nodes exchange  ...  their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during  ...  where the model has memorized the training set), we present asynchronous distributed memory shuffle of samples.  ... 
arXiv:1803.05880v1 fatcat:tun5qumqbvbjhay4q2dwbzdyxi
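The snippet above names GossipGraD's mechanisms (rotating gossip partners, O(1) per-step communication, diffusion after log(p) steps) without showing them. A minimal sketch of one such gossip round, where the XOR pairing schedule and the scalar toy "models" are illustrative assumptions rather than GossipGraD's actual implementation:

```python
# Hedged sketch of gossip-style partner rotation: each node exchanges with
# one rotating partner per step (O(1) communication), and log2(p) rounds
# suffice for full diffusion.  The XOR schedule is an assumption.

def gossip_partner(rank: int, step: int, p: int) -> int:
    """Rotate partners each step: at step t, rank r pairs with
    r XOR 2^(t mod log2(p)).  Assumes p is a power of two."""
    k = p.bit_length() - 1                 # log2(p)
    return rank ^ (1 << (step % k))

def gossip_round(models, step):
    """One round: every pair of partners averages their models."""
    p = len(models)
    out = list(models)
    for r in range(p):
        q = gossip_partner(r, step, p)
        if r < q:                          # each pair averages once
            avg = 0.5 * (models[r] + models[q])
            out[r] = out[q] = avg
    return out

models = [0.0, 4.0, 8.0, 12.0]             # one toy scalar "model" per node
for t in range(2):                         # log2(4) = 2 rounds
    models = gossip_round(models, t)       # -> all nodes converge to 6.0
```

With this schedule the rounds trace the hypercube all-reduce pattern, which is why log2(p) rounds reach the global average.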

Scaling a Convolutional Neural Network for Classification of Adjective Noun Pairs with TensorFlow on GPU Clusters

Victor Campos, Francesc Sastre, Maurici Yagues, Jordi Torres, Xavier Giro-I-Nieto
2017 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)  
Second, the impact of distributed training methods on the training times and final accuracy of the models is studied.  ...  In this work, we present how the training of a deep neural network can be parallelized on a distributed GPU cluster.  ...  (2014-SGR-1051 and 2014-SGR-1421 ) of the Catalan Government and by the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economia y Competitividad and the European  ... 
doi:10.1109/ccgrid.2017.110 dblp:conf/ccgrid/CamposSYTN17 fatcat:gubdrrjv5bcvdhhb3o4toocjne

The power of SIMDs vs. MIMDs in real-time scheduling

Mingxian Jin, J.W. Baker, W.C. Meilander
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
SIMDs and MIMDs are the most important categories of computer systems for parallel computing in Flynn's classification scheme.  ...  Two abstract parallel computation models, the ASC and BSP models that represent SIMDs and MIMDs respectively, are used in our discussion and analysis.  ...  SIMDs and MIMDs are the most important categories in Flynn's classification of computers. Most early parallel computers had a SIMD-style design.  ... 
doi:10.1109/ipdps.2002.1016671 dblp:conf/ipps/JinBM02 fatcat:zbx222oicveuza4r3gftnfcinq

TF-Replicator: Distributed Machine Learning for Researchers [article]

Peter Buchlovsky, David Budden, Dominik Grewe, Chris Jones, John Aslanides, Frederic Besse, Andy Brock, Aidan Clark, Sergio Gómez Colmenarejo, Aedan Pope, Fabio Viola, Dan Belov
2019 arXiv   pre-print
To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet  ...  Our results show strong scalability performance without demanding any distributed systems expertise of the user.  ...  Graph Replicas TensorFlow provides a versatile platform for distributed computing with a high degree of freedom, but little abstraction.  ... 
arXiv:1902.00465v1 fatcat:2ihyygokh5c2foqyxyxcxqjhia

Revisiting Distributed Synchronous SGD [article]

Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz
2017 arXiv   pre-print
Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise  ...  We revisit these conventional beliefs in this paper, and examine the weaknesses of both approaches.  ...  In this work, we have shown how both synchronous and asynchronous distributed stochastic optimization suffer from their respective weaknesses of stragglers and staleness.  ... 
arXiv:1604.00981v3 fatcat:fnfrhsyakjfxxho4f3s2rwnurq

Revisiting Distributed Synchronous SGD [article]

Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz
2017 arXiv   pre-print
Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise  ...  We revisit these conventional beliefs in this paper, and examine the weaknesses of both approaches.  ...  In this work, we have shown how both synchronous and asynchronous distributed stochastic optimization suffer from their respective weaknesses of stragglers and staleness.  ... 
arXiv:1702.05800v2 fatcat:s2yrnfe7cneetib6rbak25slii
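Both arXiv versions above propose mitigating the synchronous-SGD straggler weakness with backup workers: launch N + b workers per step but aggregate only the first N gradients to arrive. A hedged sketch, where the function name, the exponential timing model, and the toy gradients are my assumptions:

```python
# Sketch of straggler mitigation via backup workers for synchronous SGD:
# only the first `n_needed` gradients are aggregated; the rest (from the
# b slowest, likely straggling workers) are discarded, so no applied
# update is ever stale.
import random

def sync_step_with_backups(compute_times, gradients, n_needed):
    """Average the gradients of the n_needed fastest workers."""
    first_arrivals = sorted(zip(compute_times, gradients))[:n_needed]
    used = [g for _, g in first_arrivals]
    return sum(used) / len(used)

random.seed(0)
workers = 12                               # N + b = 12 workers, N = 10
times = [random.expovariate(1.0) for _ in range(workers)]
grads = [1.0] * workers                    # identical toy gradients
g = sync_step_with_backups(times, grads, n_needed=10)
```

The step latency becomes the 10th-fastest completion time rather than the slowest of 12, trading b redundant gradient computations for bounded waiting.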

A Comprehensive Study on Failure Detectors of Distributed Systems

Bhavana Chaurasia, Anshul Verma
2020 Journal of scientific research  
The paper helps readers enhance their knowledge of the basics of failure detectors and of the different algorithms developed to solve the failure detection problems of distributed systems  ...  In distributed systems, failure detectors are used to monitor processes and to reduce the risk of failures by detecting them before the system crashes.  ...  A distributed system has a set of processes that coordinate with each other through message passing. Asynchrony and failures are fundamental issues of distributed computing (Raynal, 2016).  ... 
doi:10.37398/jsr.2020.640235 fatcat:znckxyrnnnf3npesjkjtifjde4
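The survey above concerns failure detectors that monitor processes via message passing. As a hedged illustration of the most common design (a heartbeat/timeout detector; class and method names are my assumptions, not from the surveyed paper):

```python
# Minimal heartbeat-style failure detector sketch: a process is suspected
# once no heartbeat has been seen within the timeout window.  In a real
# asynchronous system such a detector can only be eventually accurate,
# since message delays are unbounded.
import time

class HeartbeatDetector:
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}                 # pid -> last heartbeat time

    def heartbeat(self, pid, now=None):
        self.last_seen[pid] = time.monotonic() if now is None else now

    def suspected(self, pid, now=None):
        now = time.monotonic() if now is None else now
        # A never-seen process is always suspected (-inf last_seen).
        return now - self.last_seen.get(pid, float("-inf")) > self.timeout

d = HeartbeatDetector(timeout=2.0)
d.heartbeat("p1", now=0.0)
alive = not d.suspected("p1", now=1.0)      # within the timeout window
dead = d.suspected("p1", now=5.0)           # heartbeats missed
```

Passing `now` explicitly keeps the sketch deterministic; a deployment would rely on `time.monotonic()` and resend heartbeats periodically.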

Distributed training strategies for a computer vision deep learning algorithm on a distributed GPU cluster

Víctor Campos, Francesc Sastre, Maurici Yagües, Míriam Bellver, Xavier Giró-i-Nieto, Jordi Torres
2017 Procedia Computer Science  
In this work, we explore how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster.  ...  In this work, we explore how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster.  ...  (2014-SGR-1051 and 2014-SGR-1421) of the Catalan Government and by the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economia y Competitividad and the European  ... 
doi:10.1016/j.procs.2017.05.074 fatcat:zn25suw4nzculcfjvo22fl2sti

Asynchronous, Data-Parallel Deep Convolutional Neural Network Training with Linear Prediction Model for Parameter Transition [chapter]

Ikuro Sato, Ryo Fujisaki, Yosuke Oyama, Akihiro Nomura, Satoshi Matsuoka
2017 Lecture Notes in Computer Science  
Asynchronous Stochastic Gradient Descent provides a possibility of large-scale distributed computation for training such networks.  ...  The experimental results on ImageNet demonstrate that the proposed asynchronous training method, compared to a synchronous training method, reduces the training time to reach a certain model accuracy  ...  [8] proposed a delay compensation technique for asynchronous, distributed deep learning.  ... 
doi:10.1007/978-3-319-70096-0_32 fatcat:njad52dzvbakxjwasyzkvp5zny
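The title above names a linear prediction model for parameter transition: under asynchronous SGD a worker's gradient is computed against stale weights, so one can linearly extrapolate the parameter trajectory to estimate the weights the gradient will actually be applied to. An illustrative scalar sketch (not the paper's code; the function name and constants are assumed):

```python
# Hedged sketch of staleness compensation by linear prediction: assume the
# parameters keep moving by their recent per-step change for `staleness`
# more steps before this worker's gradient lands at the server.

def predict_params(w, recent_delta, staleness):
    """Linear extrapolation of the parameter trajectory."""
    return w + staleness * recent_delta

w_now = 1.0                # current (stale) weight seen by the worker
delta = -0.1               # observed recent per-step parameter change
w_predicted = predict_params(w_now, delta, staleness=3)   # -> 0.7
```

The worker would then evaluate its gradient at `w_predicted` instead of `w_now`, reducing the mismatch that staleness introduces.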