Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning
2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
This paper discusses practical consensus-based distributed optimization algorithms. In consensus-based optimization algorithms, nodes interleave local gradient descent steps with consensus iterations. Gradient steps drive the solution to a minimizer, while the consensus iterations synchronize the values so that all nodes converge to a network-wide optimum when the objective is convex and separable. The consensus update requires communication. If communication is synchronous and nodes wait to receive one message from each of their neighbors before updating, then progress is limited by the slowest node. To be robust to failing or stalling nodes, asynchronous communications should be used. Asynchronous protocols using bi-directional communications cause deadlock, and so one-directional protocols are necessary. However, with one-directional asynchronous protocols it is no longer possible to guarantee that the consensus matrix is doubly stochastic. At the same time, it is essential that the coordination protocol achieve consensus on the average to avoid biasing the optimization objective. We report on experiments running Push-Sum Distributed Dual Averaging for convex optimization in an MPI cluster. The experiments illustrate the benefits of using asynchronous consensus-based distributed optimization when some nodes are unreliable and may fail or when messages experience time-varying delays.
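To illustrate how one-directional communication can still reach consensus on the average, the following is a minimal sketch of push-sum averaging in plain Python. It is not the implementation evaluated in the paper: it covers only the consensus step (no dual averaging or gradient updates), it simulates rounds synchronously in a single process rather than over MPI, and the function name `push_sum`, the example graph, and the equal-split weighting rule are assumptions chosen for illustration. Each node pushes equal shares of a numerator `x` and a weight `w` to itself and its out-neighbors, so the implied mixing matrix is column stochastic but not doubly stochastic.

```python
def push_sum(values, out_neighbors, rounds=200):
    """Simulate push-sum averaging on a directed graph (illustrative sketch).

    values:        initial scalar held by each node.
    out_neighbors: out_neighbors[i] lists the nodes that node i sends to.
    Returns (estimates, x, w), where estimates[i] = x[i] / w[i].
    """
    n = len(values)
    x = [float(v) for v in values]  # running numerators
    w = [1.0] * n                   # push-sum weights, all start at 1

    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Node i keeps one share and pushes one share to each out-neighbor.
            # It never waits for a reply, so communication is one-directional.
            targets = [i] + list(out_neighbors[i])
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w

    estimates = [xi / wi for xi, wi in zip(x, w)]
    return estimates, x, w


if __name__ == "__main__":
    # Hypothetical strongly connected directed graph with unequal out-degrees.
    graph = [[1, 2], [2], [3], [0]]
    initial = [1.0, 2.0, 3.0, 10.0]  # true average is 4.0

    estimates, x, w = push_sum(initial, graph)
    print("raw numerators x:", x)          # these do not agree across nodes
    print("estimates x / w: ", estimates)  # these approach the average
```

In this sketch the raw numerators `x` settle to different values at different nodes because the mixing weights are not doubly stochastic, while the ratio `x / w` at every node approaches the true average. Dividing by the weight is what removes the bias that an unbalanced directed protocol would otherwise introduce.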