D-FW: Communication efficient distributed algorithms for high-dimensional sparse optimization

Jean Lafond, Hoi-To Wai, Eric Moulines
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Consider optimization problems of the form

    min_{θ ∈ C} Σ_{s=1}^{T} f_s(θ),   (1)

where f_s : R^n → R is the strongly convex, continuously differentiable objective function of agent s, and C is a convex constraint set (e.g., an ℓ1-ball) that promotes sparsity. T is the number of cooperating agents, moderately sized (T ≈ 10 to 100); n is the dimension of the parameter to be estimated (n ≈ 10^4 to 10^6). The optimal solution θ⋆ of (1) is sparse: ‖θ⋆‖_0 ≪ n. Applications: sparse recovery, high-dimensional regression, etc. This work: distributed, computation- and communication-efficient algorithms for (1), together with a convergence rate analysis of the proposed methods.
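A minimal sketch of a problem instance fitting template (1), assuming a distributed sparse least-squares regression split across T agents; all names, sizes, and the small ridge term (added so each local objective is strongly convex) are illustrative, and the dimensions are scaled down from the n ≈ 10^4 to 10^6 regime so the sketch runs quickly:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, m = 5, 2_000, 100        # agents, parameter dimension, samples per agent
mu = 1e-3                      # ridge weight making each f_s strongly convex

# Sparse ground truth: ||theta_star||_0 << n.
theta_star = np.zeros(n)
support = rng.choice(n, size=20, replace=False)
theta_star[support] = rng.standard_normal(20)

# Agent s holds private data (A_s, b_s) defining its local objective
# f_s(theta) = 1/(2m) ||A_s theta - b_s||^2 + (mu/2) ||theta||^2.
agents = []
for s in range(T):
    A_s = rng.standard_normal((m, n))
    agents.append((A_s, A_s @ theta_star))

def f(theta):
    """Global objective of (1): the sum of the T local objectives."""
    return sum(np.linalg.norm(A @ theta - b) ** 2 / (2 * m)
               + 0.5 * mu * np.linalg.norm(theta) ** 2
               for A, b in agents)
```

The local data (A_s, b_s) never leaves agent s; only iterates (or parts of them) are exchanged, which is why the sparsity of the communicated quantities matters.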
While θ⋆ is sparse, the intermediate iterates θ_t^s of the distributed proximal gradient (D-PG) method are not sparse! The per-iteration communication cost of D-PG (and its variants) is therefore high. Related works address different types of problems [JST+14, BLG+14].

Agenda:
1. Frank-Wolfe algorithm; recent results on stochastic FW
2. Distributed FW algorithms for sparse optimization: the DistFW algorithm for star networks, the DeFW algorithm for general networks, and their convergence analysis
3. Numerical experiments
4. Conclusions & future work

Frank-Wolfe (FW) algorithm (a.k.a. conditional gradient, projection-free optimization, etc.): a classical first-order algorithm that has attracted renewed interest [FW56]. Applications in machine learning and in solving high-dimensional problems, e.g., matrix completion and sparse optimization [Jag13]. Long believed to be slow, with sublinear convergence O(1/t) [CC68]. Recent results demonstrated cases where a linear convergence rate O((1 − ρ)^t) can be achieved [LJJ13], along with analyses of its stochastic variants [LWM15, LZ14].
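To illustrate why FW suits sparse problems, here is a sketch of the centralized FW template (not the paper's DistFW/DeFW algorithms; the l1-ball constraint, step size, and toy data are assumptions): the linear minimization oracle over an l1-ball returns a single signed coordinate, so an iterate started at zero has at most t nonzeros after t steps, which keeps FW-style updates cheap to communicate.

```python
import numpy as np

def frank_wolfe_l1(grad_f, n, radius, n_iters):
    """Frank-Wolfe over the l1-ball {theta : ||theta||_1 <= radius}."""
    theta = np.zeros(n)
    for t in range(n_iters):
        g = grad_f(theta)
        # LMO: argmin_{||s||_1 <= r} <g, s> puts all of the ball's mass
        # on the coordinate with the largest |gradient| entry.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros(n)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2)       # classical step size, O(1/t) rate
        theta = (1 - gamma) * theta + gamma * s
    return theta

# Toy sparse least-squares: minimize 0.5 ||A theta - b||^2 over an l1-ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 500))
theta_true = np.zeros(500)
theta_true[:3] = [1.0, -0.5, 0.8]
b = A @ theta_true
theta = frank_wolfe_l1(lambda th: A.T @ (A @ th - b),
                       n=500, radius=2.3, n_iters=150)
```

Each iteration changes at most one coordinate of the support, so in a distributed setting an agent would only need to broadcast one index-value pair per step, rather than a dense n-dimensional vector as in D-PG.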
doi:10.1109/icassp.2016.7472457 dblp:conf/icassp/LafondWM16