A Note on the Unification of Adaptive Online Learning

Wenwu He, James Tin-Yau Kwok, Ji Zhu, Yang Liu
2017 IEEE Transactions on Neural Networks and Learning Systems  
In online convex optimization, adaptive algorithms, which can exploit second-order information from the loss function's (sub)gradients, have shown improvements over standard gradient methods. This paper presents a framework, Follow the Bregman Divergence Leader, that unifies various existing adaptive algorithms and from which new insights are revealed. Under the proposed framework, two simple adaptive online algorithms with improvable performance guarantees are derived. Further, a general equation derived from matrix analysis generalizes adaptive learning to the nonlinear case via the kernel trick.
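As a toy illustration of the "adaptive" updates the abstract refers to, here is a minimal AdaGrad-style sketch in Python that scales each coordinate by accumulated squared (sub)gradients. This is a generic textbook scheme, not the paper's FTBDL framework; the step size, the synthetic data, and the choice of square loss are assumptions made purely for the demo.

```python
import numpy as np

# Minimal sketch of a diagonal adaptive (sub)gradient update in the
# spirit of AdaGrad -- illustrative only, NOT the paper's FTBDL method.
# The step size `eta` and the synthetic setup below are assumptions.

def adagrad_step(w, g, h, eta=0.1, eps=1e-8):
    """One adaptive update: per-coordinate step sizes scaled by the
    accumulated squared (sub)gradients, a cheap proxy for curvature."""
    h += g * g                           # accumulate squared gradients
    w -= eta * g / (np.sqrt(h) + eps)    # coordinate-wise scaled step
    return w, h

# Online square loss l_t(w) = (<w, x_t> - y_t)^2 on synthetic data.
rng = np.random.default_rng(0)
d, T = 5, 200
w_star = rng.normal(size=d)
w, h = np.zeros(d), np.zeros(d)
for t in range(T):
    x = rng.normal(size=d)
    y = x @ w_star
    g = 2.0 * (w @ x - y) * x            # gradient of the square loss
    w, h = adagrad_step(w, g, h)
print("final distance to w*:", np.linalg.norm(w - w_star))
```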
Index Terms: Online Learning, Adaptive Gradient Descent, Second-Order Information, Follow the Bregman Divergence Leader

Footnote 1: Examples of exp-concave losses include the log-loss $\ell_t(w_t) = -\ln(\langle w_t, x_t \rangle)$, which arises in the problem of universal portfolio management [12], and the square loss $\ell_t(w_t) = (\langle w_t, x_t \rangle - y_t)^2$, which is widely used in regression problems [13], [14]. For a strongly convex loss, regret on the order of $O(\ln T)$ can also be derived, but strong convexity rarely holds in learning problems, so exp-concavity can be viewed as a relaxation of strong convexity.
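The footnote's exp-concave examples pair naturally with the Online Newton Step, the standard second-order method achieving $O(\ln T)$ regret for exp-concave losses. Below is a minimal ONS sketch on the square loss; the parameter `gamma`, the initialization `eps`, and the omission of the projection step are simplifying assumptions, and this is not claimed to be one of the paper's two derived algorithms.

```python
import numpy as np

# Minimal Online Newton Step (ONS) sketch for an exp-concave loss
# (the square loss from the footnote). `gamma`, `eps`, and the omitted
# projection onto the feasible set are simplifying assumptions here.

rng = np.random.default_rng(1)
d, T, gamma, eps = 5, 200, 0.5, 1.0
w_star = rng.normal(size=d)
w = np.zeros(d)
A = eps * np.eye(d)                      # running second-order statistic
for t in range(T):
    x = rng.normal(size=d)
    y = x @ w_star
    g = 2.0 * (w @ x - y) * x            # (sub)gradient of (<w,x> - y)^2
    A += np.outer(g, g)                  # rank-one preconditioner update
    w -= (1.0 / gamma) * np.linalg.solve(A, g)  # Newton-style step
print("final distance to w*:", np.linalg.norm(w - w_star))
```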
doi:10.1109/tnnls.2016.2527053 pmid:26929066