Do Subsampled Newton Methods Work for High-Dimensional Data?

Xiang Li, Shusen Wang, Zhihua Zhang
In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020
Subsampled Newton methods approximate Hessian matrices through subsampling to alleviate the per-iteration cost. Previous results require Ω(d) samples to approximate the Hessian, where d is the dimension of the data points, which makes these methods impractical for high-dimensional data. The situation is worse when d is comparable to the number of data points n: the approximation then needs essentially the whole dataset, so subsampling offers no savings. This paper theoretically justifies the effectiveness of subsampled Newton methods for strongly convex empirical risk minimization with high-dimensional data. Specifically, we prove that only Θ̃(d_eff^γ) samples are required to approximate the Hessian matrices, where d_eff^γ is the γ-ridge effective dimension (the sum of the γ-ridge leverage scores) and can be much smaller than d as long as nγ ≫ 1. Our theory covers three types of Newton methods: subsampled Newton, distributed Newton, and proximal Newton.
doi:10.1609/aaai.v34i04.5905
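
To make the sample-complexity claim concrete, here is a standard definition of the γ-effective dimension as a hedged sketch; the paper's exact convention may differ (e.g., it may be stated in terms of the Hessian rather than the data matrix). For a data matrix A ∈ R^{n×d} with singular values σ_1 ≥ … ≥ σ_d:

% A common definition of the gamma-effective dimension (an assumption about
% the paper's convention; an equivalent Hessian-based form may be used instead):
d_{\mathrm{eff}}^{\gamma}
  = \operatorname{tr}\!\bigl(A^{\top}A \,(A^{\top}A + n\gamma I_d)^{-1}\bigr)
  = \sum_{i=1}^{d} \frac{\sigma_i^{2}}{\sigma_i^{2} + n\gamma}.

Each summand lies in (0, 1), so d_eff^γ ≤ d always; tail directions with σ_i² ≪ nγ contribute almost nothing, which is why d_eff^γ can be far smaller than d once nγ ≫ 1.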
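
For illustration, a minimal Python sketch of one subsampled Newton step for γ-regularized logistic regression follows. This is not the paper's algorithm verbatim: uniform subsampling is used for simplicity (the paper's Θ̃(d_eff^γ) guarantee involves ridge-leverage-based quantities), and the function name and signature are hypothetical.

import numpy as np

def subsampled_newton_step(w, X, y, gamma, s, rng=None):
    # One subsampled Newton step for gamma-regularized logistic regression.
    # Hypothetical sketch: uniform subsampling is used here for simplicity;
    # the paper's sample-size bound is tied to ridge leverage scores.
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    # Exact gradient of (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)) + (gamma/2)||w||^2.
    z = y * (X @ w)
    grad = -(X.T @ (y / (1.0 + np.exp(z)))) / n + gamma * w

    # Subsampled Hessian: s uniformly sampled rows, reweighted by 1/s.
    idx = rng.choice(n, size=s, replace=False)
    Xs, zs = X[idx], z[idx]
    sig = 1.0 / (1.0 + np.exp(-zs))      # sigmoid(y_i <x_i, w>)
    weights = sig * (1.0 - sig)          # per-sample curvature of the logistic loss
    H_sub = (Xs.T * weights) @ Xs / s + gamma * np.eye(d)

    # Newton direction computed from the subsampled Hessian.
    return w - np.linalg.solve(H_sub, grad)

# Example usage on synthetic data: the Hessian is built from s = 100 of n = 1000 rows.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 200))
y = np.sign(rng.standard_normal(1000))
w = np.zeros(200)
for _ in range(10):
    w = subsampled_newton_step(w, X, y, gamma=1e-2, s=100, rng=rng)

The point of the construction is that only the Hessian is subsampled; the gradient stays exact, so the per-iteration cost drops from O(nd²) to O(sd²) for the curvature term while convergence degrades gracefully when the subsampled Hessian spectrally approximates the full one.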