An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity

C. Cartis, N. I. M. Gould, P. L. Toint
2012 IMA Journal of Numerical Analysis  
The adaptive cubic regularization algorithm described in Cartis et al. (2009, Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program., 127, 245-295; 2010, Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity [online]. Math. Program.) is adapted to the problem of minimizing a nonlinear, possibly nonconvex, smooth objective function over a convex domain. Convergence to first-order critical points is shown under standard assumptions, without any Lipschitz continuity requirement on the objective's Hessian. A worst-case complexity analysis in terms of evaluations of the problem's function and derivatives is also presented for the Lipschitz continuous case and for a variant of the resulting algorithm. This analysis extends the best-known bound for general unconstrained problems to nonlinear problems with convex constraints.

The adaptive cubic regularization methods allow for approximate model Hessians and approximate model minimizers, which makes them suitable for large-scale problems. These adaptive regularization methods are not only globally convergent to first- and second-order critical points with fast asymptotic speed (Nesterov & Polyak, 2006; Cartis et al., 2011a), but also, unprecedentedly, enjoy better worst-case global complexity bounds than steepest-descent methods (Nesterov & Polyak, 2006; Cartis et al., 2011b), Newton's method and trust-region methods (Cartis et al., 2010). Furthermore, preliminary numerical experiments with basic implementations of these techniques and of the trust-region approach show encouraging performance of the cubic regularization approach (Cartis et al., 2011a). Extending the approach to more general optimization problems is therefore attractive, as one may hope that some of the qualities of the unconstrained methods can be transferred to a broader framework.

Nesterov (2006) has considered the extension of his cubic regularization method to problems with a smooth convex objective function and convex constraints. In this paper we consider the extension of the adaptive cubic regularization methods to the case where minimization is subject to convex constraints but the smooth objective function is no longer assumed to be convex. The new algorithm is strongly inspired by the unconstrained adaptive cubic regularization methods (Cartis et al., 2011a,b) and by the trust-region projection methods for the same constrained problem class that are fully described in Conn et al. (2000, Chapter 12). In particular, it makes significant use of the specialized first-order criticality measure developed by Conn et al. (1993) for the latter context.

Firstly, global convergence to first-order critical points is shown under mild assumptions on the problem class for a generic adaptive cubic regularization framework that only requires a Cauchy-like decrease in the (constrained) model subproblem. The latter can be efficiently computed using a generalized Goldstein linesearch, suitable for the cubic model, provided projections onto the feasible set are inexpensive to calculate. The associated worst-case global complexity, or equivalently the total number of objective function and gradient evaluations, required by this generic cubic regularization approach to reach approximate first-order optimality matches, in order, that of steepest descent for unconstrained (nonconvex) optimization. However, in order to improve the local and global rate of convergence of the algorithm, it is necessary to advance beyond the Cauchy point when minimizing the model. To this end we propose an adaptive cubic regularization variant that, under certain assumptions on the algorithm, can be proved to satisfy the desirable global evaluation complexity bound of its unconstrained counterpart, which, as mentioned above, is better than for steepest-descent methods.
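For concreteness, the displays below sketch the two central objects in this setting: the regularized cubic model minimized at each iteration and the first-order criticality measure of Conn et al. (1993). The notation is a generic reconstruction, not a verbatim quotation of the paper.

\[
m_k(s) = f(x_k) + \langle g_k, s \rangle + \tfrac{1}{2}\langle s, B_k s \rangle + \tfrac{\sigma_k}{3}\,\|s\|^3,
\qquad
\chi(x) = \Bigl| \min_{x+d \in \mathcal{F},\; \|d\| \leq 1} \langle \nabla_x f(x), d \rangle \Bigr|,
\]

where \(B_k\) is a (possibly approximate) symmetric model Hessian, \(\sigma_k > 0\) is the adaptive regularization weight and \(\mathcal{F}\) is the convex feasible set; \(\chi(x) = 0\) exactly when \(x\) is first-order critical for the constrained problem.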
As in the unconstrained case, we do not rely on global model minimization and are content with sequential line minimizations of the model, provided they ensure descent at each (inner) step. Possible descent paths of this type are suggested, though more work is needed to transform these ideas into a computationally efficient model solution procedure. Solving the (constrained) subproblem relies on the assumption that these piecewise linear paths are uniformly bounded, which still requires both practical and theoretical validation. Our complexity analysis here, in terms of the function-evaluation count, does not cover the total computational cost of solving the problem, as it ignores the cost of solving the (constrained) subproblem. Note, however, that although the latter may be NP-hard computationally (Vavasis, 1991), it does not require any additional function evaluations. Furthermore, for many examples the cost of these (black-box) evaluations significantly dominates that of the internal computations performed by the algorithm. Even so, effective step calculation is crucial for the practical computational efficiency of the algorithm and will be given priority consideration in our future work.

The paper is organized as follows. Section 2 describes the constrained problem more formally, as well as the new adaptive regularization algorithm for it, while Section 3 presents the associated convergence theory (to first-order critical points). We then discuss a worst-case function-evaluation complexity result.
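As an illustration of the projection-based step computation mentioned above, the following sketch backtracks along the projected-gradient path of the cubic model until a Goldstein-like fraction of the predicted linear decrease is achieved. This is a minimal sketch under stated assumptions: the projection oracle proj, the acceptance constant kappa and all parameter names are illustrative, not the paper's notation or algorithm.

import numpy as np

def cubic_model(s, f0, g, B, sigma):
    # m(s) = f0 + <g, s> + (1/2) <s, B s> + (sigma/3) ||s||^3
    return f0 + g @ s + 0.5 * (s @ (B @ s)) + (sigma / 3.0) * np.linalg.norm(s) ** 3

def projected_cauchy_step(x, f0, g, B, sigma, proj,
                          t0=1.0, shrink=0.5, kappa=0.1, max_backtracks=50):
    # Backtrack along t -> proj(x - t*g) - x until the cubic model value
    # drops by at least a fraction kappa of the predicted linear decrease.
    t = t0
    for _ in range(max_backtracks):
        s = proj(x - t * g) - x            # feasible step by construction
        lin = -(g @ s)                     # predicted first-order decrease
        if lin > 0.0 and cubic_model(s, f0, g, B, sigma) <= f0 - kappa * lin:
            return s
        t *= shrink
    return np.zeros_like(x)                # no acceptable step: x is near-critical

# Example with simple bound constraints, where projection is a componentwise clip.
x = np.array([0.9, -0.5, 0.0])
g = np.array([2.0, -1.0, 0.5])
proj = lambda z: np.clip(z, -1.0, 1.0)
step = projected_cauchy_step(x, f0=1.0, g=g, B=np.eye(3), sigma=1.0, proj=proj)

Such a search only requires projections onto the feasible set, which is consistent with the assumption above that these projections are inexpensive to compute.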
doi:10.1093/imanum/drr035