IA Scholar Query: An Inexact Projected Gradient Method for Sparsity-Constrained Quadratic Measurements Regression.
https://scholar.archive.org/
Internet Archive Scholar query results feed (Tue, 06 Sep 2022 00:00:00 GMT)

Neural Quantum States for Scientific Computing: Applications to Computational Chemistry and Finance
https://scholar.archive.org/work/z3junf2h2jdtvcdanbbffnwj4y
The variational quantum Monte Carlo (VQMC) method has received significant attention because of its ability to overcome the curse of dimensionality inherent in many-body quantum systems by representing the exponentially complex quantum states variationally with machine learning models. We develop novel training strategies to improve the scalability of VQMC, and build parallelization frameworks for solving large-scale problems. The application of our method is extended to quantum chemistry and financial derivative pricing. For quantum chemistry, we build a pre-processing pipeline serving as an interface between molecular information and VQMC, and achieve remarkable performance in comparison with classical approximate methods. We also present a simple generalization of VQMC applicable to arbitrary linear PDEs, showcasing the technique on the Black-Scholes equation for pricing European contingent claims dependent on many underlying assets. Finally, we introduce meta-learning and multi-fidelity active learning as exotic components of VQMC, which, under some reasonable assumptions on the problem formulation, can further improve the convergence and the sampling efficiency of our method.
Tianchen Zhao. Tue, 06 Sep 2022 00:00:00 GMT

On the theory and practice of tensor recovery for high-dimensional partial differential equations
https://scholar.archive.org/work/6zfk5mi7ondg3kdmhbr23f4e3m
This thesis considers the problem of approximating low-rank tensors from data and its use for the non-intrusive solution of high-dimensional parametric partial differential equations (PDEs) and stochastic differential equations (SDEs). High-dimensional here refers to the large number of variables on which the solution depends. The looming curse of dimensionality, i.e. the exponential scaling of the number of parameters with respect to the number of variables that is inherent in all generic linear approximations, is evaded by applying hierarchical tensor formats, in particular tensor trains, to represent the sought functions. As a non-intrusive method to attain such representations, regression is considered, and the required high-dimensional integrals in the error functional are estimated by (quasi-)Monte Carlo methods. The first part of this thesis analyzes the convergence of this empirical best-approximation method and introduces a novel algorithm that finds surprisingly good approximations even when the number of samples is low. The second part considers the application of hierarchical tensor formats to practical problems and demonstrates the effectiveness of this approach on selected examples.
Philipp Trunschke, Technische Universität Berlin, Reinhold Schneider. Mon, 29 Aug 2022 00:00:00 GMT

Convex integer optimization with Frank-Wolfe methods
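The vanilla Frank-Wolfe iteration underlying the entry below needs only a gradient oracle and a linear minimization oracle (LMO) over the feasible set. A minimal sketch on a toy box-constrained least-squares problem (all data and parameters here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Frank-Wolfe sketch: minimize f(x) = ||Ax - b||^2 over the box [0, 1]^n.
# The LMO over a box is closed-form: pick the lower bound where the gradient
# is positive, else the upper bound.

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def grad(x):
    return 2 * A.T @ (A @ x - b)

def lmo_box(g, lo=0.0, hi=1.0):
    # argmin over v in [lo, hi]^n of <g, v>
    return np.where(g > 0, lo, hi)

x = np.full(5, 0.5)
for k in range(200):
    g = grad(x)
    v = lmo_box(g)
    gap = g @ (x - v)          # Frank-Wolfe gap, a convergence certificate
    if gap < 1e-8:
        break
    gamma = 2.0 / (k + 2)      # standard step size, gives the O(1/k) rate
    x = x + gamma * (v - x)

print(round(float(gap), 6))
```

Because each iterate is a convex combination of box points, feasibility holds throughout without any projection, which is the appeal of projection-free methods in the branch-and-bound setting the paper studies.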
https://scholar.archive.org/work/r6o27gjwavbh3ixtlmppj2atxy
Mixed-integer nonlinear optimization is a broad class of problems that feature combinatorial structures and nonlinearities. Typical exact methods combine a branch-and-bound scheme with relaxation and separation subroutines. We investigate the properties and advantages of error-adaptive first-order methods based on the Frank-Wolfe algorithm for this setting, requiring only a gradient oracle for the objective function and linear optimization over the feasible set. In particular, we study the algorithmic consequences of a branch-and-bound approach in which the subproblem is solved over the convex hull of the mixed-integer feasible set via linear oracle calls, compared to solving the subproblems over the continuous relaxation of the same set. This novel approach computes feasible solutions while working on a single representation of the polyhedral constraints, leveraging the full extent of Mixed-Integer Programming (MIP) solvers without an outer approximation scheme.
Deborah Hendrych, Hannah Troppens, Mathieu Besançon, Sebastian Pokutta. Fri, 26 Aug 2022 00:00:00 GMT

A Survey of ADMM Variants for Distributed Optimization: Problems, Algorithms and Features
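The consensus form of ADMM organized by the survey below reduces, in its simplest case, to three updates per round: local minimizations, a global averaging step, and a dual update. A minimal sketch on a toy problem where agents agree on the average of their private targets (illustrative data, not from the survey):

```python
import numpy as np

# Consensus ADMM sketch: N agents jointly minimize sum_i (1/2)*(x - a_i)^2
# via local copies x_i forced to agree with a global variable z.
# The optimum is the average of the a_i.

a = np.array([1.0, 2.0, 6.0, 7.0])
N = len(a)
rho = 1.0

x = np.zeros(N)          # local primal variables
u = np.zeros(N)          # scaled dual variables
z = 0.0                  # global consensus variable

for _ in range(100):
    # local x-updates: argmin_x (1/2)(x - a_i)^2 + (rho/2)(x - z + u_i)^2
    x = (a + rho * (z - u)) / (1.0 + rho)
    # z-update: average of (x_i + u_i)
    z = float(np.mean(x + u))
    # dual ascent on the consensus constraint x_i = z
    u = u + x - z

print(round(z, 4))  # prints 4.0, the average of a
```

Each x-update touches only one agent's data, which is why the scheme distributes: agents solve small local problems and communicate only their copies.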
https://scholar.archive.org/work/puetokzpozdapdzeibcbkcxa2e
By coordinating terminal smart devices or microprocessors to engage in cooperative computation toward system-level targets, distributed optimization is increasingly favored by both engineering and computer science. The well-known alternating direction method of multipliers (ADMM) has turned out to be one of the most popular tools for distributed optimization thanks to its many advantages, such as modular structure, superior convergence, easy implementation and high flexibility. In the past decade, ADMM has experienced widespread development, both in handling more general problems and in enabling more effective implementation. Specifically, the method has been generalized to broad classes of problems (multi-block, coupled objective, nonconvex, etc.). It has also been extensively reinforced for more effective implementation: improved convergence rates, easier subproblems, higher computational efficiency, flexible communication, compatibility with inexact information, robustness to communication delays, and so on. These developments have produced a wealth of ADMM variants embraced by areas ranging from smart grids, smart buildings and wireless communications to machine learning and beyond. However, the literature lacks a survey that documents those developments and discerns the results. To that end, this paper provides a comprehensive survey of ADMM variants. In particular, we identify the five major problem classes that have received the most attention and discuss the related ADMM variants in terms of main ideas, main assumptions, convergence behavior and main features. In addition, we outline several important directions for future research. This survey is expected to serve as a tutorial both for developing distributed optimization in broad areas and for identifying existing theoretical research gaps.
Yu Yang, Xiaohong Guan, Qing-Shan Jia, Liang Yu, Bolun Xu, Costas J. Spanos. Tue, 23 Aug 2022 00:00:00 GMT

Complexity of Inexact Proximal Point Algorithm for minimizing convex functions with Hölderian Growth
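The proximal point iteration studied in the entry below is x_{k+1} = prox_{λf}(x_k). For f(x) = |x|, whose proximal operator is soft-thresholding, the minimum is sharp, which is the regime where the abstract notes finite convergence. A minimal sketch with the exact prox (the paper's focus is the inexact case; this toy example is an assumption for illustration):

```python
# Proximal point sketch for f(x) = |x| in one dimension.
# prox of lam*|.| at v is the soft-thresholding map; for a sharp minimum
# like this, the iterates reach the minimizer 0 in finitely many steps.

def prox_abs(v, lam):
    # soft-threshold: shrink |v| by lam, clipping at zero
    sign = 1.0 if v > 0 else -1.0 if v < 0 else 0.0
    return sign * max(abs(v) - lam, 0.0)

x = 10.0
lam = 1.0
trace = [x]
for _ in range(15):
    x = prox_abs(x, lam)
    trace.append(x)

print(trace[10])  # prints 0.0: finite convergence after 10 prox steps
```

Each step moves exactly lam toward the minimizer until it lands on it, a one-dimensional picture of the finite-convergence result for sharp minima.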
https://scholar.archive.org/work/7r7gomcfs5ep5blcuhupkq7cla
Several decades ago, the Proximal Point Algorithm (PPA) began to attract long-lasting attention from both the abstract operator theory and numerical optimization communities. Even in modern applications, researchers still use proximal minimization theory to design scalable algorithms that overcome nonsmoothness. Remarkable works have established tight relations between the convergence behaviour of PPA and the regularity of the objective function. In this manuscript we derive the nonasymptotic iteration complexity of exact and inexact PPA for minimizing convex functions under γ-Hölderian growth: log(1/ϵ) (for γ∈[1,2]) and 1/ϵ^(γ-2) (for γ>2). In particular, we recover well-known results on PPA: finite convergence for sharp minima and linear convergence under quadratic growth, even in the presence of deterministic noise. Moreover, when a simple Proximal Subgradient Method is recurrently called as an inner routine for computing each IPPA iterate, novel computational complexity bounds are obtained for Restarting Inexact PPA. Our numerical tests show improvements over existing restarting versions of the Subgradient Method.
Andrei Patrascu, Paul Irofti. Sun, 21 Aug 2022 00:00:00 GMT

Optimising Stable Radicals for the Electrochemical Generation of Reactive Intermediates
https://scholar.archive.org/work/eawknghuzvcrnn6sgpxgsoj7w4
This thesis concentrates on the electrochemical activation of stable-radical adducts to generate reactive intermediates for small-molecule and polymer chemistry. The majority of this work concerns the computational modelling and design of such compounds using high-level, ab initio quantum chemistry methods. The main findings are as follows. It is first shown that adducts based on highly stable Blatter and Kuhn-type radicals undergo mesolytic cleavage upon one-electron oxidation, generating reactive carbocations or carbon-centred radicals. Substituent effects are employed to optimise this chemistry, either to reduce the oxidation potential of the adduct to favour the production of radicals, or by altering the bond-dissociation free energy of mesolytic cleavage to control the rate of fragmentation. Computational chemistry is then used to explore the scope for stable-radical adducts as electrochemically activated alkylating agents. SN2-type methylations of pyridine are studied over a broad range of nitroxide-, triazinyl-, and verdazyl-based adducts (X-Me). Here, high oxidation potentials are found to yield low SN2 barriers to methylation and thus more reactive agents, highlighting the suitability of the commercially available (2,2,6,6-tetramethylpiperidin-1-yl)oxyl (TEMPO) in this role. Modelling is also applied to study the triboelectrification of polymeric insulators. Here, material-specific charging properties and dissipation rates are found to be connected to the stability of anionic polymer fragments to oxidation, and of cationic fragments to reduction. Computational methods are then used to study the low-frequency (terahertz) vibrations in molecular crystals. A method benchmark is presented, identifying parameters that reliably produce accurate simulated spectra, along with several new analytical tools built for the assessment of spectral data.
Fergus Rogers, The Australian National University. Sat, 13 Aug 2022 00:00:00 GMT

Cardinality Minimization, Constraints, and Regularization: A Survey
https://scholar.archive.org/work/qbruvpkhpbd23ldgdligzqbyiu
We survey optimization problems that involve the cardinality of variable vectors in constraints or the objective function. We provide a unified viewpoint on the general problem classes and models, and give concrete examples from diverse application fields such as signal and image processing, portfolio selection, and machine learning. The paper discusses general-purpose modeling techniques and broadly applicable as well as problem-specific exact and heuristic solution approaches. While our perspective is that of mathematical optimization, a main goal of this work is to reach out to and build bridges between the different communities in which cardinality optimization problems are frequently encountered. In particular, we highlight that modern mixed-integer programming, which is often regarded as impractical due to the commonly unsatisfactory behavior of black-box solvers applied to generic problem formulations, can in fact produce provably high-quality or even optimal solutions for cardinality optimization problems, even in large-scale real-world settings. Achieving such performance typically draws on problem-specific knowledge that may stem from different fields of application and, e.g., shed light on structural properties of a model or its solutions, or lead to the development of efficient heuristics; we also provide some illustrative examples.
Andreas M. Tillmann, Daniel Bienstock, Andrea Lodi, Alexandra Schwartz. Mon, 08 Aug 2022 00:00:00 GMT

Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
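The greedy block-selection idea central to the entry below is easiest to see in the single-coordinate case: the Gauss-Southwell rule updates the coordinate with the largest gradient magnitude. A minimal sketch on a toy strongly convex quadratic (problem data are illustrative assumptions; the paper treats far richer block variants):

```python
import numpy as np

# Coordinate descent with the Gauss-Southwell rule on
#   f(x) = 0.5 * x^T Q x - c^T x,  Q positive definite.
# Each step exactly minimizes f along the selected coordinate.

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
Q = M @ M.T + 6 * np.eye(6)   # well-conditioned positive definite matrix
c = rng.standard_normal(6)

x = np.zeros(6)
for _ in range(1000):
    g = Q @ x - c
    i = int(np.argmax(np.abs(g)))   # Gauss-Southwell: largest-gradient coordinate
    x[i] -= g[i] / Q[i, i]          # exact 1-D minimization along coordinate i

x_star = np.linalg.solve(Q, c)
print(round(float(np.max(np.abs(x - x_star))), 8))
```

Compared to cyclic or random selection, the greedy rule guarantees at least as much progress per iteration, which is the baseline the paper's new rules improve upon.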
https://scholar.archive.org/work/ioo5osphkzh6lkujg7daz2chhy
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can significantly improve the progress made by each BCD iteration. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues like how to implement the new rules when using "variable" blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with sparse dependencies between variables; and (iv) consider optimal active manifold identification, which leads to bounds on the "active-set complexity" of BCD methods and leads to superlinear convergence for certain problems with sparse solutions (and in some cases finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.
Julie Nutini, Issam Laradji, Mark Schmidt. Sun, 31 Jul 2022 00:00:00 GMT

Frameworks and Results in Distributionally Robust Optimization
https://scholar.archive.org/work/7asen4h6rfeqbnqn6goubudjnm
The concepts of risk aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys main concepts and contributions to DRO, and its relationships with robust optimization, risk aversion, chance-constrained optimization, and function regularization. Various approaches to model the distributional ambiguity and their calibrations are discussed. The paper also describes the main solution techniques used to solve the resulting optimization problems.
Hamed Rahimian, Sanjay Mehrotra. Wed, 27 Jul 2022 00:00:00 GMT

Generalized self-concordant analysis of Frank–Wolfe algorithms
https://scholar.archive.org/work/dna3tn42qvftrgkgntxt2rkxga
Projection-free optimization via different variants of the Frank–Wolfe method has become one of the cornerstones of large-scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordance-like properties. Such generalized self-concordant functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex, making them a challenging class of functions for first-order methods. Indeed, in a number of applications, such as inverse covariance estimation or distance-weighted discrimination problems in binary classification, the loss is given by a generalized self-concordant function having potentially unbounded curvature. For such problems, projection-free minimization methods have no theoretical convergence guarantee. This paper closes this apparent gap in the literature by developing provably convergent Frank–Wolfe algorithms with standard O(1/k) convergence rate guarantees. Based on these new insights, we show how these sublinearly convergent methods can be accelerated to yield linearly convergent projection-free methods, by either relying on the availability of a local linear minimization oracle, or a suitable modification of the away-step Frank–Wolfe method.
Pavel Dvurechensky, Kamil Safin, Shimrit Shtern, Mathias Staudigl, Technische Informationsbibliothek (TIB). Thu, 23 Jun 2022 00:00:00 GMT

Projection-free Constrained Stochastic Nonconvex Optimization with State-dependent Markov Data
https://scholar.archive.org/work/qen2mrcta5h6jc6hyedfjfdbga
We study a projection-free conditional gradient-type algorithm for constrained nonconvex stochastic optimization problems with Markovian data. In particular, we focus on the case when the transition kernel of the Markov chain is state-dependent. Such stochastic optimization problems arise in various machine learning problems, including strategic classification and reinforcement learning. For this problem, we establish that the numbers of calls to the stochastic first-order oracle and the linear minimization oracle needed to obtain an appropriately defined ϵ-stationary point are of order 𝒪(1/ϵ^2.5) and 𝒪(1/ϵ^5.5), respectively. We also empirically demonstrate the performance of our algorithm on the problem of strategic classification with neural networks.
Abhishek Roy. Wed, 22 Jun 2022 00:00:00 GMT

Approximate Frank-Wolfe Algorithms over Graph-structured Support Sets
https://scholar.archive.org/work/a76ws6tjuvhlvpoasuoq2iyq3q
In this paper, we consider approximate Frank-Wolfe (FW) algorithms to solve convex optimization problems over graph-structured support sets where the linear minimization oracle (LMO) cannot be efficiently obtained in general. We first demonstrate that two popular approximation assumptions (additive and multiplicative gap errors) are not applicable, in that no cheap gap-approximate LMO oracle exists. Thus, approximate dual maximization oracles (DMO) are proposed, which approximate the inner product rather than the gap. We prove that the standard FW method using a δ-approximate DMO converges as 𝒪((1-δ)√(s)/δ) in the worst case, and as 𝒪(L/(δ^2 t)) over a δ-relaxation of the constraint set. Furthermore, when the solution is on the boundary, a variant of FW converges as 𝒪(1/t^2) under the quadratic growth assumption. Our empirical results suggest that even these improved bounds are pessimistic, showing fast convergence in recovering real-world images with graph-structured sparsity.
Baojian Zhou, Yifan Sun. Fri, 17 Jun 2022 00:00:00 GMT

Dictionary optimization for representing sparse signals using Rank-One Atom Decomposition (ROAD)
https://scholar.archive.org/work/342jinpbqnfyrlt6rqu2ojzc7i
Dictionary learning has attracted growing research interest in recent years. As it is a bilinear inverse problem, one typical way to address it is to iteratively alternate between two stages: sparse coding and dictionary update. The general principle of the alternating approach is to fix one variable and optimize the other. Unfortunately, for the alternating method, an ill-conditioned dictionary in the training process may not only introduce numerical instability but also trap the overall training process toward a singular point. Moreover, it leads to difficulty in analyzing convergence, and few dictionary learning algorithms have been proven to converge globally. For other bilinear inverse problems, such as short-and-sparse deconvolution (SaSD) and convolutional dictionary learning (CDL), the alternating method is still a popular choice. As these bilinear inverse problems are also ill-posed and complicated, they are tricky to handle. Additional inner iterative methods are usually required for both updating stages, which aggravates the difficulty of analyzing the convergence of the whole learning process. It is also challenging to determine the number of iterations for each stage, as over-tuning either stage will trap the whole process in a local minimum that is far from the ground truth. To mitigate the issues resulting from the alternating method, this thesis proposes a novel algorithm termed rank-one atom decomposition (ROAD), which recasts a bilinear inverse problem as an optimization problem with respect to a single variable, namely a set of rank-one matrices. The resulting algorithm is single-stage, minimizing the sparsity of the coefficients while maintaining the data consistency constraint throughout the whole learning process.
Inspired by recent advances in applying the alternating direction method of multipliers (ADMM) to nonconvex nonsmooth problems, an ADMM solver is adopted to address ROAD problems, and a lower bound of the penalty paramet [...]
Cheng Cheng, Wei Dai. Thu, 09 Jun 2022 00:00:00 GMT

Gradient Projection Newton Pursuit for Sparsity Constrained Optimization
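The hard-thresholding backbone of the entry below is easiest to see in plain iterative hard thresholding (IHT): a gradient step on the least-squares loss followed by projection onto the set of s-sparse vectors (keep the s largest magnitudes). A minimal sketch on a synthetic compressive sensing instance; the data, sizes, and step-size rule are illustrative assumptions, and the Newton pursuit refinement is omitted:

```python
import numpy as np

# IHT sketch: minimize 0.5*||Ax - b||^2 subject to ||x||_0 <= s.

rng = np.random.default_rng(2)
m, n, s = 100, 120, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)   # roughly unit-norm columns
x_true = np.zeros(n)
x_true[:s] = [4.0, -3.0, 2.5]
b = A @ x_true                                  # noiseless measurements

def hard_threshold(x, s):
    # projection onto s-sparse vectors: keep the s largest magnitudes
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    out[idx] = x[idx]
    return out

eta = 1.0 / np.linalg.norm(A, 2) ** 2           # step below 1/L for descent
x = np.zeros(n)
for _ in range(500):
    x = hard_threshold(x + eta * A.T @ (b - A @ x), s)

print(round(float(np.max(np.abs(x - x_true))), 6))
```

Under standard restricted isometry-type assumptions, this iteration contracts toward the true sparse signal; the paper's contribution is switching to Newton steps on the identified support to obtain quadratic convergence.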
https://scholar.archive.org/work/hcc65irswfecbpuid7mqps7lhy
Hard-thresholding-based algorithms offer several advantages for sparse optimization, controlling the sparsity while allowing fast computation. Recent research shows that when techniques of Newton-type methods are integrated, their numerical performance can be improved surprisingly. This paper develops a gradient projection Newton pursuit algorithm that mainly adopts the hard-thresholding operator and employs the Newton pursuit only when certain conditions are satisfied. The proposed algorithm converges globally and quadratically under standard assumptions. For compressive sensing problems, the imposed assumptions are much weaker than those of many state-of-the-art algorithms. Moreover, extensive numerical experiments demonstrate its high performance in comparison with other leading solvers.
Shenglong Zhou. Mon, 06 Jun 2022 00:00:00 GMT

Mitigating multiple descents: A model-agnostic framework for risk monotonization
https://scholar.archive.org/work/xrqfceoelzedvlrg7mre3zy6qi
Recent empirical and theoretical analyses of several commonly used prediction procedures reveal a peculiar risk behavior in high dimensions, referred to as double/multiple descent, in which the asymptotic risk is a non-monotonic function of the limiting aspect ratio of the number of features or parameters to the sample size. To mitigate this undesirable behavior, we develop a general framework for risk monotonization based on cross-validation that takes as input a generic prediction procedure and returns a modified procedure whose out-of-sample prediction risk is, asymptotically, monotonic in the limiting aspect ratio. As part of our framework, we propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting, respectively, and show that, under very mild assumptions, they provably achieve monotonic asymptotic risk behavior. Our results are applicable to a broad variety of prediction procedures and loss functions, and do not require a well-specified (parametric) model. We exemplify our framework with concrete analyses of the minimum ℓ_2, ℓ_1-norm least squares prediction procedures. As one of the ingredients in our analysis, we also derive novel additive and multiplicative forms of oracle risk inequalities for split cross-validation that are of independent interest.
Pratik Patil, Arun Kumar Kuchibhotla, Yuting Wei, Alessandro Rinaldo. Wed, 25 May 2022 00:00:00 GMT

Gradient Methods with Memory for Minimizing Composite Functions
https://scholar.archive.org/work/jizpkupl6nge3pytnktocnh47i
The recently introduced Gradient Methods with Memory use a subset of the past oracle information to create a model of the objective function, whose accuracy enables them to surpass the traditional Gradient Methods in practical performance. The model introduces a substantial overhead, unless dealing with smooth unconstrained problems. In this work, we introduce several Gradient Methods with Memory that can solve composite problems efficiently, including unconstrained problems with non-smooth objectives. The auxiliary problem at each iteration still cannot be solved exactly, but we show how to alter the model and how to initialize the auxiliary problem solver to ensure that this inexactness does not degrade the convergence guarantees. Moreover, we dynamically increase the convergence guarantees so as to provably surpass those of their memory-less counterparts. These properties are preserved when applying acceleration, and the containment of inexactness further prevents error accumulation. Our methods are able to estimate key geometry parameters to attain state-of-the-art worst-case rates on many important subclasses of composite problems, where the smooth part of the objective satisfies a strong convexity condition or a relaxation thereof. In particular, we formulate a restart strategy applicable to optimization methods with sublinear convergence guarantees of any order. We support the theoretical results with simulations.
Mihai I. Florea. Mon, 09 May 2022 00:00:00 GMT

Anomaly Detection Based on Convex Analysis: A Survey
https://scholar.archive.org/work/qpfuwq7dg5etrjtxmdodemnbda
As a crucial technique for identifying irregular samples or outlier patterns, anomaly detection has broad applications in many fields. Convex analysis (CA) is one of the fundamental methods used in anomaly detection, contributing robust approximation of algebra and geometry, efficient computation of a unique global solution, and mathematical optimization for modeling. Despite the essential role of and ever-growing research on CA-based anomaly detection algorithms, little work has provided a comprehensive survey of them. To fill this gap, we summarize the CA techniques used in anomaly detection and classify them into four categories: density estimation methods, matrix factorization methods, machine learning methods, and others. The theoretical background, sub-categories of methods, typical applications, as well as strengths and limitations of each category are introduced. This paper sheds light on a succinct and structured framework and provides researchers with new insights into both anomaly detection and CA. With the remarkable progress made in the techniques of big data and machine learning, CA-based anomaly detection holds great promise for more expeditious, accurate and intelligent detection capacities.
Tong Wang, Mengsi Cai, Xiao Ouyang, Ziqiang Cao, Tie Cai, Xu Tan, Xin Lu. Wed, 27 Apr 2022 00:00:00 GMT

Towards Robust and Resilient Machine Learning
https://scholar.archive.org/work/hqeg7fmwb5gahhvqculu4eeeme
Some common assumptions when building machine learning pipelines are: (1) the training data is sufficiently "clean" and well-behaved, so that there are few or no outliers and the distribution of the data does not have very long tails; (2) the testing data follows the same distribution as the training data; and (3) the data is generated from, or is close to, a known model class, such as a linear model or neural network. However, with easier access to computers, the internet and various sensor-based technologies, modern data sets that arise in various branches of science and engineering are no longer carefully curated and are often collected in a decentralized, distributed fashion. Consequently, they are plagued with the complexities of heterogeneity, adversarial manipulations, and outliers. As we enter this age of dirty data, the aforementioned assumptions of machine learning pipelines are increasingly indefensible. For the widespread adoption of machine learning, we believe it is imperative that any model have the following three basic elements:
• Robustness: the model can be trained even with noisy and corrupted data.
• Reliability: after training and when deployed in the real world, the model should not break down under benign shifts of the distribution.
• Resilience: the modeling procedure should work under model mis-specification, i.e. even when the modeling assumption breaks down, the model should find the best possible solution.
In this thesis, our goal is to modify state-of-the-art ML techniques and design new algorithms so that they work even without the aforementioned assumptions, and are robust, reliable and resilient. Our contributions are as follows. In Chapter 2, we provide a new class of statistically optimal estimators that are provably robust in a variety of settings, such as arbitrary contamination and heavy-tailed data, among others. In Chapter 3, we complement our statistically optimal estimators with a new class of computationally efficient estimators for robust risk minimizatio [...]
Adarsh Prasad. Thu, 21 Apr 2022 00:00:00 GMT

Joint Continuous and Discrete Model Selection via Submodularity
https://scholar.archive.org/work/lpitlkdan5grhashef3bwb5pra
In model selection problems for machine learning, the desire for a well-performing model with meaningful structure is typically expressed through a regularized optimization problem. In many scenarios, however, the meaningful structure is specified in some discrete space, leading to difficult nonconvex optimization problems. In this paper, we connect the model selection problem with structure-promoting regularizers to submodular function minimization with continuous and discrete arguments. In particular, we leverage the theory of submodular functions to identify a class of these problems that can be solved exactly and efficiently with an agnostic combination of discrete and continuous optimization routines. We show how simple continuous or discrete constraints can also be handled for certain problem classes and extend these ideas to a robust optimization framework. We also show how some problems outside of this class can be embedded within the class, further extending the class of problems our framework can accommodate. Finally, we numerically validate our theoretical results with several proof-of-concept examples with synthetic and real-world data, comparing against state-of-the-art algorithms.
Jonathan Bunton, Paulo Tabuada. Sun, 03 Apr 2022 00:00:00 GMT

CPD-Structured Multivariate Polynomial Optimization
https://scholar.archive.org/work/723ejh4vcfa4no677dssstyrre
We introduce the Tensor-Based Multivariate Optimization (TeMPO) framework for use in nonlinear optimization problems commonly encountered in signal processing, machine learning, and artificial intelligence. Within our framework, we model nonlinear relations by a multivariate polynomial that can be represented by low-rank symmetric tensors (multi-indexed arrays), making a compromise between model generality and efficiency of computation. Put another way, our approach both breaks the curse of dimensionality in the system parameters and captures the nonlinear relations with good accuracy. Moreover, by taking advantage of the symmetric CPD format, we develop an efficient second-order Gauss–Newton algorithm for multivariate polynomial optimization. The presented algorithm has a quadratic per-iteration complexity in the number of optimization variables in the worst case, and a linear per-iteration complexity in practice. We demonstrate the efficiency of our algorithm with some illustrative examples, apply it to the blind deconvolution of constant modulus signals, and to the classification problem in supervised learning. We show that TeMPO achieves similar or better accuracy than multilayer perceptrons (MLPs), tensor networks with tensor train (TT) and projected entangled pair state (PEPS) architectures for the classification of the MNIST and Fashion MNIST datasets, while optimizing for fewer parameters and using less memory. Last but not least, our framework can be interpreted as an advancement of higher-order factorization machines: we introduce an efficient second-order algorithm for higher-order factorization machines.
Muzaffer Ayvaz, Lieven De Lathauwer. Wed, 30 Mar 2022 00:00:00 GMT