IA Scholar Query: Can non-developers learn a simplified modeling notation quickly?
https://scholar.archive.org/
Internet Archive Scholar query results feed (en)
info@archive.org | fatcat-scholar | Sun, 04 Dec 2022 00:00:00 GMT
https://scholar.archive.org/help

Music Translation: Generating Piano Arrangements in Different Playing Levels
https://scholar.archive.org/work/tl2etwzyxffbfg2qcpqb5mjlwq
We present a novel task of playing level conversion: generating a music arrangement in a target difficulty level, given another arrangement of the same musical piece in a different level. For this task, we create a parallel dataset of piano arrangements in two strictly well-defined playing levels, annotated at individual phrase resolution, taken from the song catalog of a piano learning app. In a series of experiments, we train models that successfully modify the playing level while preserving the musical 'essence'. We further show, via an ablation study, the contributions of specific data representation and augmentation techniques to the model's performance. In order to evaluate the performance of our models, we conduct a human evaluation study with expert musicians. The evaluation shows that our best model creates arrangements that are almost as good as ground truth examples. Additionally, we propose MuTE, an automated evaluation metric for music translation tasks, and show that it correlates with human ratings.

Matan Gover, Oded Zewi (work_tl2etwzyxffbfg2qcpqb5mjlwq, Sun, 04 Dec 2022 00:00:00 GMT)

Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses
https://scholar.archive.org/work/fet5idxa6nb7dmsnxu7gm4gcey
We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work [WXDX20, KLZ22], we assume that stochastic gradients have bounded k-th order moments for some k ≥ 2. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the k-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm for smooth losses that runs in linear time and has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. 
Our Proximal-PL algorithm has near-optimal excess risk.

Andrew Lowy, Meisam Razaviyayn (work_fet5idxa6nb7dmsnxu7gm4gcey, Wed, 30 Nov 2022 00:00:00 GMT)

Conjunctive queries for logic-based information extraction
https://scholar.archive.org/work/wd2pb3qomzeb7lqcc3av3fepqq
This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some finite input word. FC[REG] is FC extended with regular constraints. Our first results consider the comparative expressive power of FC[REG]-CQ in relation to document spanners (a formal framework for the query language AQL), and various fragments of FC[REG]-CQ – some of which coincide with well-known language generators, such as patterns and regular expressions. Then, we look at decision problems. We show that many decision problems for FC-CQ and FC[REG]-CQ (such as equivalence and regularity) are undecidable. The model checking problem for FC-CQ and FC[REG]-CQ is NP-complete even if the FC-CQ is acyclic – under the definition of acyclicity where each word equation in an FC-CQ is an atom. This leads us to look at the "decomposition" of an FC word equation into a conjunction of binary word equations (i.e., of the form x =˙ y · z). If a query consists of only binary word equations and the query is acyclic, then model checking is tractable and we can enumerate results efficiently. We give an algorithm that decomposes an FC-CQ into an acyclic FC-CQ consisting of binary word equations in polynomial time, or determines that this is not possible. The second approach is to consider the dynamic complexity of FC. This uses the common way of encoding words in a relational structure using a universe with a linear order along with symbol predicates. Then, each element of the universe can carry a symbol if the predicate for said symbol holds for that element. 
Instead of the "usual way" (looking at first-order logic over these structures), we study the dynamic complexity, where symbols can be modified. As each of these modifications only c [...]

Sam M Thompson (work_wd2pb3qomzeb7lqcc3av3fepqq, Wed, 30 Nov 2022 00:00:00 GMT)

A–E
https://scholar.archive.org/work/enoy33f5ejdntgmhbnitknxxlu
He gained a national reputation with the Viipuri Municipal Library, destroyed in World War II, and an international one with his Finnish pavilions at the World's Fairs at Paris (1937) and New York (1939-40). He made imaginative use of wood with brickwork, glass, copper and cement and also developed functional plywood furniture, mass-produced in his own factory. His range of commissions, including the Maison Carré in Paris, Baker House in Cambridge, Mass., and the Finlandia Concert Hall, Helsinki, was extensive: factories, museums, churches, theatres, department stores, private houses and public housing. He was professor of architecture at the Massachusetts Institute of Technology 1945-49.

Aaron (c.14th-13th centuries BCE). Hebrew High Priest. In the Bible story, with his brother *Moses, he led the Israelites from Egypt to Canaan (Palestine) and became their first high priest, but while Moses was receiving the Ten Commandments on Mount Sinai he made a golden calf for the people to worship (Exodus xxxii).

(work_enoy33f5ejdntgmhbnitknxxlu, Wed, 30 Nov 2022 00:00:00 GMT)

Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes
https://scholar.archive.org/work/tsb7zrtte5f3fbix64odo4icge
In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. This means storing prohibitively large amounts of intermediate data if a sequence consists of thousands or even millions of elements, and as a result makes learning of very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted by taking into account only temporally local information. On the other hand, predictions affected by long-term dependencies are sparse and characterized by high uncertainty given only local information. We propose MemUP, a new training method that learns long-term dependencies without backpropagating gradients through the whole sequence at a time. This method can potentially be applied to any recurrent architecture. An LSTM network trained with MemUP performs better than or comparably to baselines while storing less intermediate data.

Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev (work_tsb7zrtte5f3fbix64odo4icge, Wed, 30 Nov 2022 00:00:00 GMT)

F–J
https://scholar.archive.org/work/cn4pwk4efnaa3ba27vpt6ux6ca
Fabergé, Peter Carl (1846-1920). Russian jeweller, of French descent. He achieved fame by the ingenuity and extravagance of the jewelled objects (especially Easter eggs) he devised for the Russian nobility and the tsar in an age of ostentatious extravagance which ended on the outbreak of World War I. He died in Switzerland.

(work_cn4pwk4efnaa3ba27vpt6ux6ca, Wed, 30 Nov 2022 00:00:00 GMT)

Smart Cities and Architectural Structures: Communicational and Informational Space
https://scholar.archive.org/work/cbbp3g3255en3kvyflybpyig4y
The expectations for shaping the urban landscape toward the ethical and aesthetic values of democracy are seen as the main challenge of an intelligent environment, made possible via information and communication technologies. Consequently, architecture's tendency to embrace digital media strives to create innovative and sustainable infrastructure. This approach aims at an argumentative theoretical analysis of aesthetics and communication sciences. The focus is on a context of continuously evolving living traditions, shaped by innovation that modifies and facilitates the evolution of society. The approach is also understood as a constantly evolving practice that engenders interaction between past, present, and future, configuring a unique urban landscape. The goal concerns the metropolis as a collective achievement, seeking innovation through technologies while preserving tradition. Therefore, the convergence between architecture, technology, and new media requires the consideration of two viewpoints in this analysis. The first is the adopted architectural spatial models. The second is the transformative structure through new media, creating realities, intelligent environments, and interactive communities. Under these two directions, the artificial environment and its imagined configuration through digital media are discussed, considering that technology has overcome natural boundaries: the leitmotif of human cultural development.

Christiane Wagner (work_cbbp3g3255en3kvyflybpyig4y, Tue, 29 Nov 2022 00:00:00 GMT)

Large Gauge Effects and the Structure of Amplitudes
https://scholar.archive.org/work/llye6cuabjdctpyxlhij6blrlu
We show that large gauge transformations modify the structure of momentum conservation, leading to non-vanishing three-point amplitudes in a simple toy model of a gravitational wave event. This phenomenon resolves an apparent tension between perturbative scattering amplitude computations and exact methods in field theory. The tension is resolved to all orders of perturbation theory once large gauge effects are included via a modified LSZ prescription; if they are omitted, perturbative methods only recover a subset of terms in the full non-perturbative expression. Although our results are derived in the context of specific examples, several aspects of our work have analogues in dynamical gravitational scattering processes.

Andrea Cristofoli, Asaad Elkhidir, Anton Ilderton, Donal O'Connell (work_llye6cuabjdctpyxlhij6blrlu, Tue, 29 Nov 2022 00:00:00 GMT)

Quasi-stable Coloring for Graph Compression: Approximating Max-Flow, Linear Programs, and Centrality
https://scholar.archive.org/work/yjz42a5poraqxfje7zxwwyjtaa
We propose quasi-stable coloring, an approximate version of stable coloring. Stable coloring, also called color refinement, is a well-studied technique in graph theory for classifying vertices, which can be used to build compact, lossless representations of graphs. However, its usefulness is limited due to its reliance on strict symmetries. Real data compresses very poorly using color refinement. We propose the first, to our knowledge, approximate color refinement scheme, which we call quasi-stable coloring. By using approximation, we alleviate the need for strict symmetry, and allow for a tradeoff between the degree of compression and the accuracy of the representation. We study three applications: Linear Programming, Max-Flow, and Betweenness Centrality, and provide theoretical evidence in each case that a quasi-stable coloring can lead to good approximations on the reduced graph. Next, we consider how to compute a maximal quasi-stable coloring: we prove that, in general, this problem is NP-hard, and propose a simple, yet effective algorithm based on heuristics. Finally, we evaluate experimentally the quasi-stable coloring technique on several real graphs and applications, comparing with prior approximation techniques. A reference implementation and the experiment code are available at https://github.com/mkyl/QuasiStableColors.jl.

Moe Kayali, Dan Suciu (work_yjz42a5poraqxfje7zxwwyjtaa, Tue, 29 Nov 2022 00:00:00 GMT)

Learning Antidote Data to Individual Unfairness
https://scholar.archive.org/work/yssdphnofbbydjcmj7pb74k25m
Fairness is an essential factor for machine learning systems deployed in high-stakes applications. Among all fairness notions, individual fairness, following a consensus that 'similar individuals should be treated similarly,' is a vital notion to guarantee fair treatment for individual cases. Previous methods typically characterize individual fairness as a prediction-invariant problem when perturbing sensitive attributes, and solve it by adopting the Distributionally Robust Optimization (DRO) paradigm. However, adversarial perturbations along a direction covering sensitive information do not consider the inherent feature correlations or innate data constraints, and thus mislead the model to optimize at off-manifold and unrealistic samples. In light of this, we propose a method to learn and generate antidote data that approximately follows the data distribution to remedy individual unfairness. These on-manifold antidote data can be used through a generic optimization procedure with original training data, resulting in a pure pre-processing approach to individual unfairness, or can also fit well with the in-processing DRO paradigm. Through extensive experiments, we demonstrate our antidote data resists individual unfairness at a minimal or zero cost to the model's predictive utility.

Peizhao Li, Ethan Xia, Hongfu Liu (work_yssdphnofbbydjcmj7pb74k25m, Tue, 29 Nov 2022 00:00:00 GMT)

LSQ 2.0: A linked dataset of SPARQL query logs
https://scholar.archive.org/work/udcljntolfbundbway2efp6s3y
We present the Linked SPARQL Queries (LSQ) dataset, which currently describes 43.95 million executions of 11.56 million unique SPARQL queries extracted from the logs of 27 different endpoints. The LSQ dataset provides RDF descriptions of each such query, which are indexed in a public LSQ endpoint, allowing interested parties to find queries with the characteristics they require. We begin by describing the use cases envisaged for the LSQ dataset, which include applications for research on common features of queries, for building custom benchmarks, and for designing user interfaces. We then discuss how LSQ has been used in practice since the release of four initial SPARQL logs in 2015. We discuss the model and vocabulary that we use to represent these queries in RDF. We then provide a brief overview of the 27 endpoints from which we extracted queries in terms of the domain to which they pertain and the data they contain. We provide statistics on the queries included from each log, including the number of query executions and unique queries, as well as distributions of queries for a variety of selected characteristics. We finally discuss how the LSQ dataset is hosted and how it can be accessed and leveraged by interested parties for their use cases.

Claus Stadler, Muhammad Saleem, Qaiser Mehmood, Carlos Buil-Aranda, Michel Dumontier, Aidan Hogan, Axel-Cyrille Ngonga Ngomo, Philippe Cudré-Mauroux (work_udcljntolfbundbway2efp6s3y, Tue, 29 Nov 2022 00:00:00 GMT)

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity
https://scholar.archive.org/work/3t65mqbk5nfqxfp4nkugkgxmla
We propose a new policy gradient method, named homotopic policy mirror descent (HPMD), for solving discounted, infinite horizon MDPs with finite state and action spaces. HPMD performs a mirror descent type policy update with an additional diminishing regularization term, and possesses several computational properties that seem to be new in the literature. We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies. Then local superlinear convergence is obtained for both quantities without any assumption. With local acceleration and diminishing regularization, we establish the first result among policy gradient methods on certifying and characterizing the limiting policy, by showing, with a non-asymptotic characterization, that the last-iterate policy converges to the unique optimal policy with the maximal entropy. We then extend all the aforementioned results to HPMD instantiated with a broad class of decomposable Bregman divergences, demonstrating the generality of these computational properties. As a by-product, we discover the finite-time exact convergence for some commonly used Bregman divergences, implying the continuing convergence of HPMD to the limiting policy even if the current policy is already optimal. Finally, we develop a stochastic version of HPMD and establish similar convergence properties. By exploiting the local acceleration, we show that for small optimality gap, a better than 𝒪̃(|𝒮| |𝒜| / ϵ^2) sample complexity holds with high probability, when assuming a generative model for policy evaluation.

Yan Li, Guanghui Lan, Tuo Zhao (work_3t65mqbk5nfqxfp4nkugkgxmla, Tue, 29 Nov 2022 00:00:00 GMT)

2019
https://scholar.archive.org/work/wcy47hfvvvdwvfgnwx2cuak4ze
On completion of this course, students will have knowledge in:
• CO1. Basics of electrochemistry; classical and modern batteries and fuel cells.
• CO2. Causes and effects of corrosion of metals and control of corrosion; modification of surface properties of metals to develop resistance to corrosion, wear, tear, impact, etc., by electroplating and electroless plating.
• CO3. Production and consumption of energy for industrialization of the country and living standards of people; utilization of solar energy for different useful forms of energy.
• CO4. Understanding the phase rule and instrumental techniques and their applications.
• CO5. Overview of synthesis, properties, and applications of nanomaterials.

BTECH.CS (work_wcy47hfvvvdwvfgnwx2cuak4ze, Mon, 28 Nov 2022 00:00:00 GMT)

Theory of layered-oxide cathode degradation in Li-ion batteries by oxidation-induced cation disorder
https://scholar.archive.org/work/hy334zlyzrfd3otnl7ahhlsfuq
Disorder-driven degradation phenomena, such as structural phase transformations and surface reconstructions, can significantly reduce the lifetime of Li-ion batteries, especially those with nickel-rich layered-oxide cathodes. We develop a general free energy model for layered-oxide ion-intercalation materials as a function of the degree of disorder, which represents the density of defects in the host crystal. The model accounts for defect core energies, long-range dipolar electrostatic forces, and configurational entropy of the solid solution. In the case of nickel-rich oxides, we hypothesize that nickel with a high concentration of defects is driven into the bulk by electrostatic forces as oxidation reactions at the solid-electrolyte interface reduce nickel and either evolve oxygen gas or oxidize the organic electrolyte at high potentials (>4.4 V vs. Li/Li+). The model is used in battery cycling simulations to describe the extent of cathode degradation when using different voltage cutoffs, in agreement with experimental observations that lower-voltage cycling can substantially reduce cathode degradation. The theory provides a framework to guide the development of cathode compositions, coatings and electrolytes to enhance rate capability and enhance battery lifetime. The general theory of cation-disorder formation may also find applications in electrochemical water treatment and ion separations, such as lithium extraction from brines, based on competitive ion intercalation in battery materials.

Debbie Zhuang, Martin Z. Bazant (work_hy334zlyzrfd3otnl7ahhlsfuq, Mon, 28 Nov 2022 00:00:00 GMT)

Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence
https://scholar.archive.org/work/jk2w57muhjekbn3p6dwtjsgauy
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given an oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as Subsampled Newton and Newton Sketch. Despite using second-order information, these existing methods do not exhibit superlinear convergence, unless the stochastic noise is gradually reduced to zero during the iteration, which would lead to a computational blow-up in the per-iteration cost. We propose to address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all the past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme exhibits local Q-superlinear convergence with a non-asymptotic rate of (Υ√(log(t)/t))^t, where Υ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the method, i.e., before the iterates converge to a local neighborhood. This leads to a distortion that may substantially delay the superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that the superlinear convergence arises sooner, albeit with a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still exhibits a superlinear convergence rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.

Sen Na, Michał Dereziński, Michael W. Mahoney (work_jk2w57muhjekbn3p6dwtjsgauy, Mon, 28 Nov 2022 00:00:00 GMT)

Memory-efficient array redistribution through portable collective communication
https://scholar.archive.org/work/y65c3tc4ebbnrpqdndfnqmhhoy
Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD computations, the most prevalent form of parallelism in deep learning. We present a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations. We prove formally that our synthesized redistributions are memory-efficient and perform no excessive data transfers. Array redistribution for SPMD computations using collective operations has also been implemented in the context of the XLA SPMD partitioner, a production-grade tool for partitioning programs across accelerator systems. We evaluate our approach against the XLA implementation and find that our approach delivers a geometric mean speedup of 1.22×, with maximum speedups as high as 5.7×, while offering provable memory guarantees, making our system particularly appealing for large-scale models.

Norman A. Rink, Adam Paszke, Dimitrios Vytiniotis, Georg Stefan Schmid (work_y65c3tc4ebbnrpqdndfnqmhhoy, Mon, 28 Nov 2022 00:00:00 GMT)

FsaNet: Frequency Self-attention for Semantic Segmentation
https://scholar.archive.org/work/lytiybjwmrcazk75zxp5yrwyvi
Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process is merely over low-frequency components. By ablation study, we show that low-frequency self-attention can achieve very close or better performance relative to full frequency even without retraining the network. Accordingly, we design and embed novel plug-and-play modules into the head of a CNN network that we refer to as FsaNet. The frequency self-attention 1) takes low-frequency coefficients as input, 2) can be mathematically equivalent to spatial domain self-attention with linear structures, 3) simplifies the token mapping (1×1 convolution) stage and token mixing stage simultaneously. We show that the frequency self-attention requires 87.29%∼90.04% less memory, 96.13%∼98.07% fewer FLOPs, and 97.56%∼98.18% less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result (83.0% mIoU) on the Cityscapes test dataset and competitive results on ADE20k and VOCaug.

Fengyu Zhang, Ashkan Panahi, Guangjun Gao (work_lytiybjwmrcazk75zxp5yrwyvi, Mon, 28 Nov 2022 00:00:00 GMT)

A smooth basis for atomistic machine learning
https://scholar.archive.org/work/z2432isiivalbgiuagdilqxghe
Machine learning frameworks based on correlations of interatomic positions begin with a discretized description of the density of other atoms in the neighbourhood of each atom in the system. Symmetry considerations support the use of spherical harmonics to expand the angular dependence of this density, but there is as yet no clear rationale to choose one radial basis over another. Here we investigate the basis that results from the solution of the Laplacian eigenvalue problem within a sphere around the atom of interest. We show that this generates the smoothest possible basis of a given size within the sphere, and that a tensor product of Laplacian eigenstates also provides the smoothest possible basis for expanding any higher-order correlation of the atomic density within the appropriate hypersphere. We consider several unsupervised metrics of the quality of a basis for a given dataset, and show that the Laplacian eigenstate basis has a performance that is much better than some widely used basis sets and is competitive with data-driven bases that numerically optimize each metric. In supervised machine learning tests, we find that the optimal function smoothness of the Laplacian eigenstates leads to comparable or better performance than can be obtained from a data-driven basis of a similar size that has been optimized to describe the atom-density correlation for the specific dataset. We conclude that the smoothness of the basis functions is a key and hitherto largely overlooked aspect of successful atomic density representations.

Filippo Bigi, Kevin Huguenin-Dumittan, Michele Ceriotti, David E. Manolopoulos (work_z2432isiivalbgiuagdilqxghe, Mon, 28 Nov 2022 00:00:00 GMT)

Characterizing the robustness of Bayesian adaptive experimental designs to active learning bias
https://scholar.archive.org/work/2fddguydk5botknhsq7c6ogtli
Bayesian adaptive experimental design is a form of active learning, which chooses samples to maximize the information they give about uncertain parameters. Prior work has shown that other forms of active learning can suffer from active learning bias, where unrepresentative sampling leads to inconsistent parameter estimates. We show that active learning bias can also afflict Bayesian adaptive experimental design, depending on model misspecification. We analyze the case of estimating a linear model, and show that worse misspecification implies more severe active learning bias. At the same time, model classes incorporating more "noise" - i.e., specifying higher inherent variance in observations - suffer less from active learning bias. Finally, we demonstrate empirically that insights from the linear model can predict the presence and degree of active learning bias in nonlinear contexts, namely in a (simulated) preference learning experiment.

Sabina J. Sloman, Daniel M. Oppenheimer, Stephen B. Broomell, Cosma Rohilla Shalizi (work_2fddguydk5botknhsq7c6ogtli, Mon, 28 Nov 2022 00:00:00 GMT)

Optimal and Adaptive Monteiro-Svaiter Acceleration
https://scholar.archive.org/work/tzue7usegbgkpfy2re7e45oe6e
We develop a variant of the Monteiro-Svaiter (MS) acceleration framework that removes the need to solve an expensive implicit equation at every iteration. Consequently, for any p ≥ 2 we improve the complexity of convex optimization with Lipschitz p-th derivative by a logarithmic factor, matching a lower bound. We also introduce an MS subproblem solver that requires no knowledge of problem parameters, and implement it as either a second- or first-order method by solving linear systems or applying MinRes, respectively. On logistic regression our method outperforms previous second-order momentum methods, but under-performs Newton's method; simply iterating our first-order adaptive subproblem solver performs comparably to L-BFGS.

Yair Carmon, Danielle Hausler, Arun Jambulapati, Yujia Jin, Aaron Sidford (work_tzue7usegbgkpfy2re7e45oe6e, Mon, 28 Nov 2022 00:00:00 GMT)
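Two of the abstracts above describe their core procedures concretely enough to sketch. First, the exact color refinement (stable coloring) that the quasi-stable coloring abstract (Kayali and Suciu) relaxes: in each round, a vertex's new color is determined by its old color together with the multiset of its neighbors' colors, and the process stops once the partition no longer refines. A minimal sketch, assuming an undirected graph given as an edge list; the function name and encoding are mine, not the paper's, and the paper's quasi-stable variant would additionally merge colors whose neighbor-color counts differ by a bounded amount:

```python
from collections import Counter

def color_refinement(n, edges):
    """Exact color refinement: two vertices keep the same color only if
    they see the same multiset of neighbor colors, iterated to a fixpoint."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = {v: 0 for v in range(n)}  # start with a single color class
    while True:
        # Signature = own color plus sorted counts of neighbor colors.
        signature = {
            v: (colors[v], tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
            for v in range(n)
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_colors = {v: palette[signature[v]] for v in range(n)}
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors  # partition stopped refining: stable coloring
        colors = new_colors
```

On a 4-cycle every vertex ends with the same color, while on a 4-path the endpoints separate from the interior vertices; the paper's relaxation trades such strict distinctions for fewer, larger color classes and hence smaller reduced graphs.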
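Second, the uniform Hessian averaging of Na, Dereziński, and Mahoney: rather than solving the Newton system with the latest noisy Hessian estimate, maintain a running average of all estimates, so the stochastic noise decays while the per-iteration cost stays fixed. A toy numpy sketch under assumptions of mine (a diagonal quadratic objective with additive Gaussian Hessian noise), not the paper's experimental setup:

```python
import numpy as np

def averaged_newton(grad, hess_estimate, x0, steps=50):
    """Stochastic Newton with uniform Hessian averaging: take Newton steps
    against the running mean of all noisy Hessian estimates seen so far."""
    x = np.asarray(x0, dtype=float)
    h_avg = np.zeros((x.size, x.size))
    for t in range(1, steps + 1):
        h_t = hess_estimate(x)
        h_avg += (h_t - h_avg) / t          # running average of estimates
        x = x - np.linalg.solve(h_avg, grad(x))
    return x

rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])                     # true Hessian of f(x) = 0.5 x^T A x
grad = lambda x: A @ x
noisy_hess = lambda x: A + 0.1 * rng.standard_normal((2, 2))
x_star = averaged_newton(grad, noisy_hess, [5.0, -3.0])
```

With averaging, the noise level seen by the Newton step decays roughly like 1/√t; the weighted schemes the paper studies would replace the uniform running mean with one that upweights recent estimates so the superlinear phase starts sooner.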