How proofs are prepared at Camelot [article]

Andreas Björklund, Petteri Kaski
2016 arXiv   pre-print
We study a design framework for robust, independently verifiable, and workload-balanced distributed algorithms working on a common input. An algorithm based on the framework is essentially a distributed encoding procedure for a Reed–Solomon code, which enables (a) robustness against Byzantine failures with intrinsic error-correction and identification of failed nodes, and (b) independent randomized verification to check the entire computation for correctness, which takes essentially no more resources than each node individually contributes to the computation. The framework builds on recent Merlin–Arthur proofs of batch evaluation of Williams [Electron. Colloq. Comput. Complexity, Report TR16-002, January 2016] with the observation that Merlin's magic is not needed for batch evaluation: mere Knights can prepare the proof, in parallel, and with intrinsic error-correction. The contribution of this paper is to show that in many cases the verifiable batch evaluation framework admits algorithms that match in total resource consumption the best known sequential algorithm for solving the problem. As our main result, we show that the k-cliques in an n-vertex graph can be counted and verified in per-node O(n^((ω+ϵ)k/6)) time and space on O(n^((ω+ϵ)k/6)) compute nodes, for any constant ϵ>0 and positive integer k divisible by 6, where 2≤ω<2.3728639 is the exponent of matrix multiplication. This matches in total running time the best known sequential algorithm, due to Nešetřil and Poljak [Comment. Math. Univ. Carolin. 26 (1985) 415–419], and considerably improves its space usage and parallelizability. Further results include novel algorithms for counting triangles in sparse graphs, computing the chromatic polynomial of a graph, and computing the Tutte polynomial of a graph.
arXiv:1602.01295v1 fatcat:pqozv4nul5e7rakfbsra2al7pu

Homomorphic Hashing for Sparse Coefficient Extraction [article]

Petteri Kaski, Mikko Koivisto, Jesper Nederlof
2012 arXiv   pre-print
We study classes of Dynamic Programming (DP) algorithms which, due to their algebraic definitions, are closely related to coefficient extraction methods. DP algorithms can easily be modified to exploit sparseness in the DP table through memoization. Coefficient extraction techniques on the other hand are both space-efficient and parallelisable, but no tools have been available to exploit sparseness. We investigate the systematic use of homomorphic hash functions to combine the best of these methods and obtain improved space-efficient algorithms for problems including LINEAR SAT, SET PARTITION, and SUBSET SUM. Our algorithms run in time proportional to the number of nonzero entries of the last segment of the DP table, which presents a strict improvement over sparse DP. The last property also gives an improved algorithm for CNF SAT with sparse projections.
arXiv:1203.4063v1 fatcat:in5o4y77zrc2boiljz777ybbiq

On evaluation of permanents [article]

Andreas Björklund, Thore Husfeldt, Petteri Kaski, Mikko Koivisto
2009 arXiv   pre-print
We study the time and space complexity of matrix permanents over rings and semirings.
arXiv:0904.3251v1 fatcat:ifj6d2zw2rgphltnll3pcfygla

Fast Monotone Summation over Disjoint Sets [article]

Petteri Kaski, Mikko Koivisto, Janne H. Korhonen
2012 arXiv   pre-print
We study the problem of computing an ensemble of multiple sums where the summands in each sum are indexed by subsets of size p of an n-element ground set. More precisely, the task is to compute, for each subset of size q of the ground set, the sum over the values of all subsets of size p that are disjoint from the subset of size q. We present an arithmetic circuit that, without subtraction, solves the problem using O((n^p+n^q) n) arithmetic gates, all monotone; for constant p, q this is within the factor n of the optimal. The circuit design is based on viewing the summation as a "set nucleation" task and using a tree-projection approach to implement the nucleation. Applications include improved algorithms for counting heaviest k-paths in a weighted graph, computing permanents of rectangular matrices, and dynamic feature selection in machine learning.
arXiv:1208.0554v1 fatcat:vnklerszgnd35mewnmcr3mr2u4

The Shortest Even Cycle Problem is Tractable [article]

Andreas Björklund, Thore Husfeldt, Petteri Kaski
2021 arXiv   pre-print
Given a directed graph, we show how to efficiently find a shortest (directed, simple) cycle on an even number of vertices. As far as we know, no polynomial-time algorithm was previously known for this problem. In fact, finding any even cycle in a directed graph in polynomial time was open for more than two decades until Robertson, Seymour, and Thomas (Ann. of Math. (2) 1999) and, independently, McCuaig (Electron. J. Combin. 2004; announced jointly at STOC 1997) gave an efficiently testable structural characterisation of even-cycle-free directed graphs. Methodologically, our algorithm relies on algebraic fingerprinting and randomized polynomial identity testing over a finite field, and uses a generating polynomial implicit in Vazirani and Yannakakis (Discrete Appl. Math. 1989) that enumerates weighted cycle covers as a difference of a permanent and a determinant polynomial. The need to work with the permanent is where our main technical contribution occurs. We design a family of finite commutative rings of characteristic 4 that simultaneously (i) give a nondegenerate representation for the generating polynomial identity via the permanent and the determinant, (ii) support efficient permanent computations, and (iii) enable emulation of finite-field arithmetic in characteristic 2. Here our work is foreshadowed by that of Björklund and Husfeldt (SIAM J. Comput. 2019), who used a considerably less efficient ring design to obtain a polynomial-time algorithm for the shortest two disjoint paths problem. Building on work of Gilbert and Tarjan (Numer. Math. 1978) as well as Alon and Yuster (J. ACM 2013), we also show how ideas from the nested dissection technique for solving linear equation systems lead to faster algorithm designs when we have control on the separator structure of the input graph; for example, this happens when the input has bounded genus.
arXiv:2111.02992v1 fatcat:x6lyg2kkujfnpdltui2o2e3huu

Fourier meets Möbius: fast subset convolution [article]

Andreas Björklund, Thore Husfeldt, Petteri Kaski, Mikko Koivisto
2006 arXiv   pre-print
We present a fast algorithm for the subset convolution problem: given functions f and g defined on the lattice of subsets of an n-element set N, compute their subset convolution f*g, defined for all S ⊆ N by (f*g)(S) = ∑_{T ⊆ S} f(T) g(S∖T), where addition and multiplication are carried out in an arbitrary ring. Via Möbius transform and inversion, our algorithm evaluates the subset convolution in O(n^2 2^n) additions and multiplications, substantially improving upon the straightforward O(3^n) algorithm. Specifically, if the input functions have an integer range {-M, -M+1, ..., M}, their subset convolution over the ordinary sum-product ring can be computed in O^*(2^n log M) time; the notation O^* suppresses polylogarithmic factors. Furthermore, using a standard embedding technique we can compute the subset convolution over the max-sum or min-sum semiring in O^*(2^n M) time. To demonstrate the applicability of fast subset convolution, we present the first O^*(2^k n^2 + n m) algorithm for the minimum Steiner tree problem in graphs with n vertices, k terminals, and m edges with bounded integer weights, improving upon the O^*(3^k n + 2^k n^2 + n m) time bound of the classical Dreyfus-Wagner algorithm. We also discuss extensions to recent O^*(2^n)-time algorithms for covering and partitioning problems (Björklund and Husfeldt, FOCS 2006; Koivisto, FOCS 2006).
arXiv:cs/0611101v1 fatcat:zvrm7bgjwbew7mw5pocjriqsku
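
The subset convolution defined in the abstract above can be illustrated with a minimal Python sketch. It uses the commonly described "ranked" zeta transform followed by Möbius inversion, which performs O(n^2 2^n) ring operations; whether this matches the paper's exact formulation is an assumption, and the function name `subset_convolution` and the bitmask encoding of subsets are illustrative choices, not taken from the source.

```python
def subset_convolution(f, g, n):
    """Return h with h[S] = sum over T subset of S of f[T] * g[S minus T].

    f and g are lists of length 2**n; index S is a bitmask encoding a subset.
    """
    size = 1 << n
    popcount = [bin(s).count("1") for s in range(size)]

    def ranked_zeta(a):
        # z[k][S] = sum of a[T] over all T subset of S with |T| = k.
        z = [[0] * size for _ in range(n + 1)]
        for s in range(size):
            z[popcount[s]][s] = a[s]
        for row in z:
            for i in range(n):
                bit = 1 << i
                for s in range(size):
                    if s & bit:
                        row[s] += row[s ^ bit]
        return z

    fz, gz = ranked_zeta(f), ranked_zeta(g)
    h = [0] * size
    for k in range(n + 1):
        # Combine ranks i and k - i pointwise.
        c = [0] * size
        for i in range(k + 1):
            fi, gj = fz[i], gz[k - i]
            for s in range(size):
                c[s] += fi[s] * gj[s]
        # Moebius inversion of the combined rank-k slice.
        for i in range(n):
            bit = 1 << i
            for s in range(size):
                if s & bit:
                    c[s] -= c[s ^ bit]
        # Only sets of size exactly k receive contributions from disjoint pairs.
        for s in range(size):
            if popcount[s] == k:
                h[s] = c[s]
    return h
```

For a one-element ground set, `subset_convolution([1, 1], [1, 1], 1)` returns `[1, 2]`: the empty set has one decomposition and the singleton has two.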

Fast Witness Extraction Using a Decision Oracle [article]

Andreas Björklund, Petteri Kaski, Łukasz Kowalik
2015 arXiv   pre-print
In terms of dependence on k, the currently fastest algorithm is due to Björklund, Husfeldt, Kaski, and Koivisto [1] and can be tuned to run in 1.66^k k^O(1) m time. ...
arXiv:1508.03572v1 fatcat:alabv6psv5asrd4gkizznwa4ba

Dense Subset Sum may be the hardest [article]

Per Austrin, Mikko Koivisto, Petteri Kaski, Jesper Nederlof
2015 arXiv   pre-print
The Subset Sum problem asks whether a given set of n positive integers contains a subset of elements that sum up to a given target t. It is an outstanding open question whether the O^*(2^(n/2))-time algorithm for Subset Sum by Horowitz and Sahni [J. ACM 1974] can be beaten in the worst-case setting by a "truly faster", O^*(2^((0.5-δ)n))-time algorithm, with some constant δ > 0. Continuing an earlier work [STACS 2015], we study Subset Sum parameterized by the maximum bin size β, defined as the largest number of subsets of the n input integers that yield the same sum. For every ϵ > 0 we give a truly faster algorithm for instances with β ≤ 2^((0.5-ϵ)n), as well as instances with β ≥ 2^(0.661n). Consequently, we also obtain a characterization in terms of the popular density parameter n/log_2 t: if all instances of density at least 1.003 admit a truly faster algorithm, then so does every instance. This goes against the current intuition that instances of density 1 are the hardest, and therefore is a step toward answering the open question in the affirmative. Our results stem from novel combinations of earlier algorithms for Subset Sum and a study of an extremal question in additive combinatorics connected to the problem of Uniquely Decodable Code Pairs in information theory.
arXiv:1508.06019v1 fatcat:tupcp43vyje5fc5hske2alzwre
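
For orientation, the O^*(2^(n/2)) baseline mentioned in the abstract above can be sketched as a meet-in-the-middle search in the spirit of Horowitz and Sahni, together with a brute-force computation of the maximum bin size β. This is a minimal illustration, not the paper's algorithm; the function names `subset_sums`, `has_subset_sum`, and `max_bin_size` are hypothetical, and the empty subset is counted (so a target of 0 is always reachable).

```python
from bisect import bisect_left
from collections import Counter


def subset_sums(items):
    """List the sums of all 2**len(items) subsets, including the empty one."""
    sums = [0]
    for x in items:
        sums += [s + x for s in sums]
    return sums


def has_subset_sum(numbers, target):
    """Meet in the middle: enumerate each half, then match by binary search."""
    half = len(numbers) // 2
    left = subset_sums(numbers[:half])
    right = sorted(subset_sums(numbers[half:]))
    for s in left:
        need = target - s
        i = bisect_left(right, need)
        if i < len(right) and right[i] == need:
            return True
    return False


def max_bin_size(numbers):
    """Brute-force beta: the largest number of subsets sharing one sum."""
    return max(Counter(subset_sums(numbers)).values())
```

With the input `[1, 2, 3]`, the sum 3 is produced by both {3} and {1, 2}, so `max_bin_size([1, 2, 3])` is 2, whereas `[1, 2, 4]` has all subset sums distinct and bin size 1.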

There are 1,132,835,421,602,062,347 nonisomorphic one-factorizations of K_14 [article]

Petteri Kaski, Patric R. J. Östergård
2007 arXiv   pre-print
We establish by means of a computer search that a complete graph on 14 vertices has 98,758,655,816,833,727,741,338,583,040 distinct and 1,132,835,421,602,062,347 nonisomorphic one-factorizations. The enumeration is constructive for the 10,305,262,573 isomorphism classes that admit a nontrivial automorphism.
arXiv:0801.0202v1 fatcat:dvpu2hth2zbr5bljwk6un3efta

Narrow sieves for parameterized paths and packings [article]

Andreas Björklund, Thore Husfeldt, Petteri Kaski, Mikko Koivisto
2010 arXiv   pre-print
We present randomized algorithms for some well-studied, hard combinatorial problems: the k-path problem, the p-packing of q-sets problem, and the q-dimensional p-matching problem. Our algorithms solve these problems with high probability in time exponential only in the parameter (k, p, q) and using polynomial space; the constant bases of the exponentials are significantly smaller than in previous works. For example, for the k-path problem the improvement is from 2 to 1.66. We also show how to detect if a d-regular graph admits an edge coloring with d colors in time within a polynomial factor of O(2^((d-1)n/2)). Our techniques build upon and generalize some recently published ideas by I. Koutis (ICALP 2009), R. Williams (IPL 2009), and A. Björklund (STACS 2010, FOCS 2010).
arXiv:1007.1161v1 fatcat:sekbvppwsbgojennke6gqpasqi

Counting Paths and Packings in Halves [article]

Andreas Björklund, Thore Husfeldt, Petteri Kaski, Mikko Koivisto
2009 arXiv   pre-print
It is shown that one can count k-edge paths in an n-vertex graph and m-set k-packings on an n-element universe, respectively, in time (n choose k/2) and (n choose mk/2), up to a factor polynomial in n, k, and m; in polynomial space, the bounds hold if multiplied by 3^(k/2) or 5^(mk/2), respectively. These are implications of a more general result: given two set families on an n-element universe, one can count the disjoint pairs of sets in the Cartesian product of the two families with O(n ℓ) basic operations, where ℓ is the number of members in the two families and their subsets.
arXiv:0904.3093v1 fatcat:asuyur4l45d2tpsekdyx5cp2yy

Barriers and local minima in energy landscapes of stochastic local search [article]

Petteri Kaski
2006 arXiv   pre-print
A local search algorithm operating on an instance of a Boolean constraint satisfaction problem (in particular, k-SAT) can be viewed as a stochastic process traversing successive adjacent states in an "energy landscape" defined by the problem instance on the n-dimensional Boolean hypercube. We investigate analytically the worst-case topography of such landscapes in the context of satisfiable k-SAT via a random ensemble of satisfiable "k-regular" linear equations modulo 2. We show that for each fixed k=3,4,..., the typical k-SAT energy landscape induced by an instance drawn from the ensemble has a set of 2^Ω(n) local energy minima, each separated by an unconditional Ω(n) energy barrier from each of the O(1) ground states, that is, solution states with zero energy. The main technical aspect of the analysis is that a random k-regular 0/1 matrix constitutes a strong boundary expander with almost full GF(2)-linear rank, a property which also enables us to prove a 2^Ω(n) lower bound for the expected number of steps required by the focused random walk heuristic to solve typical instances drawn from the ensemble. These results paint a grim picture of the worst-case topography of k-SAT for local search, and constitute apparently the first rigorous analysis of the growth of energy barriers in a random ensemble of k-SAT landscapes as the number of variables n is increased.
arXiv:cs/0611103v1 fatcat:3kjsdqlf7zatlphryii7duulyq

Engineering Boolean Matrix Multiplication for Multiple-Accelerator Shared-Memory Architectures [article]

Matti Karppa, Petteri Kaski
2019 arXiv   pre-print
We study the problem of multiplying two bit matrices with entries either over the Boolean algebra (0,1,∨,∧) or over the binary field (0,1,+,·). We engineer high-performance open-source algorithm implementations for contemporary multiple-accelerator shared-memory architectures, with the objective of time-and-energy-efficient scaling up to input sizes close to the available shared memory capacity. For example, given two terabinary-bit square matrices as input, our implementations compute the Boolean product in approximately 2100 seconds (1.0 Pbop/s at 3.3 pJ/bop for a total of 2.1 kWh/product) and the binary product in less than 950 seconds (2.4 effective Pbop/s at 1.5 effective pJ/bop for a total of 0.92 kWh/product) on an NVIDIA DGX-1 with power consumption at peak system power (3.5 kW). Our contributions are (a) for the binary product, we use alternative-basis techniques of Karstadt and Schwartz [SPAA '17] to design novel alternative-basis variants of Strassen's recurrence for 2×2 block multiplication [Numer. Math. 13 (1969)] that have been optimized for both the number of additions and low working memory, (b) structuring the parallel block recurrences and the memory layout for coalescent and register-localized execution on accelerator hardware, (c) low-level engineering of the innermost block products for the specific target hardware, and (d) structuring the top-level shared-memory implementation to feed the accelerators with data and integrate the results for input and output sizes beyond the aggregate memory capacity of the available accelerators.
arXiv:1909.01554v1 fatcat:njvf7t2nybhnnfpzjrqqxbxzta

Trimmed Moebius Inversion and Graphs of Bounded Degree [article]

Andreas Björklund, Thore Husfeldt, Petteri Kaski, Mikko Koivisto
2008 arXiv   pre-print
We study ways to expedite Yates's algorithm for computing the zeta and Moebius transforms of a function defined on the subset lattice. We develop a trimmed variant of Moebius inversion that proceeds point by point, finishing the calculation at a subset before considering its supersets. For an n-element universe U and a family F of its subsets, trimmed Moebius inversion allows us to compute the number of packings, coverings, and partitions of U with k sets from F in time within a polynomial factor (in n) of the number of supersets of the members of F. Relying on an intersection theorem of Chung et al. (1986) to bound the sizes of set families, we apply these ideas to well-studied combinatorial optimisation problems on graphs of maximum degree Δ. In particular, we show how to compute the Domatic Number in time within a polynomial factor of (2^(Δ+1)-2)^(n/(Δ+1)) and the Chromatic Number in time within a polynomial factor of (2^(Δ+1)-Δ-1)^(n/(Δ+1)). For any constant Δ, these bounds are O((2-ϵ)^n) for ϵ>0 independent of the number of vertices n.
arXiv:0802.2834v1 fatcat:sy3xys7v4rdzrcgbb2dpsrua2u

Sharper Upper Bounds for Unbalanced Uniquely Decodable Code Pairs [article]

Per Austrin, Petteri Kaski, Mikko Koivisto, Jesper Nederlof
2016 arXiv   pre-print
Two sets A, B ⊆ {0, 1}^n form a Uniquely Decodable Code Pair (UDCP) if every pair a ∈ A, b ∈ B yields a distinct sum a+b, where the addition is over Z^n. We show that every UDCP A, B, with |A| = 2^((1-ϵ)n) and |B| = 2^(βn), satisfies β ≤ 0.4228 + √ϵ. For sufficiently small ϵ, this bound significantly improves previous bounds by Urbanke and Li [Information Theory Workshop '98] and Ordentlich and Shayevitz [2014, arXiv:1412.8415], which upper bound β by 0.4921 and 0.4798, respectively, as ϵ approaches 0.
arXiv:1605.00462v1 fatcat:bp6uutjjznan3l42c36vi7xcfq
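
The UDCP definition in the abstract above is easy to check by brute force on small examples. The sketch below is only an illustration of the definition, not of the paper's bounds; the function name `is_udcp` and the representation of code words as tuples of 0/1 integers are assumptions made here.

```python
from itertools import product


def is_udcp(A, B):
    """Check that all coordinate-wise integer sums a + b are distinct.

    A and B are collections of distinct equal-length 0/1 tuples.
    """
    seen = set()
    for a, b in product(A, B):
        s = tuple(x + y for x, y in zip(a, b))  # addition over Z^n, no carry
        if s in seen:
            return False
        seen.add(s)
    return True
```

For example, A = {(0, 0), (1, 1)} and B = {(0, 1), (1, 0)} give the four distinct sums (0, 1), (1, 0), (1, 2), (2, 1), so the pair is uniquely decodable, while A = B = {(0,), (1,)} is not, because 0+1 and 1+0 collide.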