Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
[article]
2021
arXiv
pre-print
Specifically, without skip connections or multi-layer perceptrons (MLPs), the output converges doubly exponentially to a rank-1 matrix. ...
This work proposes a new way to understand self-attention networks: we show that their output can be decomposed into a sum of smaller terms, each involving the operation of a sequence of attention heads ...
Jean-Baptiste Cordonnier is supported by the Swiss Data Science Center (SDSC). ...
arXiv:2103.03404v1
fatcat:bgnhkkfqjjezvff3lvpbxfnva4
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models
[article]
2021
arXiv
pre-print
However, the Transformer architecture is not only composed of the multi-head attention; other components can also contribute to Transformers' progressive performance. ...
These results provide new intuitive explanations of existing reports; for example, discarding the learned attention patterns tends not to adversely affect the performance. ...
Attention is not all you need: pure attention loses rank doubly exponentially with depth. ...
arXiv:2109.07152v1
fatcat:dxr5ej4xrfd4fpqi6sghcj6vhm
On Graph Neural Networks versus Graph-Augmented MLPs
[article]
2020
arXiv
pre-print
in depth. ...
From the perspective of graph isomorphism testing, we show both theoretically and numerically that GA-MLPs with suitable operators can distinguish almost all non-isomorphic graphs, just like the Weisfeiler-Lehman ...
Acknowledgements We are grateful to Jiaxuan You for initiating the discussion on GA-MLP-type models, as well as Mufei Li, Minjie Wang, Xiang Song, Lingfan Yu, Michael M. ...
arXiv:2010.15116v2
fatcat:zdiirbcuevhrvkei6kpakuaro4
Multilinear formulas and skepticism of quantum computing
2004
Proceedings of the thirty-sixth annual ACM symposium on Theory of computing - STOC '04
Such a computer need not be universal; it might be specialized for (say) factoring. ...
If this is true, then there should be a natural set of quantum states that can account for all quantum computing experiments performed to date, but not for Shor's factoring algorithm. ...
First, $\mathrm{EX}[R] = 2^{n+1}$, so by a standard Hoeffding-type bound, $\Pr[R < 2^n]$ is doubly-exponentially small in $n$. ...
doi:10.1145/1007352.1007378
dblp:conf/stoc/Aaronson04
fatcat:4r7rapmggzditdu5qeaqamj4sa
Multilinear Formulas and Skepticism of Quantum Computing
[article]
2004
arXiv
pre-print
If this is true, then there should be a natural set of quantum states that can account for all experiments performed to date, but not for Shor's factoring algorithm. ...
More broadly, we introduce a complexity classification of pure quantum states, and prove many basic facts about this classification. ...
First, $\mathrm{EX}[R] = 2^{n+1}$, so by a standard Hoeffding-type bound, $\Pr[R < 2^n]$ is doubly-exponentially small in $n$. ...
arXiv:quant-ph/0311039v4
fatcat:vx2dmvcb6vhpflcvfndbdu56pe
Innovation Creates the Future when it Exemplifies Clear Strategic Thinking over Reacting to Presenting Complaints
2015
Strategic Management Quarterly
Below we list overriding questions that need addressing and answering regardless of what you are trying to accomplish or solve: ...
Addressing the correct questions, those formulated as real matters not statements of position, is the only way for the answers to matter. ...
Apply them if you can, but do not get discouraged by length and depth. ...
doi:10.15640/smq.v3n1a1
fatcat:ffhbkli7nvelvdfcxjeqk2rhdi
The intelligent use of space
1995
Artificial Intelligence
How we manage the spatial arrangement of items around us, is not an afterthought; it is an integral part of the way we think, plan and behave. ...
The objective of this essay is to provide the beginning of a principled classification of some of the ways space is intelligently used. ...
In the extreme case, we reduce a doubly exponential problem of deciding which piece to select and where to place it, into an exponential problem. ...
doi:10.1016/0004-3702(94)00017-u
fatcat:cbcfhzp2lrb5bhg6oprrolo7zy
Evolution of the big deals use in the public universities of the Castile and Leon region, Spain
2020
El Profesional de la Informacion
To Sunstein, a world where we are all reading our own Daily Me is one where "you need not come across topics and views that you have not sought out. ...
All you need to contribute to Wikipedia is Internet access: Every entry has an "Edit This Page" button on it, available to all. ...
-Business 2.0 "I'd put Anderson and his work on par with Malcolm Gladwell and Clayton M. ...
doi:10.3145/epi.2019.nov.19
fatcat:7hb7lt2ryrdt5o33xjjcoduuli
The chess of kinship and the kinship of chess
2011
HAU: Journal of Ethnographic Theory
In chess you start out with all your personnel there at once, ranked and ordered in a very specific way, and with some exceptions you proceed to diminish their numbers as the game progresses. ...
thinking is not what family behavior is all about. ...
doi:10.14318/hau1.1.006
fatcat:zjsmvp7gjvdtzexqt7dofa44gi
Limits on Efficient Computation in the Physical World
[article]
2005
arXiv
pre-print
the last because $\beta < 1$ and $\mu < 1/2$, the $n_R$'s increase doubly exponentially, and $n_0$ is sufficiently large. ...
But no, the Beast is there whenever you aren't paying attention, following all possible paths in superposition. Look, and suddenly the Beast is gone. But what does it even mean to look? ...
Let ε i = f * i − f * i ; then we need to show that ε i ≤ ε for all i ∈ {0, . . . , m}. The proof is by induction on i. ...
arXiv:quant-ph/0412143v2
fatcat:x6mjz4h4gzaszbfgbkshgm2v3u
On logics with two variables
1999
Theoretical Computer Science
Although the additional features are usually not first-order constructs, the resulting logics can still be seen as two-variable logics that are embedded in suitable extensions of FO*. ...
On the other side, the situation is different for model checking problems. ...
A doubly exponential bound on the size of a minimal model is implicit in Mortimer's proof. ...
doi:10.1016/s0304-3975(98)00308-9
fatcat:krqnks7mfbdyfamtm4m7swdkry
The Complexity of Quantum States and Transformations: From Quantum Money to Black Holes
[article]
2016
arXiv
pre-print
The focus is quantum circuit complexity---i.e., the minimum number of gates needed to prepare a given quantum state or apply a given unitary transformation---as a unifying theme tying together several ...
The course was taught to a mixed audience of theoretical computer scientists and quantum gravity / string theorists, and starts out with a crash course on quantum information and computation in general ...
are doubly-exponentially small or even smaller. ...
arXiv:1607.05256v1
fatcat:mnpmspgwlrdk5pm3fcsthl3lui
SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers
[article]
2021
arXiv
pre-print
It is worth noting that the proposed SpectralFormer is a highly flexible backbone network, which can be applicable to both pixel- and patch-wise inputs. ...
More significantly, to reduce the possibility of losing valuable information in the layer-wise propagation process, we devise a cross-layer skip connection to convey memory-like components from shallow ...
need: Pure attention loses rank doubly exponentially with depth,” arXiv preprint arXiv:2103.03404, 2021.
Remote Sens., 2021. DOI: 10.1109/TGRS.2021.3055516.
[5] D. ...
arXiv:2107.02988v2
fatcat:iw67o2iwhjafbhhrwogcswyk7u
Multiagent systems: algorithmic, game-theoretic, and logical foundations
2009
ChoiceReviews
you access to the physical book; • The cost of the book is prohibitive for you; • You need only one or two chapters. ...
Finally, we ask you not to link directly to the PDF or to distribute it electronically. Instead, we invite you to link to http://www.masfoundations.org. ...
This sentence is not valid in the class of all merged Kripke structures defined earlier. ...
doi:10.5860/choice.46-5662
fatcat:pr2pmv7k2bad3pp5bxgogecgnq
Cosmology Beyond Einstein
[article]
2015
arXiv
pre-print
We describe these self-accelerating solutions and investigate the cosmological perturbations in depth, beginning with an investigation of their linear stability, followed by the construction of a method ...
Next, we discuss prospects for theories in which matter "doubly couples" to both metrics, and examine the cosmological expansion history in both massive gravity and bigravity with a specific double coupling ...
at all scales has been proven [73], but is a sign that we need to continue to search for a doubly-coupled theory which is truly free of the Boulware-Deser ghost. ...
arXiv:1508.06859v1
fatcat:ehy6grlek5gh7hy3jsy3plpcpu