Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth [article]

Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
2021 arXiv   pre-print
Specifically, without skip connections or multi-layer perceptrons (MLPs), the output converges doubly exponentially to a rank-1 matrix.  ...  This work proposes a new way to understand self-attention networks: we show that their output can be decomposed into a sum of smaller terms, each involving the operation of a sequence of attention heads  ...  Jean-Baptiste Cordonnier is supported by the Swiss Data Science Center (SDSC).  ... 
arXiv:2103.03404v1 fatcat:bgnhkkfqjjezvff3lvpbxfnva4

Incorporating Residual and Normalization Layers into Analysis of Masked Language Models [article]

Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi, Kentaro Inui
2021 arXiv   pre-print
However, the Transformer architecture is not only composed of the multi-head attention; other components can also contribute to Transformers' progressive performance.  ...  These results provide new intuitive explanations of existing reports; for example, discarding the learned attention patterns tends not to adversely affect the performance.  ...  Attention is not all you need: pure attention loses rank doubly exponentially with depth.  ... 
arXiv:2109.07152v1 fatcat:dxr5ej4xrfd4fpqi6sghcj6vhm

On Graph Neural Networks versus Graph-Augmented MLPs [article]

Lei Chen, Zhengdao Chen, Joan Bruna
2020 arXiv   pre-print
in depth.  ...  From the perspective of graph isomorphism testing, we show both theoretically and numerically that GA-MLPs with suitable operators can distinguish almost all non-isomorphic graphs, just like the Weisfeiler-Lehman  ...  Acknowledgements We are grateful to Jiaxuan You for initiating the discussion on GA-MLP-type models, as well as Mufei Li, Minjie Wang, Xiang Song, Lingfan Yu, Michael M.  ... 
arXiv:2010.15116v2 fatcat:zdiirbcuevhrvkei6kpakuaro4

Multilinear formulas and skepticism of quantum computing

Scott Aaronson
2004 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing - STOC '04  
Such a computer need not be universal; it might be specialized for (say) factoring.  ...  If this is true, then there should be a natural set of quantum states that can account for all quantum computing experiments performed to date, but not for Shor's factoring algorithm.  ...  First, EX[R] = 2^{n+1}, so by a standard Hoeffding-type bound, Pr[R < 2^n] is doubly-exponentially small in n.  ... 
doi:10.1145/1007352.1007378 dblp:conf/stoc/Aaronson04 fatcat:4r7rapmggzditdu5qeaqamj4sa

Multilinear Formulas and Skepticism of Quantum Computing [article]

Scott Aaronson
2004 arXiv   pre-print
If this is true, then there should be a natural set of quantum states that can account for all experiments performed to date, but not for Shor's factoring algorithm.  ...  More broadly, we introduce a complexity classification of pure quantum states, and prove many basic facts about this classification.  ...  First, EX[R] = 2^{n+1}, so by a standard Hoeffding-type bound, Pr[R < 2^n] is doubly-exponentially small in n.  ... 
arXiv:quant-ph/0311039v4 fatcat:vx2dmvcb6vhpflcvfndbdu56pe

Innovation Creates the Future when it Exemplifies Clear Strategic Thinking over Reacting to Presenting Complaints

Robert W. Service, John K. McEwen
2015 Strategic Management Quarterly  
Below we list overriding questions that need addressing and answering regardless of what you are trying to accomplish or solve:  ...  Addressing the correct questions, those formulated as real matters not statements of position, is the only way for the answers to matter.  ...  Apply them if you can, but do not get discouraged by length and depth.  ... 
doi:10.15640/smq.v3n1a1 fatcat:ffhbkli7nvelvdfcxjeqk2rhdi

The intelligent use of space

David Kirsh
1995 Artificial Intelligence  
How we manage the spatial arrangement of items around us, is not an afterthought; it is an integral part of the way we think, plan and behave.  ...  The objective of this essay is to provide the beginning of a principled classification of some of the ways space is intelligently used.  ...  In the extreme case, we reduce a doubly exponential problem of deciding which piece to select and where to place it, into an exponential problem.  ... 
doi:10.1016/0004-3702(94)00017-u fatcat:cbcfhzp2lrb5bhg6oprrolo7zy

Evolution of the big deals use in the public universities of the Castile and Leon region, Spain

Andrés Fernández-Ramos, Blanca Rodríguez-Bravo, María-Luisa Alvite-Díez, Lourdes Santos-De-Paz, María-Antonia Morán-Suárez, Josefa Gallego-Lorenzo, Isabel Olea
2020 El Profesional de la Informacion  
To Sunstein, a world where we are all reading our own Daily Me is one where "you need not come across topics and views that you have not sought out.  ...  All you need to contribute to Wikipedia is Internet access: Every entry has an "Edit This Page" button on it, available to all.  ...  -Business 2.0 "I'd put Anderson and his work on par with Malcolm Gladwell and Clayton M.  ... 
doi:10.3145/epi.2019.nov.19 fatcat:7hb7lt2ryrdt5o33xjjcoduuli

The chess of kinship and the kinship of chess

2011 HAU: Journal of Ethnographic Theory  
In chess you start out with all your personnel there at once, ranked and ordered in a very specific way, and with some exceptions you proceed to diminish their numbers as the game progresses.  ...  thinking is not what family behavior is all about.  ... 
doi:10.14318/hau1.1.006 fatcat:zjsmvp7gjvdtzexqt7dofa44gi

Limits on Efficient Computation in the Physical World [article]

Scott Aaronson
2005 arXiv   pre-print
the last because β < 1 and µ < 1/2, the n_R's increase doubly exponentially, and n_0 is sufficiently large.  ...  But no, the Beast is there whenever you aren't paying attention, following all possible paths in superposition. Look, and suddenly the Beast is gone. But what does it even mean to look?  ...  Let ε_i = f * i − f * i ; then we need to show that ε_i ≤ ε for all i ∈ {0, ..., m}. The proof is by induction on i.  ... 
arXiv:quant-ph/0412143v2 fatcat:x6mjz4h4gzaszbfgbkshgm2v3u

On logics with two variables

Erich Grädel, Martin Otto
1999 Theoretical Computer Science  
Although the additional features are usually not first-order constructs, the resulting logics can still be seen as two-variable logics that are embedded in suitable extensions of FO*.  ...  On the other side, the situation is different for model checking problems.  ...  A doubly exponential bound on the size of a minimal model is implicit in Mortimer's proof.  ... 
doi:10.1016/s0304-3975(98)00308-9 fatcat:krqnks7mfbdyfamtm4m7swdkry

The Complexity of Quantum States and Transformations: From Quantum Money to Black Holes [article]

Scott Aaronson
2016 arXiv   pre-print
The focus is quantum circuit complexity---i.e., the minimum number of gates needed to prepare a given quantum state or apply a given unitary transformation---as a unifying theme tying together several  ...  The course was taught to a mixed audience of theoretical computer scientists and quantum gravity / string theorists, and starts out with a crash course on quantum information and computation in general  ...  are doubly-exponentially small or even smaller.  ... 
arXiv:1607.05256v1 fatcat:mnpmspgwlrdk5pm3fcsthl3lui

SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers [article]

Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang, Antonio Plaza, Jocelyn Chanussot
2021 arXiv   pre-print
It is worth noting that the proposed SpectralFormer is a highly flexible backbone network, which can be applicable to both pixel- and patch-wise inputs.  ...  More significantly, to reduce the possibility of losing valuable information in the layer-wise propagation process, we devise a cross-layer skip connection to convey memory-like components from shallow  ...  need: Pure attention loses rank doubly exponentially with depth,” arXiv preprint arXiv:2103.03404, 2021.  ...  Remote Sens., 2021. DOI: 10.1109/TGRS.2021.3055516. [5] D.  ... 
arXiv:2107.02988v2 fatcat:iw67o2iwhjafbhhrwogcswyk7u

Multiagent systems: algorithmic, game-theoretic, and logical foundations

2009 ChoiceReviews  
you access to the physical book; • The cost of the book is prohibitive for you; • You need only one or two chapters.  ...  Finally, we ask you not to link directly to the PDF or to distribute it electronically. Instead, we invite you to link to  ...  This sentence is not valid in the class of all merged Kripke structures defined earlier.  ... 
doi:10.5860/choice.46-5662 fatcat:pr2pmv7k2bad3pp5bxgogecgnq

Cosmology Beyond Einstein [article]

Adam R. Solomon
2015 arXiv   pre-print
We describe these self-accelerating solutions and investigate the cosmological perturbations in depth, beginning with an investigation of their linear stability, followed by the construction of a method  ...  Next, we discuss prospects for theories in which matter "doubly couples" to both metrics, and examine the cosmological expansion history in both massive gravity and bigravity with a specific double coupling  ...  at all scales has been proven [73], but is a sign that we need to continue to search for a doubly-coupled theory which is truly free of the Boulware-Deser ghost.  ... 
arXiv:1508.06859v1 fatcat:ehy6grlek5gh7hy3jsy3plpcpu