Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
[article]
2021
arXiv
pre-print
Recently, self-attention operators have shown superior performance as a stand-alone building block for vision models. ...
Our searched TrioNet, which combines self-attention and convolution, outperforms all stand-alone models with fewer FLOPs on ImageNet classification, where self-attention performs better than convolution. ...
The searched models are compared with stand-alone convolution [21], stand-alone local self-attention [45] (a minimal sketch of such a layer follows this entry) and stand-alone axial-attention [58] models. ...
arXiv:2111.07547v1
fatcat:d4iztfhzhvhulkx3ta2hl5kuvq
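As background for the stand-alone operators this entry compares, here is a minimal single-head sketch of a local self-attention layer in PyTorch. It is our illustration, not code from the paper: the class name LocalSelfAttention2d, the 1x1-convolution projections, and the window size are assumptions, and the published stand-alone layers [45] additionally use multiple heads and relative position embeddings.

```python
# Minimal sketch of a stand-alone local self-attention layer (single head,
# no relative position embedding); all names and hyper-parameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalSelfAttention2d(nn.Module):
    """Drop-in alternative to a k x k convolution: each pixel attends over its
    k x k neighbourhood instead of applying fixed convolution weights to it."""
    def __init__(self, dim, k=7):
        super().__init__()
        self.k = k
        self.scale = dim ** -0.5
        self.q = nn.Conv2d(dim, dim, 1)        # query projection
        self.kv = nn.Conv2d(dim, 2 * dim, 1)   # key/value projections

    def forward(self, x):                                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.q(x).reshape(b, c, h * w, 1)                    # one query per pixel
        kv = F.unfold(self.kv(x), self.k, padding=self.k // 2)   # (B, 2C*k*k, H*W)
        kv = kv.reshape(b, 2 * c, self.k * self.k, h * w).permute(0, 1, 3, 2)
        k_, v_ = kv.split(c, dim=1)                              # each (B, C, H*W, k*k)
        attn = (q * k_ * self.scale).sum(1).softmax(-1)          # (B, H*W, k*k)
        out = (v_ * attn.unsqueeze(1)).sum(-1)                   # (B, C, H*W)
        return out.reshape(b, c, h, w)

y = LocalSelfAttention2d(64)(torch.randn(2, 64, 32, 32))  # -> (2, 64, 32, 32)
```

Replacing every spatial convolution in a ResNet with such a layer is what "stand-alone" refers to in these result titles; TrioNet instead searches over mixtures of convolution and attention operators.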
Local Multi-Head Channel Self-Attention for Facial Expression Recognition
[article]
2021
arXiv
pre-print
Since the Transformer architecture was introduced in 2017, there have been many attempts to bring the self-attention paradigm into the field of computer vision. ...
In this paper, we propose a novel self-attention module that can be easily integrated into virtually every convolutional neural network and that is specifically designed for computer vision, the LHC: Local ...
But the main reason we did not pursue the goal of a stand-alone architecture is that we do not believe in the main assumption on which spatial self-attention in computer vision is based. ...
arXiv:2111.07224v2
fatcat:pmusv2efu5cspcz3q6jdkumbqe
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
[article]
2020
arXiv
pre-print
We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. ...
In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions (see the sketch after this entry). ...
The local constraint, proposed by the stand-alone self-attention models [68], significantly reduces the computational costs in vision tasks and enables building a fully self-attentional model. ...
arXiv:2003.07853v2
fatcat:5v2u47653janpexw4gherlhjea
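The factorization mentioned in the snippet can be sketched compactly. The following is a simplified reading, assuming PyTorch; the module name AxialAttention2d is invented, and the paper's position-sensitive terms (learned relative position embeddings on queries, keys, and values) are omitted for brevity.

```python
# Simplified axial attention: 2D self-attention factorized into a width-axis
# pass followed by a height-axis pass. Cost falls from O((HW)^2) to O(HW(H+W)).
import torch
import torch.nn as nn

class AxialAttention2d(nn.Module):
    def __init__(self, dim, heads=8):          # dim must be divisible by heads
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, H, W, C)
        b, h, w, c = x.shape
        rows = x.reshape(b * h, w, c)          # 1D attention along the width axis
        rows, _ = self.row_attn(rows, rows, rows)
        cols = rows.reshape(b, h, w, c).permute(0, 2, 1, 3).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)  # 1D attention along the height axis
        return cols.reshape(b, w, h, c).permute(0, 2, 1, 3)  # back to (B, H, W, C)
```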
Light-weight Self-Attention Augmented Generative Adversarial Networks for Speech Enhancement
2021
Electronics
Our work is implemented in three phases: firstly, we apply the stand-alone self-attention layer in speech enhancement GANs. ...
Secondly, we employ locality modeling on the stand-alone self-attention layer. Lastly, we investigate the functionality of the self-attention augmented convolutional speech enhancement GANs. ...
... stand-alone self-attention layer, modeling locality on the stand-alone self-attention layer, and coupling the self-attention layer with the (de)convolutional layer. ...
doi:10.3390/electronics10131586
fatcat:xvy4lh36cvhkrlkbg5ekt4atvi
AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via Visual Attention Condensers
[article]
2020
arXiv
pre-print
More specifically, AttendNets possess deep self-attention architectures based on visual attention condensers, which extend the recently introduced stand-alone attention condensers to improve spatial-channel selective attention. ...
[28] introduce the concept of attention condensers as a stand-alone building block for deep neural networks geared around condensed self-attention. ...
arXiv:2009.14385v1
fatcat:ur7ix4qzmfbzxjft5gnk7ukrzy
Involution: Inverting the Inherence of Convolution for Visual Recognition
[article]
2021
arXiv
pre-print
We additionally demystify the recently popular self-attention operator and subsume it into our involution family as an over-complicated instantiation (a sketch of involution follows this entry). ...
Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. ...
Among these works, pure self-attention could be utilized to construct stand-alone models with promising performance. ...
arXiv:2103.06255v2
fatcat:u2pkdreijvcvbitqng3cptz25e
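To make the contrast with self-attention concrete, here is a rough sketch of an involution layer as the abstract describes it: the kernel is generated from the feature at each position and shared across channels within a group. Assumptions: PyTorch, the bottleneck kernel-generating function, and all hyper-parameter values are ours, not the paper's exact configuration.

```python
# Rough involution sketch: per-pixel, channel-shared dynamic kernels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Involution2d(nn.Module):
    def __init__(self, dim, k=7, groups=4, reduction=4):
        super().__init__()
        self.k, self.groups = k, groups
        # generate one k*k kernel per group at every spatial position
        self.gen = nn.Sequential(
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(),
            nn.Conv2d(dim // reduction, groups * k * k, 1))

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        kernel = self.gen(x).reshape(b, self.groups, self.k * self.k, h * w)
        patches = F.unfold(x, self.k, padding=self.k // 2)   # (B, C*k*k, H*W)
        patches = patches.reshape(b, self.groups, c // self.groups,
                                  self.k * self.k, h * w)
        out = (kernel.unsqueeze(2) * patches).sum(3)         # weighted neighbourhood sum
        return out.reshape(b, c, h, w)
```

Unlike convolution, the kernel varies with spatial position; unlike self-attention, the weights come from a single pixel's feature rather than query-key dot products, which is the sense in which self-attention is an over-complicated instantiation of the same family.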
Contextual Transformer Networks for Visual Recognition
[article]
2021
arXiv
pre-print
... in numerous computer vision tasks. ...
Nevertheless, most existing designs directly employ self-attention over a 2D feature map to obtain the attention matrix based on pairs of isolated queries and keys at each spatial location, but leave ...
[40, 58] present a stand-alone local self-attention module, which can completely replace the spatial convolutions in ResNet architectures. ...
arXiv:2107.12292v1
fatcat:ws7hou7o5rhtfkc7srs5s4t2gm
SimViT: Exploring a Simple Vision Transformer with sliding windows
[article]
2021
arXiv
pre-print
Specifically, we introduce Multi-head Central Self-Attention (MCSA) instead of conventional Multi-head Self-Attention to capture highly local relations. ...
Although vision Transformers have achieved excellent performance as backbone models in many vision tasks, most of them intend to capture global relations of all tokens in an image or a window, which disrupts ...
The removal of position encoding also brings translation invariance, which is important for the recognition ability ...
... Shlens, "Stand-alone self-attention in vision models," NeurIPS, 2019. ...
arXiv:2112.13085v1
fatcat:4ettjdumhfg7hi6vooix2xkzxq
Evaluating Transformers for Lightweight Action Recognition
[article]
2021
arXiv
pre-print
Meanwhile, attention-only models need more motion modeling capabilities and stand-alone attention block models currently incur too much latency overhead. ...
In video action recognition, transformers consistently reach state-of-the-art accuracy. However, many models are too heavyweight for the average researcher with limited hardware resources. ...
... of stand-alone attention blocks. ...
arXiv:2111.09641v2
fatcat:77vh2hhypzcr3jz3bz4q47dnt4
Salient Object Detection Combining a Self-attention Module and a Feature Pyramid Network
[article]
2020
arXiv
pre-print
In PSAM, self-attention layers are attached after multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model. ...
To overcome this limitation, we propose a novel pyramid self-attention module (PSAM) and adopt an independent feature-complementing strategy. ...
The stand-alone self-attention layer was introduced in the work of [23]. ...
arXiv:2004.14552v1
fatcat:vckdbn5osrhlhcuiyi7aecjram
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
[article]
2020
arXiv
pre-print
Conventional approaches exploit the vision and language features in cross-modal grounding. ...
In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the ...
The language attention map for the baseline model and our final model. The x-axis stands for the position of words and the y-axis stands for the navigation time steps. ...
arXiv:1911.07883v4
fatcat:xsev6x5o4zhtpkbgxuhzvxfhvq
Salient Object Detection Combining a Self-Attention Module and a Feature Pyramid Network
2020
Electronics
In PSAM, self-attention layers are attached after multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model. ...
To overcome this limitation, we propose a novel pyramid self-attention module (PSAM) and adopt an independent feature-complementing strategy. ...
The stand-alone self-attention layer was introduced in the work of [24]. ...
doi:10.3390/electronics9101702
fatcat:oytul2r6rzhbrgl4dejsp6e5xi
Unified Vision-Language Pre-Training for Image Captioning and VQA
2020
PROCEEDINGS OF THE THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
The two tasks differ solely in the context the prediction is conditioned on. This is controlled by using task-specific self-attention masks in the shared transformer network. ...
This paper presents a unified Vision-Language Pre-training (VLP) model. ...
Each objective specifies different binary values in the self-attention mask to control what context is available to the language model (see the sketch after this entry). ...
doi:10.1609/aaai.v34i07.7005
fatcat:ue42mso77ncgloo32csk4vk7nm
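The "binary values in the self-attention mask" mentioned in this entry can be illustrated in a few lines. This is our reading of the mechanism, assuming the common convention 1 = attend, 0 = block; the function name and the token counts in the example are invented.

```python
# Sketch of the two VLP masking regimes over [image tokens | text tokens].
import torch

def vlp_attention_mask(n_img: int, n_txt: int, seq2seq: bool) -> torch.Tensor:
    n = n_img + n_txt
    mask = torch.ones(n, n)   # bidirectional objective: every token sees every token
    if seq2seq:
        # text attends to all image tokens but only to earlier text tokens
        mask[n_img:, n_img:] = torch.tril(torch.ones(n_txt, n_txt))
        mask[:n_img, n_img:] = 0.0   # image tokens cannot attend to text
    return mask

print(vlp_attention_mask(n_img=2, n_txt=3, seq2seq=True))
```

Sharing one transformer and switching only this mask is what lets the same pre-trained weights serve both the bidirectional (VQA-style) and seq2seq (captioning-style) objectives.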
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
[article]
2021
arXiv
pre-print
In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. ...
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. ...
Stand-alone self-attention in vision models. In NeurIPS, 2019.
Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. ...
arXiv:2010.11929v2
fatcat:myedumsklfcidim27uii6plwq4
Mathematical Modeling
1979
Computers and Mathematics with Applications
As a text for an undergraduate course in mathematical modelling (for which it is intended), the book could not stand alone. ...
Moreover, most of the authors concentrate on the analysis of models, i.e., obtaining solutions, with relatively little attention paid to discussion of the general issues in their specific guises. ...
doi:10.1016/0898-1221(79)90045-2
fatcat:6ybhfi57lvb7lol3t5mohryeka
Showing results 1 — 15 out of 129,768 results