129,768 Hits in 6.1 sec

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention [article]

Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille
2021 arXiv   pre-print
Recently, self-attention operators have shown superior performance as a stand-alone building block for vision models. ... Our searched TrioNet, which combines self-attention and convolution, outperforms all stand-alone models with fewer FLOPs on ImageNet classification, where self-attention performs better than convolution. ... The searched models are compared with stand-alone convolution [21], stand-alone local self-attention [45], and stand-alone axial-attention [58] models. ...
arXiv:2111.07547v1 fatcat:d4iztfhzhvhulkx3ta2hl5kuvq

Local Multi-Head Channel Self-Attention for Facial Expression Recognition [article]

Roberto Pecoraro, Valerio Basile, Viviana Bono, Sara Gallo
2021 arXiv   pre-print
Since the Transformer architecture was introduced in 2017, there have been many attempts to bring the self-attention paradigm into the field of computer vision. ... In this paper we propose a novel self-attention module that can be easily integrated into virtually every convolutional neural network and that is specifically designed for computer vision, the LHC: Local Multi-Head Channel Self-Attention. ... But the main reason we did not pursue the goal of a stand-alone architecture is that we do not believe in the core assumption on which spatial self-attention in computer vision is based. ...
arXiv:2111.07224v2 fatcat:pmusv2efu5cspcz3q6jdkumbqe

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation [article]

Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
2020 arXiv   pre-print
We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. ... In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. ... The local constraint, proposed by the stand-alone self-attention models [68], significantly reduces the computational cost in vision tasks and enables building fully self-attentional models. ...
arXiv:2003.07853v2 fatcat:5v2u47653janpexw4gherlhjea
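The Axial-DeepLab snippet above describes factorizing 2D self-attention into two 1D self-attentions, applied along the height axis and then the width axis. A minimal NumPy sketch of that factorization idea (single head, no positional terms or learned projections; an illustration of the principle, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_1d(x):
    # Plain dot-product self-attention along the first axis of x: (L, d) -> (L, d).
    scores = x @ x.T / np.sqrt(x.shape[-1])   # (L, L) pairwise similarities
    return softmax(scores) @ x                # attention-weighted mixture

def axial_attention(feat):
    # feat: (H, W, d) feature map. Attend along height, then along width.
    H, W, d = feat.shape
    # Height axis: one independent 1D attention per column.
    out = np.stack([attend_1d(feat[:, w]) for w in range(W)], axis=1)
    # Width axis: one independent 1D attention per row.
    out = np.stack([attend_1d(out[h]) for h in range(H)], axis=0)
    return out
```

The payoff is complexity: full 2D self-attention over an H×W map costs O((HW)^2) pairwise scores, while the two axial passes cost O(HW(H+W)), which is what makes global attention tractable without the local-window constraint the snippet mentions.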

Light-weight Self-Attention Augmented Generative Adversarial Networks for Speech Enhancement

Lujun Li, Zhenxing Lu, Tobias Watzel, Ludwig Kürzinger, Gerhard Rigoll
2021 Electronics  
Our work is implemented in three phases: firstly, we apply the stand-alone self-attention layer in speech enhancement GANs.  ...  Secondly, we employ locality modeling on the stand-alone self-attention layer. Lastly, we investigate the functionality of the self-attention augmented convolutional speech enhancement GANs.  ...  stand-alone self-attention layer, modeling locality on the stand-alone self-attention layer, and coupling the self-attention layer with the (de)convolutional layer.  ... 
doi:10.3390/electronics10131586 fatcat:xvy4lh36cvhkrlkbg5ekt4atvi

AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via Visual Attention Condensers [article]

Alexander Wong, Mahmoud Famouri, Mohammad Javad Shafiee
2020 arXiv   pre-print
More specifically, AttendNets possess deep self-attention architectures based on visual attention condensers, which extend the recently introduced stand-alone attention condensers to improve spatial-channel selective attention. ... [28] introduce the concept of attention condensers as a stand-alone building block for deep neural networks geared around condensed self-attention. ...
arXiv:2009.14385v1 fatcat:ur7ix4qzmfbzxjft5gnk7ukrzy

Involution: Inverting the Inherence of Convolution for Visual Recognition [article]

Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen
2021 arXiv   pre-print
We additionally demystify the recently popular self-attention operator and subsume it into our involution family as an over-complicated instantiation. ... Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. ... Among these works, pure self-attention can be utilized to construct stand-alone models with promising performance. ...
arXiv:2103.06255v2 fatcat:u2pkdreijvcvbitqng3cptz25e

Contextual Transformer Networks for Visual Recognition [article]

Yehao Li and Ting Yao and Yingwei Pan and Tao Mei
2021 arXiv   pre-print
in numerous computer vision tasks. ... Nevertheless, most existing designs directly employ self-attention over a 2D feature map to obtain the attention matrix based on pairs of isolated queries and keys at each spatial location, but leave ... [40, 58] present a stand-alone design of the local self-attention module, which can completely replace the spatial convolutions in ResNet architectures. ...
arXiv:2107.12292v1 fatcat:ws7hou7o5rhtfkc7srs5s4t2gm

SimViT: Exploring a Simple Vision Transformer with sliding windows [article]

Gang Li, Di Xu, Xing Cheng, Lingyu Si, Changwen Zheng
2021 arXiv   pre-print
Specifically, we introduce Multi-head Central Self-Attention (MCSA) instead of conventional Multi-head Self-Attention to capture highly local relations. ... Although vision Transformers have achieved excellent performance as backbone models in many vision tasks, most of them intend to capture global relations of all tokens in an image or a window, which disrupts ... The removal of position encoding also brings translation invariance, which is important for the recognition ability. ... Shlens, "Stand-alone self-attention in vision models," NeurIPS, 2019. ...
arXiv:2112.13085v1 fatcat:4ettjdumhfg7hi6vooix2xkzxq

Evaluating Transformers for Lightweight Action Recognition [article]

Raivo Koot, Markus Hennerbichler, Haiping Lu
2021 arXiv   pre-print
Meanwhile, attention-only models need more motion modeling capabilities, and stand-alone attention block models currently incur too much latency overhead. ... In video action recognition, transformers consistently reach state-of-the-art accuracy. However, many models are too heavyweight for the average researcher with limited hardware resources. ... of stand-alone attention blocks. ...
arXiv:2111.09641v2 fatcat:77vh2hhypzcr3jz3bz4q47dnt4

Salient Object Detection Combining a Self-attention Module and a Feature Pyramid Network [article]

Guangyu Ren, Tianhong Dai, Panagiotis Barmpoutis, Tania Stathaki
2020 arXiv   pre-print
In PSAM, self-attention layers are placed after the multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model. ... To overcome this limitation, we propose a novel pyramid self-attention module (PSAM) together with an independent feature-complementing strategy. ... The stand-alone self-attention layer was introduced in [23]. ...
arXiv:2004.14552v1 fatcat:vckdbn5osrhlhcuiyi7aecjram

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks [article]

Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang
2020 arXiv   pre-print
Conventional approaches exploit the vision and language features in cross-modal grounding.  ...  In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the  ...  The language attention map for the baseline model and our final model. The x-axis stands for the position of words and the y-axis stands for the navigation time steps.  ... 
arXiv:1911.07883v4 fatcat:xsev6x5o4zhtpkbgxuhzvxfhvq

Salient Object Detection Combining a Self-Attention Module and a Feature Pyramid Network

Guangyu Ren, Tianhong Dai, Panagiotis Barmpoutis, Tania Stathaki
2020 Electronics  
In PSAM, self-attention layers are placed after the multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model. ... To overcome this limitation, we propose a novel pyramid self-attention module (PSAM) together with an independent feature-complementing strategy. ... The stand-alone self-attention layer was introduced in [24]. ...
doi:10.3390/electronics9101702 fatcat:oytul2r6rzhbrgl4dejsp6e5xi

Unified Vision-Language Pre-Training for Image Captioning and VQA

Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason Corso, Jianfeng Gao
The two tasks differ solely in what context the prediction conditions on. This is controlled by utilizing specific self-attention masks for the shared transformer network.  ...  This paper presents a unified Vision-Language Pre-training (VLP) model.  ...  Each objective specifies different binary values in the self-attention mask to control what context is available to the language model.  ... 
doi:10.1609/aaai.v34i07.7005 fatcat:ue42mso77ncgloo32csk4vk7nm

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [article]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
2021 arXiv   pre-print
In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. ... While the Transformer architecture has become the de facto standard for natural language processing tasks, its applications to computer vision remain limited. ... Stand-alone self-attention in vision models. In NeurIPS, 2019. Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. ...
arXiv:2010.11929v2 fatcat:myedumsklfcidim27uii6plwq4
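The ViT entry's title refers to splitting an image into 16×16 patches that are then treated as a sequence of tokens. A minimal NumPy sketch of just that patch-flattening step, assuming an (H, W, C) array with H and W divisible by the patch size (the learned linear projection and the Transformer encoder are omitted):

```python
import numpy as np

def patchify(img, p=16):
    # img: (H, W, C) image. Returns (num_patches, p*p*C) flattened patches,
    # ordered row-major over the patch grid.
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image dims must be divisible by patch size"
    # (H//p, p, W//p, p, C) -> (H//p, W//p, p, p, C): group pixels by patch.
    patches = img.reshape(H // p, p, W // p, p, C).swapaxes(1, 2)
    return patches.reshape(-1, p * p * C)
```

For a standard 224×224×3 input this yields 196 tokens of dimension 768, i.e. the "16x16 words" of the title; each token would then be linearly projected to the model width before entering the Transformer.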

Mathematical Modeling

Bernard P. Zeigler
1979 Computers and Mathematics with Applications  
As a text for an undergraduate course in mathematical modelling (for which it is intended), the book could not stand alone. ... Moreover, most of the authors concentrate on the analysis of models, i.e. obtaining solutions, with relatively little attention paid to discussion of the general issues in their specific guises. ...
doi:10.1016/0898-1221(79)90045-2 fatcat:6ybhfi57lvb7lol3t5mohryeka
Showing results 1 — 15 out of 129,768 results