On the Relationship between Self-Attention and Convolutional Layers
[article]
2020
arXiv
pre-print
Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. ...
This raises the question: do learned attention layers operate similarly to convolutional layers? ...
In particular, we study the relationship between self-attention and convolution with quadratic and learned relative positional encodings. ...
arXiv:1911.03584v2
fatcat:c7vv2cztunep7go6s5qasusu34
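The abstract excerpt above mentions a quadratic relative positional encoding. As a hedged reconstruction (my paraphrase of the paper's published formulation, not a quoted equation), each head h learns a center Δ^{(h)}, and its attention score decays quadratically with the relative shift δ = k − q between key and query pixel positions:

A^{(h)}_{\mathbf{q},\mathbf{k}} = -\alpha^{(h)} \lVert \boldsymbol{\delta} - \boldsymbol{\Delta}^{(h)} \rVert_2^2, \qquad \boldsymbol{\delta} := \mathbf{k} - \mathbf{q}

Under this encoding each head attends most strongly to the pixel at offset Δ^{(h)} from the query, which is how the paper connects a multi-head self-attention layer to a convolutional kernel whose footprint is the set of head centers.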
[Re] On the Relationship between Self-Attention and Convolutional Layers
2021
Zenodo
The main highlights of the paper "On the Relationship between Self-Attention and Convolutional Layers" [10] are: 1. ...
https://github.com/epfml/attention-cnn
https://github.com/juho-lee/set_transformer ...
doi:10.5281/zenodo.5217601
fatcat:xqcpf32xvrflrp572fcpu23bq4
Transformers with convolutional context for ASR
[article]
2020
arXiv
pre-print
The proposed model achieves a competitive 4.7% and 12.9% WER on the Librispeech "test clean" and "test other" subsets when no extra LM text is provided. ...
These contextual representations provide subsequent transformer blocks with relative positional information needed for discovering long-range relationships between local concepts. ...
A Transformer layer distinguishes itself from a regular recurrent network by entirely relying on a key-value "self"-attention mechanism for learning relationships between distant concepts, rather than ...
arXiv:1904.11660v2
fatcat:olbwelda3fcghdjpf2yf3chc4m
CaEGCN: Cross-Attention Fusion based Enhanced Graph Convolutional Network for Clustering
[article]
2021
arXiv
pre-print
the data in a layer-by-layer manner, and the self-supervised model that highlights the discriminative information for clustering tasks. ...
Finally, the self-supervised module constrains the distributions of the middle layer representations of CAE and GAE to be consistent. ...
[3] constructed a self-expression layer between the encoder and decoder of the auto-encoder. ...
arXiv:2101.06883v1
fatcat:4vwvnnp7hfcq5ewkavlfam2phe
Research of Self-Attention in Image Segmentation
2022
Journal of Information Technology Research
It also considers whether the self-attention module in this field can replace the convolution operation in the future. ...
It turns out that self-attention can really solve this long-range dependency problem. This paper is a summary of the application of self-attention to image segmentation in the past two years. ...
The author uses the following equation (1) to calculate the relationship between x_i and all other pixels x_j. The function f models the relationship between x_i and x_j. ...
doi:10.4018/jitr.298619
fatcat:zbocv25bybh3znh7qov3uthed4
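The excerpt above refers to an equation (1) relating a pixel x_i to all other pixels x_j through a pairwise function f, but the equation itself is not shown in the snippet. Assuming the survey follows the standard non-local (self-attention) formulation, a plausible reconstruction is:

y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j), \qquad f(x_i, x_j) = e^{\theta(x_i)^{\top} \phi(x_j)}

where g is a learned value embedding, θ and φ are learned query/key embeddings, and C(x) is a normalization factor; the exponential (embedded-Gaussian) choice of f shown here makes the normalized weights a softmax over j, but the exact instantiation used in the surveyed work is an assumption.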
Self-organizing incremental and graph convolution neural network for English implicit discourse relation recognition
2021
EAI Endorsed Transactions on Scalable Information Systems
A classification model based on self-organizing incremental and graph convolutional neural network is constructed to obtain the argument representation which is helpful for English implicit discourse relation ...
To solve this problem, this paper proposes a self-organizing incremental and graph convolution neural network for English implicit discourse relation recognition. ...
Methodology: the graph convolutional neural network (SIG) framework based on self-organizing increments and interactive attention proposed in this paper is shown in Figure 2. ...
doi:10.4108/eai.22-11-2021.172215
fatcat:dx4swsvbtnbw3bww3cxz6ohs7q
Feedback Attention for Cell Image Segmentation
[article]
2020
arXiv
pre-print
We propose Feedback Attention mechanisms that imitate the human brain and feed the feature maps of the output layer back to a layer close to the input. ...
Unlike conventional neural network models of feedforward processing, we focused on the feedback processing in the human brain and assumed that the network learns like a human by connecting feature maps from ...
Feedback Attention using Source-Target-Attention: we use Source-Target-Attention to aggregate the correlation between feature maps based on the relationship between input and output. ...
arXiv:2008.06474v1
fatcat:hepqyuplvvcwnjnthzk7335xai
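The last excerpt describes Source-Target-Attention that aggregates correlations between feature maps of the output layer and a layer close to the input. The following minimal sketch shows generic source-target (cross) attention between two feature maps; the 1x1 convolutions, the residual connection, and all names are illustrative assumptions rather than the authors' exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SourceTargetAttention2d(nn.Module):
    # Queries come from the "target" feature map (e.g., an early layer),
    # keys/values from the "source" feature map (e.g., the output layer).
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, kernel_size=1)
        self.k = nn.Conv2d(channels, channels, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, target, source):
        b, c, h, w = target.shape
        q = self.q(target).flatten(2).transpose(1, 2)    # (B, HW_t, C)
        k = self.k(source).flatten(2)                     # (B, C, HW_s)
        v = self.v(source).flatten(2).transpose(1, 2)     # (B, HW_s, C)
        attn = F.softmax(q @ k / c ** 0.5, dim=-1)        # pairwise correlations
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return target + out                               # feed the aggregated map back
```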
Symmetric Dilated Convolution for Surgical Gesture Recognition
[article]
2020
arXiv
pre-print
We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns and establish the frame-to-frame relationship accordingly ...
and the F1@50 score ~6 points. ...
Acknowledgement: the authors thank the Bournemouth University PhD scholarship and Hengdaoruyi Company, as well as the Rabin Ezra Scholarship Trust, for partly supporting this research. ...
arXiv:2007.06373v2
fatcat:v3iw3ol7j5c7hbpzry6gr6q7ay
TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices
[article]
2020
arXiv
pre-print
An attention condenser is a self-attention mechanism that learns and produces a condensed embedding characterizing joint local and cross-channel activation relationships, and performs selective attention ...
learning on the edge and empowering TinyML applications. ...
Unlike self-attention mechanisms designed for deep convolutional neural networks that depend heavily on existing convolution modules, attention condensers act as self-contained, stand-alone modules and ...
arXiv:2008.04245v6
fatcat:4iajmayck5fhzdjikb44yfnscu
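The TinySpeech excerpt describes attention condensers as stand-alone modules that learn a condensed embedding characterizing joint local and cross-channel activation relationships and then perform selective attention. The sketch below follows that general pattern (condense spatially, embed, expand back, gate the input); the specific layers, the gating rule, and the residual are assumptions for illustration, not the authors' specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCondenserSketch(nn.Module):
    # Condense -> embed -> expand -> selectively attend, as a stand-alone module
    # (layer sizes and the gating rule below are assumptions for illustration).
    def __init__(self, channels, reduction=4):
        super().__init__()
        hidden = max(1, channels // reduction)
        self.condense = nn.MaxPool2d(2)                    # spatial condensation
        self.embed = nn.Sequential(                        # joint local / cross-channel embedding
            nn.Conv2d(channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )
        self.scale = nn.Parameter(torch.ones(1))           # learned attention scale

    def forward(self, x):
        a = self.condense(x)
        a = self.embed(a)
        a = F.interpolate(a, size=x.shape[-2:], mode="nearest")  # expand back to input size
        a = torch.sigmoid(a)
        return x * a * self.scale + x                      # selective attention with residual
```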
Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition
[article]
2022
arXiv
pre-print
A spatio-temporal tuples self-attention module is then proposed to capture the relationships of different joints in consecutive frames. ...
However, the existing Transformer-based methods cannot capture the correlation of different joints between frames, a correlation that is very useful since different body parts (such as the arms and ...
The feature mapping layer is implemented by one convolution layer with BatchNorm and a Leaky ReLU function. ...
arXiv:2201.02849v1
fatcat:unkyay3efrbdbk3wrstet6w7ji
Partial Correlation-Based Attention for Multivariate Time Series Forecasting
2020
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
Moreover, I propose data-driven series-wise multi-resolution convolutional layers to represent the input time-series data for domain agnostic learning. ...
In this study, I suggest a partial correlation-based attention mechanism which overcomes the shortcomings of existing pairwise-comparison-based attention mechanisms. ...
The filters are applied to each univariate time series. From the convolution layers, the feature map can be obtained, and each channel of the map is fed into the self-attention layers. ...
doi:10.1609/aaai.v34i10.7132
fatcat:jsxrqtm5lrcqri2n7ivtg5iaza
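The abstract above contrasts partial-correlation-based attention with pairwise-comparison-based attention. As a hedged illustration of the underlying statistic only (not the paper's attention mechanism), partial correlations between series can be read off the inverse covariance (precision) matrix:

```python
import numpy as np

def partial_correlation(X):
    # X: (n_samples, n_series). Returns the matrix of partial correlations,
    # i.e., the correlation between series i and j controlling for all other series.
    precision = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr
```

The value pcorr[i, j] measures the relationship between series i and j after removing the influence of all the other series, which is the distinction from plain pairwise comparison that the abstract points to.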
Hierarchical Convolutional Attention Networks for Text Classification
2018
Proceedings of The Third Workshop on Representation Learning for NLP
our method Hierarchical Convolutional Attention Networks. ...
We propose combining this approach with the benefits of convolutional filters and a hierarchical structure to create a document classification model that is both highly accurate and fast to train - we name ...
With two self-attentions, the first self-attention captures the relationship between 'it' and 'doesn't' and the second self-attention captures the relationship between 'it' and 'chop'. ...
doi:10.18653/v1/w18-3002
dblp:conf/rep4nlp/GaoRT18
fatcat:wstpnybs25hatfqctf2inhd4ya
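The excerpt above describes two self-attention passes, each capturing a pairwise relationship between tokens (e.g., 'it' and 'doesn't', 'it' and 'chop'). Below is a minimal sketch of one scaled dot-product self-attention pass over token embeddings; it is generic, not the paper's exact layer, and the dimensions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings; w_*: (d_model, d_k) projection matrices.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5    # pairwise token-token relationships
    weights = F.softmax(scores, dim=-1)      # e.g., how much 'it' attends to 'doesn't'
    return weights @ v

# Toy usage: 5 tokens, 8-dim embeddings (sizes are illustrative).
x = torch.randn(5, 8)
w = [torch.randn(8, 8) for _ in range(3)]
out = self_attention(x, *w)
```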
AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via Visual Attention Condensers
[article]
2020
arXiv
pre-print
More specifically, AttendNets possess deep self-attention architectures based on visual attention condensers, which extend the recently introduced stand-alone attention condensers to improve spatial-channel ...
Experimental results on ImageNet_50 benchmark dataset for the task of on-device image recognition showed that AttendNets have significantly lower architectural and computational complexity when compared ...
Many of the introduced self-attention mechanisms for augmenting deep convolutional neural network architectures have focused on the decoupling of attention into channel-wise attention [4] and local attention ...
arXiv:2009.14385v1
fatcat:ur7ix4qzmfbzxjft5gnk7ukrzy
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
[article]
2019
arXiv
pre-print
MUSE builds on MUSE-simple and explores combining convolution and self-attention for learning sequence representations from more different scales. ...
Although self-attention can model extremely long dependencies, the attention in deep layers tends to overconcentrate on a single token, leading to insufficient use of local information and difficulty ...
It seems that learning global and local context through stacking self-attention and convolution layers does not beat either self-attention or convolution models. ...
arXiv:1911.09483v1
fatcat:c2j2gwosn5bgddyalgknub4m7y
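The MUSE excerpt above notes that merely stacking self-attention and convolution layers does not beat either model alone, motivating a parallel combination inside a single layer. The sketch below shows one way such a parallel block could look; the depthwise convolution, the branch summation, and all module names are assumptions for illustration, not MUSE's exact design.

```python
import torch
import torch.nn as nn

class ParallelAttentionConvBlock(nn.Module):
    # Self-attention (global context) and convolution (local context) applied
    # in parallel to the same input, then summed -- an illustrative sketch only.
    # d_model must be divisible by n_heads.
    def __init__(self, d_model, n_heads=4, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                                   # x: (batch, seq_len, d_model)
        h = self.norm(x)
        global_branch, _ = self.attn(h, h, h)
        local_branch = self.conv(h.transpose(1, 2)).transpose(1, 2)
        return x + global_branch + local_branch + self.ffn(h)
```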
On estimating gaze by self-attention augmented convolutions
[article]
2020
arXiv
pre-print
Therefore we propose here a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features during the training of a shallower residual network ...
The rationale is that self-attention mechanism can help outperform deeper architectures by learning dependencies between distant regions in full-face images. ...
Acknowledgments: the authors acknowledge the National Laboratory for Scientific Computing ...
arXiv:2008.11055v2
fatcat:gnymoi3r7fhzvjptnedbn2t3yu
Showing results 1 — 15 out of 30,878 results