442 Hits in 1.4 sec

Sparse Sinkhorn Attention [article]

Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
2020 arXiv   pre-print
We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations.  ...  Attention method is competitive with vanilla attention and consistently outperforms recently proposed efficient Transformer models such as Sparse Transformers.  ...  Sparse Sinkhorn Attention The key idea of the Sparse Sinkhorn Attention is to operate on block sorted sequences.  ... 
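The snippet above describes attention computed over block-sorted sequences. A minimal numpy sketch of block-local attention, with the paper's learned differentiable Sinkhorn sorting network replaced by a fixed block permutation for illustration (all names and shapes here are hypothetical, not taken from the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_sorted_attention(q, k, v, block_size, perm):
    """Attend within blocks after permuting the key/value blocks.

    q, k, v: (n, d) arrays; n must be divisible by block_size.
    perm: a permutation of the n // block_size blocks. In the paper this
    reordering is produced by a learned, differentiable Sinkhorn sorting
    network; here it is a fixed permutation for illustration.
    """
    n, d = q.shape
    nb = n // block_size
    # Reshape into blocks and reorder the key/value blocks.
    kb = k.reshape(nb, block_size, d)[perm]
    vb = v.reshape(nb, block_size, d)[perm]
    qb = q.reshape(nb, block_size, d)
    # Each query block attends only to its matched key block,
    # giving O(n * block_size) work instead of O(n^2).
    scores = qb @ kb.transpose(0, 2, 1) / np.sqrt(d)
    out = softmax(scores, axis=-1) @ vb
    return out.reshape(n, d)

rng = np.random.default_rng(0)
n, d, b = 8, 4, 2
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = block_sorted_attention(q, k, v, b, perm=np.array([3, 0, 1, 2]))
print(out.shape)  # (8, 4)
```

The sparsity comes entirely from restricting each query block to a single (reordered) key block; the learned sorting decides which block that is.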
arXiv:2002.11296v1 fatcat:rhsnvybbyjhrbgmnbofqszu6xi

VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation [article]

Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia
2022 arXiv   pre-print
In addition, the sparse Sinkhorn attention is empowered to reduce the memory cost, which is a major bottleneck for volumetric representations, while also achieving excellent performance.  ...  Figure 5 represents the attention matrix of the sparse Sinkhorn transformer.  ...  Sinkhorn Sparse Transformer on voxel grids.  ... 
arXiv:2205.12602v1 fatcat:2zzjkeviyzcn7i6blttmjcxki4

Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport [article]

Kyle Swanson, Lili Yu, Tao Lei
2020 arXiv   pre-print
Our model achieves very sparse rationale selections with high fidelity while preserving prediction accuracy compared to strong attention baseline models.  ...  Our model is end-to-end differentiable using the Sinkhorn algorithm for OT and can be trained without any alignment annotations.  ...  Sparse and constrained attention for neural machine translation.  ... 
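Several entries in this listing rely on the Sinkhorn algorithm for entropic optimal transport. A minimal numpy sketch of the standard Sinkhorn iterations (the marginals, cost matrix, and regularization value below are illustrative, not taken from any of the listed papers):

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.5, n_iters=500):
    """Entropic-regularized OT via Sinkhorn iterations.

    a, b: source/target marginals (each summing to 1).
    C: cost matrix of shape (len(a), len(b)).
    Returns a transport plan P whose row sums match a and whose
    column sums converge to b as the iterations proceed.
    """
    K = np.exp(-C / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)         # match column marginals
        u = a / (K @ v)           # match row marginals
    return u[:, None] * K * v[None, :]

a = np.ones(3) / 3
b = np.ones(4) / 4
C = np.abs(np.arange(3)[:, None] - np.arange(4)[None, :]).astype(float)
P = sinkhorn(a, b, C)
print(P.round(3))
```

Because every step is a ratio of smooth functions, the plan P is differentiable in C, which is what lets models like the one above train end-to-end through the alignment.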
arXiv:2005.13111v1 fatcat:px3x7mhxmvejrbtzqmefews5f4

Spartan: Differentiable Sparsity via Regularized Transportation [article]

Kai Sheng Tai, Taipeng Tian, Ser-Nam Lim
2022 arXiv   pre-print
On ImageNet-1K classification, Spartan yields 95% sparse ResNet-50 models and 90% block sparse ViT-B/16 models while incurring absolute top-1 accuracy losses of less than 1% compared to fully dense training  ...  We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity.  ...  This is an intuitively reasonable property since the self-attention layer computes inner products of the query and key embeddings in order to construct attention maps.  ... 
arXiv:2205.14107v1 fatcat:et7ejsljjzgdbf4f24vjid7agy

Efficient Attentions for Long Document Summarization [article]

Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji, Lu Wang
2021 arXiv   pre-print
We further conduct a systematic study of existing efficient self-attentions. Combined with Hepos, we are able to process ten times more tokens than existing models that use full attentions.  ...  In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source.  ...  HEPOS attention with a Sinkhorn encoder covers more salient information.  ... 
arXiv:2104.02112v2 fatcat:g5z6bpbqu5eutd3ujc5xm4mc3m

Efficient Transformers: A Survey [article]

Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
2022 arXiv   pre-print
Sinkhorn Transformers This section introduces the Sparse Sinkhorn Transformer (Tay et al., 2020b). The Sinkhorn Transformer belongs to the family of learned patterns.  ...  Figure 4: Illustration of patterns of the attention matrix for dense self-attention in Transformers and sparse fixed attention in Sparse Transformers.  ... 
arXiv:2009.06732v3 fatcat:rxchuq3adrg3vlgn672pwd6evu

On Deep Unsupervised Active Learning

Changsheng Li, Handong Ma, Zhao Kang, Ye Yuan, Xiao-Yu Zhang, Guoren Wang
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Unsupervised active learning has attracted increasing attention in recent years, where its goal is to select representative samples in an unsupervised setting for human annotating.  ...  The representative models include Robust Sparse Coding (RSC) [Yang et al., 2011] and Correntropy based Sparse Representation (CESR) [He et al., 2011] .  ...  More recently, Wasserstein distance, derived from the Optimal Transport (OT) theory, has drawn ample attention in many machine learning tasks.  ... 
doi:10.24963/ijcai.2020/360 dblp:conf/ijcai/LuoPH20 fatcat:ffz3ibek6nbi7hewsqzdxtzlvu

Polynomial-time algorithms for Multimarginal Optimal Transport problems with structure [article]

Jason M. Altschuler, Enric Boix-Adsera
2022 arXiv   pre-print
For structure (1), we recover the known result that Sinkhorn has poly(n,k) runtime; moreover, we provide the first poly(n,k) time algorithms for computing solutions that are exact and sparse.  ...  First, it enables us to show that the Sinkhorn algorithm, which is currently the most popular MOT algorithm, requires strictly more structure than other algorithms do to solve MOT in poly(n,k) time.  ...  In particular, this reduction does not work with SINKHORN because SINKHORN cannot compute sparse solutions. Corollary 7.11 (Efficient projection to the transportation polytope).  ... 
arXiv:2008.03006v4 fatcat:pmd3ghf5ozdw3k52mafykpf4ei

Efficient Transformers: A Survey

Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
2022 ACM Computing Surveys  
The first step is to project Q and K into a routing matrix R of dimensions n × d.  ...  3.2.2 Sinkhorn Transformers. This section introduces the Sparse Sinkhorn Transformer [74].  ...  The key idea is to reduce the dense attention matrix to a sparse version by only computing attention on a sparse number of q_i, k_j pairs.  ... 
doi:10.1145/3530811 fatcat:sil36wgiz5c23hnr5iwzu26pwi

Completely Self-Supervised Crowd Counting via Distribution Matching [article]

Deepak Babu Sam, Abhinav Agarwalla, Jimmy Joseph, Vishwanath A. Sindagi, R. Venkatesh Babu, Vishal M. Patel
2020 arXiv   pre-print
A density regressor is first pretrained with self-supervision and then the distribution of predictions is matched to the prior by optimizing Sinkhorn distance between the two.  ...  dense ones to sparse and vice versa.  ...  Hence, developing methods to leverage the easily available unlabeled data, has gained attention in recent times.  ... 
arXiv:2009.06420v1 fatcat:jkxpqkfbt5drfiaootgdqvmbim

P^3-Net: Part Mobility Parsing from Point Cloud Sequences via Learning Explicit Point Correspondence

Yahao Shi, Xinyu Cao, Feixiang Lu, Bin Zhou
2022 Proceedings of the AAAI Conference on Artificial Intelligence
To obtain this matrix, an attention module is proposed to calculate the point correspondence.  ...  Moreover, we implement a Gumbel-Sinkhorn module to reduce the many-to-one relationship for better point correspondence.  ...  In the Gumbel-Sinkhorn module, the hyperparameter, τ , controls the matching matrix's sparse degree, and the lower temperature can improve the sparse degree.  ... 
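The snippet above mentions the Gumbel-Sinkhorn module and its temperature τ. A minimal numpy sketch of Gumbel-Sinkhorn as commonly formulated (names and parameter values here are illustrative, not from the paper's implementation): lower τ sharpens the resulting doubly stochastic matrix toward a hard, sparser matching.

```python
import numpy as np

def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn_normalize(log_alpha, n_iters=100):
    """Alternate row/column normalization in log space; the result
    approaches a doubly stochastic matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1)
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0)
    return np.exp(log_alpha)

def gumbel_sinkhorn(scores, tau=0.5, n_iters=100, rng=None):
    """Soft permutation sampled from a score matrix.

    Lower tau -> the output concentrates on a hard permutation
    (a sparser matching), as the snippet above notes.
    """
    rng = rng or np.random.default_rng()
    u = rng.uniform(1e-9, 1.0, size=scores.shape)
    g = -np.log(-np.log(u))               # Gumbel noise
    return sinkhorn_normalize((scores + g) / tau, n_iters)

M = gumbel_sinkhorn(np.eye(4) * 5.0, tau=0.1,
                    rng=np.random.default_rng(0))
print(M.round(2))
```

With a strongly diagonal score matrix and a low temperature, M is close to the identity permutation while remaining differentiable in the scores.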
doi:10.1609/aaai.v36i2.20122 fatcat:idf76prh5zcrtix56f64admixu

Learning to Match Features with Seeded Graph Matching Network [article]

Hongkai Chen, Zixin Luo, Jiahui Zhang, Lei Zhou, Xuyang Bai, Zeyu Hu, Chiew-Lan Tai, Long Quan
2021 arXiv   pre-print
Targeting towards high accuracy and efficiency, we propose Seeded Graph Matching Network, a graph neural network with sparse structure to reduce redundant connectivity and learn compact representation.  ...  seed features and exchanges messages across images. 3) Attentional Unpooling, which propagates seed features back to original keypoints.  ...  Sparse Sinkhorn attention. In ICML, 2020.  ... 
arXiv:2108.08771v1 fatcat:b6fgvvs2yrccdjdvaseq4wiidq

Mapping in a cycle: Sinkhorn regularized unsupervised learning for point cloud shapes [article]

Lei Yang, Wenxi Liu, Zhiming Cui, Nenglun Chen, Wenping Wang
2020 arXiv   pre-print
In order to learn discriminative pointwise features from point cloud data, we incorporate in the formulation a regularization term based on Sinkhorn normalization to enhance the learned pointwise mappings  ...  We further propose a novel Sinkhorn regularization in the cycle consistency framework to enforce the learned pointwise features to be sparsely corresponded for different instances.  ...  Keypoints are a group of sparsely defined landmarks on a shape, crucial for many shape analysis applications.  ... 
arXiv:2007.09594v1 fatcat:5apqecvvy5fjdch3w3qitynori

ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching [article]

Yan Shi, Jun-Xiong Cai, Yoli Shavit, Tai-Jiang Mu, Wensen Feng, Kai Zhang
2022 arXiv   pre-print
Motivated by a prior observation that self- and cross- attention matrices converge to a sparse representation, we propose ClusterGNN, an attentional GNN architecture which operates on clusters for learning  ...  Graph Neural Networks (GNNs) with attention have been successfully applied for learning visual feature matching.  ...  Sparse attention in cluster-based feature matching.  ... 
arXiv:2204.11700v1 fatcat:5zi2py6dmbcyfeksimqdsd6h7q

DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization [article]

Ming Zhong, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
2022 arXiv   pre-print
Furthermore, to process longer input, we augment the model with sparse attention which is combined with conventional attention in a hybrid manner.  ...  We introduce a hybrid attention approach in Transformer architecture: most layers are equipped with a sparse attention method (Sinkhorn attention) and the rest retain global self-attention.  ...  When dealing with long sequences, encoder selfattention accounts for the largest computational overhead, so we improve it with the recently proposed sparse Sinkhorn attention (Tay et al. 2020; Huang et  ... 
arXiv:2109.02492v2 fatcat:myapn6xzy5e2vju3jjncganfny
Showing results 1 — 15 out of 442 results