19,997 Hits in 5.1 sec

Information Bottleneck Approach to Spatial Attention Learning [article]

Qiuxia Lai and Yu Li and Ailing Zeng and Minhao Liu and Hanqiu Sun and Qiang Xu
2021 arXiv   pre-print
To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training.  ...  This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy.  ...  by the 'Information Bottleneck (IB)' theory.  ... 
arXiv:2108.03418v1 fatcat:vcj4vhui6bb4papwcsrqan3kgu
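The snippet above mentions quantizing continuous attention scores to a set of learnable anchor values. A minimal numpy sketch of the snapping step (the nearest-anchor rule and all names here are illustrative assumptions, not the authors' exact formulation, which additionally trains the anchors end-to-end):

```python
import numpy as np

def quantize_to_anchors(attn, anchors):
    """Snap each continuous attention score to its nearest anchor value.

    attn    : array of attention scores in [0, 1]
    anchors : 1-D array of (learnable) anchor values
    """
    attn = np.asarray(attn, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # For each score, find the index of the closest anchor ...
    idx = np.argmin(np.abs(attn[..., None] - anchors), axis=-1)
    # ... and replace the score with that anchor's value.
    return anchors[idx]

# A 2x2 attention map quantized against three anchors.
attn_map = np.array([[0.05, 0.4], [0.55, 0.95]])
print(quantize_to_anchors(attn_map, anchors=[0.0, 0.5, 1.0]))
# [[0.  0.5]
#  [0.5 1. ]]
```

In training, such a hard assignment would typically be paired with a straight-through gradient estimator so the anchors remain learnable.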

Information Bottleneck Approach to Spatial Attention Learning

Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu
2021 Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence   unpublished
To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training.  ...  This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy.  ...  by the 'Information Bottleneck (IB)' theory.  ... 
doi:10.24963/ijcai.2021/108 fatcat:5tt4zmsodrb7pnx5yv6edhfeiq

Attentional Bottleneck: Towards an Interpretable Deep Driving Network [article]

Jinkyu Kim, Mayank Bansal
2020 arXiv   pre-print
Our key idea is to combine visual attention, which identifies what aspects of the input the model is using, with an information bottleneck that enables the model to only use aspects of the input which  ...  In fact, we find slight improvements in accuracy when applying Attentional Bottleneck to the ChauffeurNet model, whereas we find that the accuracy deteriorates with a traditional visual attention model  ...  To generate sparser and more interpretable attention maps, we propose an architecture called Attentional Bottleneck (Fig. 1) that combines visual attention with the information bottleneck approach [  ...
arXiv:2005.04298v1 fatcat:y5r2wzlatbgffba5bcowt2q33i

Multi-Person Pose Estimation with Enhanced Channel-wise and Spatial Information [article]

Kai Su, Dongdong Yu, Zhenqi Xu, Xin Geng, Changhu Wang
2019 arXiv   pre-print
Second, a Spatial, Channel-wise Attention Residual Bottleneck (SCARB) is designed to boost the original residual unit with an attention mechanism, adaptively highlighting the information of the feature maps  ...  Although current approaches have achieved significant progress by fusing the multi-scale feature maps, they pay little attention to enhancing the channel-wise and spatial information of the feature maps  ...  As shown in Fig. 4, our Attention Residual Bottleneck learns the spatial attention weights β and the channel-wise attention weights α, respectively.  ...
arXiv:1905.03466v1 fatcat:mskqtosvyrh7boggztn5t6n65e

Multi-Person Pose Estimation With Enhanced Channel-Wise and Spatial Information

Kai Su, Dongdong Yu, Zhenqi Xu, Xin Geng, Changhu Wang
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Second, a Spatial, Channel-wise Attention Residual Bottleneck (SCARB) is designed to boost the original residual unit with an attention mechanism, adaptively highlighting the information of the feature maps  ...  Although current approaches have achieved significant progress by fusing the multi-scale feature maps, they pay little attention to enhancing the channel-wise and spatial information of the feature maps  ...  As shown in Fig. 4, our Attention Residual Bottleneck learns the spatial attention weights β and the channel-wise attention weights α, respectively.  ...
doi:10.1109/cvpr.2019.00582 dblp:conf/cvpr/SuYXGW19 fatcat:hvzbfl3wxzgthex6jig4hufvyy

BAM: Bottleneck Attention Module [article]

Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon
2018 arXiv   pre-print
Our module constructs a hierarchical attention at bottlenecks with a number of parameters and it is trainable in an end-to-end manner jointly with any feed-forward models.  ...  Our module infers an attention map along two separate pathways, channel and spatial. We place our module at each bottleneck of models where the downsampling of feature maps occurs.  ...  Conclusion We have presented the bottleneck attention module (BAM), a new approach to enhancing the representation power of a network.  ... 
arXiv:1807.06514v2 fatcat:klndxyxifvbo3dnd2j7ubyphfq
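The BAM snippet describes inferring an attention map along two separate pathways, channel and spatial, and applying it at network bottlenecks. A parameter-free numpy sketch of that data flow (the real module uses small learned sub-networks for each branch; plain pooling stands in for them here, so this is an assumed simplification, not the published architecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bam_attention(feat):
    """BAM-style refinement: combine a channel and a spatial attention map.

    feat : feature map of shape (C, H, W)
    """
    c, h, w = feat.shape
    # Channel branch: one logit per channel via global average pooling.
    channel_logits = feat.mean(axis=(1, 2)).reshape(c, 1, 1)
    # Spatial branch: one logit per location via a cross-channel mean.
    spatial_logits = feat.mean(axis=0, keepdims=True)
    # Combine branches and squash to (0, 1): M(F) = sigmoid(Mc(F) + Ms(F)).
    attn = sigmoid(channel_logits + spatial_logits)  # broadcasts to (C, H, W)
    # Residual refinement: F' = F + F * M(F).
    return feat + feat * attn

refined = bam_attention(np.random.randn(8, 4, 4))
assert refined.shape == (8, 4, 4)
```

The additive combination of the two branches before the sigmoid, and the residual form `F + F * M(F)`, follow the paper's stated design; everything else is placeholder.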

MPRNet: Multi-Path Residual Network for Lightweight Image Super Resolution [article]

Armin Mehri, Parichehr B.Ardakani, Angel D.Sappa
2020 arXiv   pre-print
Multi-Path Residual Network designs with a set of Residual concatenation Blocks stacked with Adaptive Residual Blocks: (i) to adaptively extract informative features and learn more expressive spatial  ...  The proposed architecture also contains a new attention mechanism, Two-Fold Attention Module, to maximize the representation ability of the model.  ...  Bottleneck Path: We design our Bottleneck path (BN) based on the following insights: i) extract richer spatial information, since spatial information is of key importance in SR tasks; ii) prevent very wide  ...
arXiv:2011.04566v1 fatcat:a77ljhho45dt5d4ahkb7hmh63e

Automatic salt deposits segmentation: A deep learning approach [article]

Mikhail Karchevskiy, Insaf Ashrapov, Leonid Kozinkin
2018 arXiv   pre-print
This problem is very important even nowadays due to its non-linear nature.  ...  Using a U-Net with ResNeXt-50 encoder pre-trained on ImageNet as our base architecture, we implemented Spatial-Channel Squeeze & Excitation, Lovasz loss, CoordConv and Hypercolumn methods.  ...  ACKNOWLEDGMENT The authors would like to thank the Open Data Science community [21] for many valuable discussions and educational help in the growing field of machine/deep learning.  ...
arXiv:1812.01429v1 fatcat:i6n2h3vwvrapjd6bnlqg2mzrga

Multi-axis Attentive Prediction for Sparse EventData: An Application to Crime Prediction [article]

Yi Sui, Ga Wu, Scott Sanner
2021 arXiv   pre-print
We propose a purely attentional approach to extract both short-term dynamics and long-term semantics of event propagation through two observation angles.  ...  To overcome these sparsity issues, we present Multi-axis Attentive Prediction for Sparse Event Data (MAPSED).  ...  Besides, [22] proposes to use LSTM to encode the temporal correlations and then use an attention mechanism for spatial information fusion.  ... 
arXiv:2110.01794v1 fatcat:j7xdt3w2lvdlnideciexc3ti2e

DDaNet: Dual-Path Depth-Aware Attention Network for Fingerspelling Recognition Using RGB-D Images

Shih-Hung Yang, Wei-Ren Chen, Wun-Jhu Huang, Yon-Ping Chen
2020 IEEE Access  
The attention module leverages the inter-spatial relations in the depth feature map, thus learning the regions to emphasize in the spatial domain and refining the RGB and depth feature maps to focus on  ...  This approach guides the dual-path network to learn gesture features from a large number of RGB-D images while suppressing the effect of color-depth misalignment.  ...
doi:10.1109/access.2020.3046667 fatcat:3uby5tmufjddlokmtb6uy37u6q

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [article]

Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova
2021 arXiv   pre-print
Instead of relying on hand-designed splitting strategies to obtain visual tokens and processing a large number of densely sampled patches for attention, our approach learns to mine important tokens in  ...  This results in efficiently and effectively finding a few important visual tokens and enables modeling of pairwise attention between such tokens, over a longer temporal horizon for videos, or the spatial  ...  Acknowledgement We thank Dmitry Kalashnikov, Andy Zeng, and Robotics at Google NYC team members for valuable discussions on attention mechanisms.  ... 
arXiv:2106.11297v3 fatcat:7qyqihjjljfsvizrtrbrgwmtbm
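The TokenLearner snippet describes mining a few important visual tokens from densely sampled features instead of attending over every patch. A numpy sketch of the core reduction (the real module generates its attention maps with small conv/MLP blocks; a single linear projection stands in for them here, as an assumed simplification):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_learner(feat, w):
    """TokenLearner-style token mining (illustrative only).

    feat : (H*W, C) flattened spatial features
    w    : (C, S)   projection producing one spatial attention map per token
    """
    # One normalized spatial map per token: shape (H*W, S).
    attn = softmax(feat @ w, axis=0)
    # Each token is a spatially weighted average of the features: (S, C).
    return attn.T @ feat

rng = np.random.default_rng(0)
feat = rng.standard_normal((16 * 16, 32))       # 16x16 grid, 32 channels
tokens = token_learner(feat, rng.standard_normal((32, 8)))
assert tokens.shape == (8, 32)                  # 8 tokens instead of 256 positions
```

Downstream pairwise attention then scales with the handful of learned tokens rather than with the full spatial grid, which is the efficiency argument in the abstract.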

Transformer-based Image Compression [article]

Ming Lu, Peiyao Guo, Huiqing Shi, Chuntong Cao, Zhan Ma
2021 arXiv   pre-print
Each NTU consists of a Swin Transformer Block (STB) and a convolutional layer (Conv) to best embed both long-range and short-range information; in the meantime, a causal attention module (CAM) is devised  ...  Both main and hyper encoders are composed of a sequence of neural transformation units (NTUs) to analyse and aggregate important information for a more compact representation of the input image, while the decoders  ...  Besides the encouraging coding efficiency, the proposed method requires far fewer model parameters than existing learning-based approaches, making the solution attractive for practical applications.  ...
arXiv:2111.06707v1 fatcat:iikuywfbazh7ze5u2tkoyvxfoq

Real-time Semantic Segmentation with Context Aggregation Network [article]

Michael Ying Yang, Saumya Kumaar, Ye Lyu, Francesco Nex
2021 arXiv   pre-print
Building upon the existing dual-branch architectures for high-speed semantic segmentation, we design a cheap high-resolution branch for effective spatial detailing and a context branch with light-weight  ...  On the UAVid dataset, our proposed network achieves an mIoU score of 63.5% at high execution speed (15 FPS).  ...  The bottleneck in the context branch allows for deep supervision of the representational learning of the attention blocks [40].  ...
arXiv:2011.00993v2 fatcat:qftm2mwbubhi3bgh5si34f5vza

DVMN: Dense Validity Mask Network for Depth Completion [article]

Laurenz Reichardt, Patrick Mangat, Oliver Wasenmüller
2021 arXiv   pre-print
To this end, we introduce a novel layer with spatially variant and content-dependent dilation to include additional data from the sparse input.  ...  Furthermore, we propose a sparsity-invariant residual bottleneck block.  ...  We would also like to thank Dennis Teutscher for his support during the project.  ...
arXiv:2107.06709v1 fatcat:sykaqvo7sngdblfkgavqfdeedi

DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology Image Analysis [article]

Tiange Xiang, Yang Song, Chaoyi Zhang, Dongnan Liu, Mei Chen, Fan Zhang, Heng Huang, Lauren O'Donnell, Weidong Cai
2021 arXiv   pre-print
We auto-encode the visual signals in each patch into a latent embedding vector representing local information, and down-sample the raw WSI to hardware-acceptable thumbnails representing regional information  ...  To address this issue, we posit that WSI analysis can be effectively conducted by integrating information at both high magnification (local) and low magnification (regional) levels.  ...  To better discriminate local features in V, we employ a max pooling layer to create spatial bottlenecks.  ... 
arXiv:2109.05788v1 fatcat:yayb23ba2beqjkh72aeptuwayq
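The DSNet snippet mentions using a max pooling layer to create spatial bottlenecks over the local feature volume. A self-contained numpy sketch of non-overlapping 2-D max pooling, the operation presumably meant (kernel size and layout are assumptions):

```python
import numpy as np

def max_pool2d(x, k):
    """Non-overlapping k x k max pooling over an (H, W) map."""
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "map must tile evenly into k x k blocks"
    # Split into k x k blocks, then take the max within each block.
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
print(max_pool2d(x, 2))
# [[ 5.  7.]
#  [13. 15.]]
```

Applied per channel, this halves each spatial dimension (for k=2), forcing nearby local embeddings to compete and keeping only the most salient responses.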
Showing results 1 — 15 of 19,997