
Visformer: The Vision-friendly Transformer [article]

Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, Qi Tian
2021 arXiv   pre-print
Based on these observations, we propose a new architecture named Visformer, which is abbreviated from the 'Vision-friendly Transformer'.  ...  The past year has witnessed the rapid development of applying the Transformer module to vision problems.  ...  Integrating the observations above, we propose the Visformer as a vision-friendly, Transformer-based model. The detailed architectures are shown in Table 5.  ...
arXiv:2104.12533v5 fatcat:tba7ik7fqzhvnou5mkvrkshg4e

Make A Long Image Short: Adaptive Token Length for Vision Transformers [article]

Yichen Zhu, Yuqin Zhu, Jie Du, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang
2021 arXiv   pre-print
The vision transformer splits each image into a sequence of tokens of fixed length and processes the tokens in the same way as words in natural language processing.  ...  Our approach is general and compatible with modern vision transformer architectures and can significantly reduce computational expense.  ...  Visformer: The vision-friendly transformer. arXiv preprint arXiv:2104.12533, 2021.  ...
arXiv:2112.01686v2 fatcat:fzenydaarjg3jffxjwv324qsa4
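The fixed-length patch tokenization described in the snippet above can be sketched as follows (a minimal NumPy illustration of the general vision-transformer input pipeline, not code from either paper; the patch size of 16 and the divisibility assumption are simplifications):

```python
import numpy as np

def image_to_tokens(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a fixed-length sequence of flattened
    patch tokens, the way vision transformers tokenize their input.
    Assumes H and W are divisible by `patch`."""
    H, W, C = image.shape
    # Group rows and columns into non-overlapping patch-sized blocks.
    blocks = image.reshape(H // patch, patch, W // patch, patch, C)
    # Reorder so each patch's pixels are contiguous: (nH, nW, patch, patch, C).
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one token vector: (num_tokens, patch*patch*C).
    return blocks.reshape(-1, patch * patch * C)

tokens = image_to_tokens(np.zeros((224, 224, 3)), patch=16)
# A 224x224 image with 16x16 patches yields 14*14 = 196 tokens of dim 768.
```

Adaptive-token-length methods such as the one above vary the number of tokens per image rather than fixing it, but the per-patch flattening step is the same.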

RegionViT: Regional-to-Local Attention for Vision Transformers [article]

Chun-Fu Chen, Rameswar Panda, Quanfu Fan
2021 arXiv   pre-print
Motivated by this, in this paper, we propose a new architecture that adopts the pyramid structure and employs a novel regional-to-local attention rather than global self-attention in vision transformers  ...  Vision transformer (ViT) has recently shown its strong capability in achieving comparable results to convolutional neural networks (CNNs) on image classification.  ...  To address the aforementioned computational limitations of vision transformers, in this work, we develop a memory-friendly and efficient self-attention method for transformer models to reach their promising  ...
arXiv:2106.02689v2 fatcat:prc6dke6lvgaxjhtaiveqem7nu

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding [article]

Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister
2021 arXiv   pre-print
This observation leads us to design a simplified architecture that requires minor code changes upon the original vision transformer.  ...  Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well.  ...  Visformer: The Vision-friendly Transformer. arXiv preprint arXiv:2104.12533, 2021.  ...
arXiv:2105.12723v4 fatcat:mqji5xe6irgl7c473fnqchz2py

Beyond Low Earth Orbit: Biological Research, Artificial Intelligence, and Self-Driving Labs [article]

Lauren M. Sanders
2021 arXiv   pre-print
In the next decade, the synthesis of artificial intelligence into the field of space biology will deepen the biological understanding of spaceflight effects, facilitate predictive modeling and analytics  ...  To advance these aims, the field leverages experiments, platforms, data, and model organisms from both spaceborne and ground-analog studies.  ...  Visformer: The Vision-friendly Transformer. arXiv [cs.CV] (2021). 64. Castro-Wallace, S. L. et al. Nanopore DNA Sequencing and Genome Assembly on the International Space Station. Sci.  ... 
arXiv:2112.12582v1 fatcat:qelzg32unnhd3j6ku7gdmrcbem