3,583 Hits in 6.0 sec

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [article]

Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H.S. Torr, Li Zhang
2021 arXiv   pre-print
In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task.  ...  With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).  ...  In this work, we alternatively rethink the semantic segmentation task from a different perspective.  ... 
arXiv:2012.15840v3 fatcat:gymdaxywofaftaxt3zkyuxwbqe
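The sequence-to-sequence view begins by reading an image as a sequence of patch tokens. The sketch below is a minimal, hypothetical illustration of that step only. The function name `image_to_patch_sequence` is ours and does not come from SETR. It assumes a single-channel H×W image stored as nested lists and a patch size P that divides both sides, which yields (H/P)·(W/P) tokens. SETR's learned linear projection, position embeddings, and transformer layers are omitted.

```python
from typing import List


def image_to_patch_sequence(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major into one token, and tokens are returned
    in raster order (left to right, top to bottom).
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("patch size must divide both image dimensions")
    tokens = []
    for top in range(0, h, patch):        # patch rows, top to bottom
        for left in range(0, w, patch):   # patch columns, left to right
            token = [image[top + i][left + j]
                     for i in range(patch) for j in range(patch)]
            tokens.append(token)
    return tokens


img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4 x 4 toy image
seq = image_to_patch_sequence(img, 2)
print(len(seq), seq[0])  # 4 [0, 1, 4, 5]
```

The sequence length shrinks quadratically with the patch size. This is why a transformer encoder can afford global attention over every token in every layer.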

UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer [article]

Haonan Wang, Peng Cao, Jiaqi Wang, Osmar R. Zaiane
2022 arXiv   pre-print
Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), from the channel perspective with an attention mechanism.  ...  Most recent semantic segmentation methods adopt a U-Net framework with an encoder-decoder architecture.  ...  More specifically, we first propose a Channel-wise Cross Fusion Transformer (CCT) to fuse the multi-scale context with cross attention from the channel-wise perspective.  ... 
arXiv:2109.04335v3 fatcat:ssn3xixl7bffrkexqaloleiig4
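Channel-wise cross attention treats channels, rather than pixels, as tokens. As a result, the attention map has size C_q × C_k regardless of image resolution. The sketch below is a simplified single-head version built on that idea. It uses no learned projections and no normalization, and the keys double as values. It is not the CCT module from UCTransNet, and `channel_cross_attention` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per channel, one column per spatial position


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_cross_attention(query: Matrix, context: Matrix) -> Matrix:
    """Cross attention whose tokens are channels.

    query:   C_q x N (channels of one scale, flattened over N positions)
    context: C_k x N (assumed: channels of all scales, aligned to N positions)
    Each output row is a softmax-weighted mix of context rows. The weights come
    from scaled dot products between channel responses, so the attention map
    is C_q x C_k.
    """
    n = len(query[0])
    scale = 1.0 / math.sqrt(n)
    out = []
    for q in query:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        out.append([sum(w * k[p] for w, k in zip(weights, context)) for p in range(n)])
    return out


q = [[1.0, 0.0]]
ctx = [[1.0, 0.0], [0.0, 1.0]]
print(channel_cross_attention(q, ctx))  # first context channel gets the larger weight
```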

Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation [article]

Chen Liang, Yu Wu, Tianfei Zhou, Wenguan Wang, Zongxin Yang, Yunchao Wei, Yi Yang
2021 arXiv   pre-print
Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference.  ...  First, an exhaustive set of object tracklets is constructed by propagating object masks detected from several sampled frames to the entire video.  ...  In this work, we rethink RVOS from a top-down perspective ( Fig. 1-(b) ), by comprehensively exploring cross-object relations and conducting object-level cross-modal grounding.  ... 
arXiv:2106.01061v1 fatcat:6jdazlbzsrbn7mzv4cp76pzlme

StructToken : Rethinking Semantic Segmentation with Structural Prior [article]

Fangjian Lin, Zhanhao Liang, Junjun He, Miao Zheng, Shengwei Tian, Kai Chen
2022 arXiv   pre-print
From a perspective on semantic segmentation as per-pixel classification, the previous deep learning-based methods learn the per-pixel representation first through an encoder and a decoder head and then classify each pixel representation into a specific category to obtain the semantic masks.  ...  Jumping out of the existing semantic segmentation frameworks, we rethink semantic segmentation tasks from a more anthropomorphic viewpoint.  ... 
arXiv:2203.12612v3 fatcat:kof2zssz6bfdfj2jjit6lm434i
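The per-pixel classification baseline described above ends with an argmax over class scores at every pixel. The following is a minimal sketch of that final step. It assumes the decoder output is given as nested lists `logits[k][y][x]`, a hypothetical layout chosen for illustration. `per_pixel_classify` is likewise a hypothetical name.

```python
from typing import List


def per_pixel_classify(logits: List[List[List[float]]]) -> List[List[int]]:
    """logits[k][y][x] is the score of class k at pixel (y, x).

    Returns the H x W mask holding the highest-scoring class index per pixel.
    """
    num_classes = len(logits)
    h, w = len(logits[0]), len(logits[0][0])
    return [[max(range(num_classes), key=lambda k: logits[k][y][x]) for x in range(w)]
            for y in range(h)]


scores = [[[0.1, 2.0]],   # class 0 scores for a 1 x 2 image
          [[1.5, 0.3]]]   # class 1 scores
print(per_pixel_classify(scores))  # [[1, 0]]
```

Each pixel is decided independently here, with no structural prior linking neighboring predictions.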

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation [article]

Siyuan Qiao, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
2020 arXiv   pre-print
image sequences while providing each point with instance-level semantic interpretations.  ...  In this paper, we present ViP-DeepLab, a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring the point clouds from perspective  ...  Acknowledgments We would like to thank Maxwell Collins for the feedback and the Google Mobile Vision team for the support.  ... 
arXiv:2012.05258v1 fatcat:ybeeihjzsvdmdetwkntz4kvxsi

Visual Saliency Transformer [article]

Nian Liu and Ni Zhang and Kaiyuan Wan and Ling Shao and Junwei Han
2021 arXiv   pre-print
Alternatively, we rethink this task from a convolution-free sequence-to-sequence perspective and predict saliency by modeling long-range dependencies, which cannot be achieved by convolution.  ...  Most importantly, our whole framework not only provides a new perspective for the SOD field but also shows a new paradigm for transformer-based dense prediction models.  ...  Different from previous CNN-based methods, we are the first to rethink SOD from a sequence-to-sequence perspective and propose a unified model based on pure transformer for both RGB and RGB-D SOD.  ... 
arXiv:2104.12099v2 fatcat:7ldvkm4lpvca5f3bl7doe2eogi

A Large-Scale Benchmark for Food Image Segmentation [article]

Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven C.H. Hoi, Qianru Sun
2021 arXiv   pre-print
In addition, we propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge.  ...  Existing food image segmentation models are underperforming for two reasons: (1) there is a lack of high-quality food image datasets with fine-grained ingredient labels and pixel-wise location masks  ...  Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. arXiv preprint arXiv:2012.15840 (2020). Image Collection.  ... 
arXiv:2105.05409v1 fatcat:buqw4czjjrc53cghcvkqwa5c2m

TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images [article]

Jiangyun Li, Wenxuan Wang, Chen Chen, Tianxiang Zhang, Sen Zha, Jing Wang, Hong Yu
2022 arXiv   pre-print
With the proposed insight to redesign the internal structure of Transformer block and the introduced Deformable Bottleneck Module to capture shape-aware local details, a highly efficient architecture is  ...  Different from TransBTS, the proposed TransBTSV2 is not limited to brain tumor segmentation (BTS) but focuses on general medical image segmentation, providing a stronger and more efficient 3D baseline  ...  Rethinking semantic segmentation from a sequence-to-sequence perspective, SETR [17] leverages Transformer as the encoder for global feature extraction and achieves superior performance with large-scale  ... 
arXiv:2201.12785v2 fatcat:ifgcgjncfvhs7o4jbqrghzhkc4

Technology of Integrated Media Education

2018 Media Education (Mediaobrazovanie)  
From this perspective, the technology of integrated media education is interpreted as a tool to stimulate reflective-analytical experience and, at the same time, to develop social and professional competences  ...  The efficiency of social and professional functions of a modern person is mediated not only by his/her knowledge, skills, and value orientations, but also by one's ability to capture, identify, and actualize  ...  often representing a relatively complete sequence, extracted from a larger media text.  ... 
doi:10.13187/me.2018.4.3 fatcat:w22cftjldfa3pevanhxjo3em3e

Semantic Coded Transmission: Architecture, Methodology, and Challenges [article]

Jincheng Dai, Ping Zhang, Kai Niu, Sixian Wang, Zhongwei Si, Xiaoqi Qin
2021 arXiv   pre-print
The recent concept of "semantic-driven" offers a promising research direction.  ...  In the future, communications toward intelligence and conciseness will predictably play a dominant role, and the proliferation of connected intelligent agents requires a radical rethinking of current coded  ...  Different from classical linear transforms, such as the Karhunen-Loève transform (KLT), which map the source vector into a latent space via a decorrelating invertible transform, the semantic analysis transform  ... 
arXiv:2112.03093v2 fatcat:er2uvmdtqrd27lwj66uyiv6iha
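As a point of reference for the classical linear transforms mentioned above, the KLT decorrelates a source by rotating it onto the eigenvectors of its covariance. The sketch below is restricted to 2-D samples so that the eigenvectors have a closed form. The rotation angle θ = ½·atan2(2·cov(x, y), var(x) − var(y)) sets the off-diagonal covariance to zero. `klt_2d` is a hypothetical helper and does not come from the paper.

```python
import math
from typing import List, Tuple


def klt_2d(samples: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Karhunen-Loeve transform of 2-D samples.

    Centre the samples, then rotate them onto the eigenvectors of their
    covariance so the two output components are uncorrelated. The rotation is
    orthogonal, hence invertible, and the first component carries the larger
    variance.
    """
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    centred = [(x - mx, y - my) for x, y in samples]
    a = sum(x * x for x, _ in centred) / n   # var(x)
    d = sum(y * y for _, y in centred) / n   # var(y)
    b = sum(x * y for x, y in centred) / n   # cov(x, y)
    theta = 0.5 * math.atan2(2 * b, a - d)   # angle that zeroes the off-diagonal term
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in centred]


out = klt_2d([(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)])
print(sum(u * v for u, v in out) / len(out))  # off-diagonal covariance, ~0 up to rounding
```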

CPTR: Full Transformer Network for Image Captioning [article]

Wei Liu, Sihan Chen, Longteng Guo, Xinxin Zhu, Jing Liu
2021 arXiv   pre-print
In this paper, we consider the image captioning task from a new sequence-to-sequence prediction perspective and propose CaPtion TransformeR (CPTR), which takes the sequentialized raw images as the input to Transformer.  ...  Inspired by the above works, we consider solving the image captioning task from a new sequence-to-sequence perspective and propose CPTR, a full Transformer network to replace the CNN in the encoder part  ... 
arXiv:2101.10804v3 fatcat:e3jbdxop7zdkxliuvikyu2ltoq

A Novel Data Analytics Oriented Approach for Image Representation Learning in Manufacturing Systems

Yue Liu, Junqi Ma, Xingzhen Tao, Jingyun Liao, Tao Wang, Jingjing Chen, Haidong Shao
2022 Journal of Sensors  
The TriLFrame is based on the hybrid architecture of Convolutional Network and Transformer.  ...  In this paper, we propose a novel self-supervised self-attention learning framework—TriLFrame for image representation learning.  ...  general perspective.  ... 
doi:10.1155/2022/1807103 fatcat:rlnobzugkfgnpdd6acxgsbz5ma

CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection [article]

Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
2022 arXiv   pre-print
In this work, we rethink these tasks from the perspective of global information alignment and transformation.  ...  CAVER treats the multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism.  ...  Our contributions can be summarized as: • We introduce the transformer to rethink the bi-modal SOD modeling from a sequence-to-sequence perspective, which gains better interpretability. • We build a top-down  ... 
arXiv:2112.02363v2 fatcat:n5a53zf2cferlcydsuzfnpmexa

SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers [article]

Danfeng Hong and Zhu Han and Jing Yao and Lianru Gao and Bing Zhang and Antonio Plaza and Jocelyn Chanussot
2021 arXiv   pre-print
To solve this issue, we rethink HS image classification from a sequential perspective with transformers, and propose a novel backbone network called SpectralFormer.  ...  Beyond band-wise representations in classic transformers, SpectralFormer is capable of learning spectrally local sequence information from neighboring bands of HS images, yielding group-wise spectral embeddings  ...  accurate land cover mapping, precision agriculture, target  ... 
arXiv:2107.02988v2 fatcat:iw67o2iwhjafbhhrwogcswyk7u
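SpectralFormer's group-wise spectral embedding draws each token from neighboring bands rather than from a single band. The following is a minimal sketch of the grouping step only. It assumes an odd group size and zero padding at the spectrum edges, and both choices are ours. The paper's learned per-group projection is left out, and `group_spectral_tokens` is a hypothetical name.

```python
from typing import List


def group_spectral_tokens(spectrum: List[float], group: int) -> List[List[float]]:
    """Turn one pixel's B-band spectrum into B tokens.

    Token b holds the `group` neighbouring bands centred on band b, with zero
    padding where the window runs past either end of the spectrum.
    """
    if group % 2 == 0:
        raise ValueError("use an odd group size so each window is centred")
    half = group // 2
    padded = [0.0] * half + list(spectrum) + [0.0] * half
    return [padded[b:b + group] for b in range(len(spectrum))]


print(group_spectral_tokens([1.0, 2.0, 3.0, 4.0], 3))
# [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 0.0]]
```

A band-wise tokenizer corresponds to `group=1`. Wider windows let each token see spectrally local context before any attention is applied.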

Cooperation between reactive 3D objects and a multimodal X Window kernel for CAD [chapter]

Patrick Bourdot, Mike Krus, Rachid Gherbi
1998 Lecture Notes in Computer Science  
From the early steps of sketching to final engineering, a frequent and very important activity in designing objects is to perform graphical and spatial simulations to solve the constraints on the objects  ...  We have developed a prototype of a system where objects with reactive behaviour can be built, and with which the user can interact with a combination of graphical actions and vocal commands.  ...  Acknowledgements We wish to thank L. Arnal, J.P. Di Lelle, and F. Ledain for their work on MIX 3D and their help in producing the images for this chapter.  ... 
doi:10.1007/bfb0052319 fatcat:motjzhvqyjc2jgdprn34s5ijeu
Showing results 1–15 out of 3,583 results