Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
[article]
2021
arXiv
pre-print
In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. ...
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). ...
In this work, we alternatively rethink the semantic segmentation task from a different perspective. ...
arXiv:2012.15840v3
fatcat:gymdaxywofaftaxt3zkyuxwbqe
UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-wise Perspective with Transformer
[article]
2022
arXiv
pre-print
Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), from the channel perspective with attention mechanism. ...
Most recent semantic segmentation methods adopt a U-Net framework with an encoder-decoder architecture. ...
More specifically, we firstly propose a Channel-wise Cross Fusion Transformer (CCT) to fuse the multi-scale context with cross attention from the channel-wise perspective. ...
arXiv:2109.04335v3
fatcat:ssn3xixl7bffrkexqaloleiig4
Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation
[article]
2021
arXiv
pre-print
Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference. ...
First, an exhaustive set of object tracklets is constructed by propagating object masks detected from several sampled frames to the entire video. ...
In this work, we rethink RVOS from a top-down perspective ( Fig. 1-(b) ), by comprehensively exploring cross-object relations and conducting object-level cross-modal grounding. ...
arXiv:2106.01061v1
fatcat:6jdazlbzsrbn7mzv4cp76pzlme
StructToken : Rethinking Semantic Segmentation with Structural Prior
[article]
2022
arXiv
pre-print
From a perspective on semantic segmentation as per-pixel classification, the previous deep learning-based methods learn the per-pixel representation first through an encoder and a decoder head and then ...
classify each pixel representation to a specific category to obtain the semantic masks. ...
Jumping out of the existing semantic segmentation frameworks, we rethink semantic segmentation tasks from a more anthropomorphic viewpoint. ...
arXiv:2203.12612v3
fatcat:kof2zssz6bfdfj2jjit6lm434i
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
[article]
2020
arXiv
pre-print
image sequences while providing each point with instance-level semantic interpretations. ...
In this paper, we present ViP-DeepLab, a unified model attempting to tackle the long-standing and challenging inverse projection problem in vision, which we model as restoring the point clouds from perspective ...
Acknowledgments We would like to thank Maxwell Collins for the feedbacks, and the support from Google Mobile Vision team. ...
arXiv:2012.05258v1
fatcat:ybeeihjzsvdmdetwkntz4kvxsi
Visual Saliency Transformer
[article]
2021
arXiv
pre-print
Alternatively, we rethink this task from a convolution-free sequence-to-sequence perspective and predict saliency by modeling long-range dependencies, which can not be achieved by convolution. ...
Most importantly, our whole framework not only provides a new perspective for the SOD field but also shows a new paradigm for transformer-based dense prediction models. ...
Different from previous CNN-based methods, we are the first to rethink SOD from a sequence-to-sequence perspective and propose a unified model based on pure transformer for both RGB and RGB-D SOD. ...
arXiv:2104.12099v2
fatcat:7ldvkm4lpvca5f3bl7doe2eogi
A Large-Scale Benchmark for Food Image Segmentation
[article]
2021
arXiv
pre-print
In addition, we propose a multi-modality pre-training approach called ReLeM that explicitly equips a segmentation model with rich and semantic food knowledge. ...
Existing food image segmentation models are underperforming due to two reasons: (1) there is a lack of high quality food image datasets with fine-grained ingredient labels and pixel-wise location masks ...
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. arXiv preprint arXiv:2012.15840 (2020). Image Collection. ...
arXiv:2105.05409v1
fatcat:buqw4czjjrc53cghcvkqwa5c2m
TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images
[article]
2022
arXiv
pre-print
With the proposed insight to redesign the internal structure of Transformer block and the introduced Deformable Bottleneck Module to capture shape-aware local details, a highly efficient architecture is ...
Different from TransBTS, the proposed TransBTSV2 is not limited to brain tumor segmentation (BTS) but focuses on general medical image segmentation, providing a stronger and more efficient 3D baseline ...
Rethinking semantic segmentation from a sequence-to-sequence perspective, SETR [17] leverages Transformer as the encoder for global feature extraction and achieves superior performance with large-scale ...
arXiv:2201.12785v2
fatcat:ifgcgjncfvhs7o4jbqrghzhkc4
Technology of Integrated Media Education
2018
Media Education (Mediaobrazovanie)
From this perspective, the technology of integrated media education is interpreted as a tool to stimulate reflective-analytical experience, and meanwhile - the development of social and professional competences ...
The efficiency of social and professional functions of a modern person is mediated not only by his/her knowledge, skills, and value orientations, but also by one's ability to capture, identify, and actualize ...
often representing a relatively complete sequence, withdrawn from a larger media text. ...
doi:10.13187/me.2018.4.3
fatcat:w22cftjldfa3pevanhxjo3em3e
Semantic Coded Transmission: Architecture, Methodology, and Challenges
[article]
2021
arXiv
pre-print
The recent concept of "semantic-driven" offers a promising research direction. ...
In the future, communications toward intelligence and conciseness will predictably play a dominant role, and the proliferation of connected intelligent agents requires a radical rethinking of current coded ...
Different from classical linear transforms, such as the Karhunen-Loève transform (KLT), which map the source vector into a latent space via a decorrelating invertible transform, the semantic analysis transform ...
arXiv:2112.03093v2
fatcat:er2uvmdtqrd27lwj66uyiv6iha
CPTR: Full Transformer Network for Image Captioning
[article]
2021
arXiv
pre-print
In this paper, we consider the image captioning task from a new sequence-to-sequence prediction perspective and propose CaPtion TransformeR (CPTR) which takes the sequentialized raw images as the input ...
to Transformer. ...
Inspired by the above works, we consider solving the image captioning task from a new sequence-to-sequence perspective and propose CPTR, a full Transformer network to replace the CNN in the encoder part ...
arXiv:2101.10804v3
fatcat:e3jbdxop7zdkxliuvikyu2ltoq
A Novel Data Analytics Oriented Approach for Image Representation Learning in Manufacturing Systems
2022
Journal of Sensors
The TriLFrame is based on the hybrid architecture of Convolutional Network and Transformer. ...
In this paper, we propose a novel self-supervised self-attention learning framework—TriLFrame for image representation learning. ...
general perspective. ...
doi:10.1155/2022/1807103
fatcat:rlnobzugkfgnpdd6acxgsbz5ma
CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection
[article]
2022
arXiv
pre-print
In this work, we rethink these tasks from the perspective of global information alignment and transformation. ...
CAVER treats the multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. ...
Our contributions can be summarized as: • We introduce the transformer to rethink the bi-modal SOD modeling from a sequence-to-sequence perspective, which gains better interpretability. • We build a top-down ...
arXiv:2112.02363v2
fatcat:n5a53zf2cferlcydsuzfnpmexa
SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers
[article]
2021
arXiv
pre-print
To solve this issue, we rethink HS image classification from a sequential perspective with transformers, and propose a novel backbone network called SpectralFormer. ...
Beyond band-wise representations in classic transformers, SpectralFormer is capable of learning spectrally local sequence information from neighboring bands of HS images, yielding group-wise spectral embeddings ...
To solve this issue, we rethink HS image classification from a sequential perspective ...
... as accurate land cover mapping, precision agriculture, target ...
arXiv:2107.02988v2
fatcat:iw67o2iwhjafbhhrwogcswyk7u
Cooperation between reactive 3D objects and a multimodal X Window kernel for CAD
[chapter]
1998
Lecture Notes in Computer Science
From the early steps of sketching to final engineering, a frequent and very important activity in designing objects is to perform graphical and spatial simulations to solve the constraints on the objects ...
We have developed a prototype of a system where objects with reactive behaviour can be built, and with which the user can interact with a combination of graphical actions and vocal commands. ...
Acknowledgements We wish to thank L. Arnal, J.P. Di Lelle, and F. Ledain for their work on MIX 3D and their help in producing the images for this chapter. ...
doi:10.1007/bfb0052319
fatcat:motjzhvqyjc2jgdprn34s5ijeu