Video Object Segmentation with Language Referring Expressions
[article]
2019
arXiv
pre-print
In this work we explore an alternative way of identifying a target object, namely by employing language referring expressions. ...
To evaluate our method we augment the popular video object segmentation benchmarks, DAVIS'16 and DAVIS'17, with language descriptions of target objects. ...
A. Referring expressions for video object segmentation: As our goal is to segment objects in videos using language specifications, we augment all objects annotated with mask labels in DAVIS 16 [38] ...
arXiv:1803.08006v3
fatcat:qzv4vpl4ojap3lriyexugtycby
Video Object Linguistic Grounding
2019
1st International Workshop on Multimodal Understanding and Learning for Embodied Applications - MULEA '19
Figure 1: Example of the semi-supervised video object segmentation problem using language referring expressions from [3]. ABSTRACT: The goal of this work is segmenting, on a video sequence, the objects which ...
over the video frames, making the segmentation of the objects temporally consistent along the sequence. ...
EXPERIMENTAL RESULTS Here we present our video object segmentation results on the DAVIS17 dataset [5] with language referring expressions [3] . ...
doi:10.1145/3347450.3357662
fatcat:eoe5b3jf7jbbpkyvr724fsqt2y
SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation
[article]
2021
arXiv
pre-print
dataset with synthetic referring expressions for video object segmentation. ...
Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation. ...
We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used in this work. ...
arXiv:2106.04403v2
fatcat:huta6ela6zfe5k4r7jo7obpggy
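The SynthRef entry above describes generating synthetic referring expressions from existing object annotations. As a rough illustration of that general idea (not the paper's actual pipeline; the attribute fields and template here are made up), a template-based generator might look like:

```python
# Hypothetical sketch of template-based referring-expression synthesis from
# instance annotations (category, attribute, coarse position). This is only an
# illustration of the idea, not the SynthRef method itself.

def synthesize_expression(target, others):
    """Build a short expression for `target`, disambiguating it from `others`.
    All inputs are dicts with 'category', 'color', and normalized 'bbox' keys."""
    same_class = [o for o in others if o["category"] == target["category"]]
    parts = [target["color"], target["category"]]
    if same_class:
        # Add a coarse spatial cue only when the category alone is ambiguous.
        x_center = (target["bbox"][0] + target["bbox"][2]) / 2
        parts.append("on the left" if x_center < 0.5 else "on the right")
    return " ".join(p for p in parts if p)

annots = [
    {"category": "dog", "color": "brown", "bbox": (0.1, 0.4, 0.3, 0.8)},
    {"category": "dog", "color": "white", "bbox": (0.6, 0.4, 0.9, 0.8)},
]
print(synthesize_expression(annots[0], annots[1:]))  # "brown dog on the left"
```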
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
[article]
2020
arXiv
pre-print
The task of video object segmentation with referring expressions (language-guided VOS) is to, given a linguistic phrase and a video, generate binary masks for the object to which the phrase refers. ...
We leverage this data to analyze the results of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for language-guided ...
Experiments We report results with our model on two different tasks: language-guided image segmentation and language-guided video object segmentation. ...
arXiv:2010.00263v1
fatcat:iz2c2wrcrjfhbdfz4jsfdab34i
Language as Queries for Referring Video Object Segmentation
[article]
2022
arXiv
pre-print
Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames. ...
Concretely, we introduce a small set of object queries conditioned on the language as the input to the Transformer. In this manner, all the queries are obligated to find the referred objects only. ...
The model takes a video clip with the corresponding language expression as input and outputs the segmentation mask of the referred object in each frame. ...
arXiv:2201.00487v2
fatcat:uhk7jvi7uzbktd3ps6ty76qdbq
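The snippet above describes feeding a small set of language-conditioned object queries to a Transformer so that every query is forced to look for the referred object. A minimal sketch of that conditioning idea, with shapes and the additive conditioning scheme assumed for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

class LanguageConditionedQueries(nn.Module):
    """Learned query slots, each shifted by a pooled sentence embedding,
    decoded against flattened spatio-temporal video features."""
    def __init__(self, num_queries=5, d_model=256):
        super().__init__()
        self.query_embed = nn.Embedding(num_queries, d_model)  # learned query slots
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=3,
        )

    def forward(self, visual_feats, sent_embed):
        # visual_feats: (B, T*H*W, d_model) flattened video features
        # sent_embed:   (B, d_model) pooled sentence embedding from a text encoder
        B = visual_feats.size(0)
        queries = self.query_embed.weight.unsqueeze(0).expand(B, -1, -1)
        # Condition every query on the language, so all slots search for the
        # referred object rather than for arbitrary objects.
        queries = queries + sent_embed.unsqueeze(1)
        return self.decoder(queries, visual_feats)  # (B, num_queries, d_model)

model = LanguageConditionedQueries()
out = model(torch.randn(2, 1000, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 5, 256])
```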
Localizing Moments in Video with Natural Language
[article]
2017
arXiv
pre-print
A key obstacle to training our MCN model is that current video datasets do not include pairs of localized video segments and referring expressions, or text descriptions which uniquely identify a corresponding ...
Therefore, we collect the Distinct Describable Moments (DiDeMo) dataset which consists of over 10,000 unedited, personal videos in diverse visual settings with pairs of localized video segments and referring ...
Datasets for natural language object retrieval include referring expressions which can uniquely localize a specific location in an image. ...
arXiv:1708.01641v1
fatcat:sgrv3qlhhfaujh6szkoxgwgmqa
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation
[article]
2022
arXiv
pre-print
Text-based video segmentation is a challenging task that segments out the natural language referred objects in videos. ...
/referring expressions. ...
Related Work
Referring Image Segmentation: Referring expression segmentation aims at precisely localizing the entity referred by a natural language expression with a pixel-level segmentation mask. ...
arXiv:2103.10702v3
fatcat:nmkubjdazvfrtpzx6ldtmzveia
Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation
[article]
2021
arXiv
pre-print
Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference. ...
First, an exhaustive set of object tracklets is constructed by propagating object masks detected from several sampled frames to the entire video. ...
Introduction: Referring video object segmentation (RVOS) aims at segmenting video objects referred by given language expressions. ...
arXiv:2106.01061v1
fatcat:6jdazlbzsrbn7mzv4cp76pzlme
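The snippet above outlines a top-down pipeline: detect masks on a few sampled frames, propagate them through the whole video to form candidate tracklets, then pick the tracklet that matches the expression. A schematic sketch, where `detect_masks`, `propagate_mask`, and `score_tracklet` are hypothetical placeholders rather than functions from the paper's code:

```python
# Build candidate tracklets by propagating per-frame detections, then rank the
# tracklets against the referring expression. Purely illustrative pseud“glue”.

def build_tracklets(video, sampled_frame_ids, detect_masks, propagate_mask):
    tracklets = []
    for t in sampled_frame_ids:
        for mask in detect_masks(video[t]):            # instance masks at frame t
            tracklet = {t: mask}
            for u in range(t + 1, len(video)):          # propagate forward
                tracklet[u] = propagate_mask(video[u - 1], video[u], tracklet[u - 1])
            for u in range(t - 1, -1, -1):              # and backward
                tracklet[u] = propagate_mask(video[u + 1], video[u], tracklet[u + 1])
            tracklets.append(tracklet)
    return tracklets

def select_referred_object(tracklets, expression, score_tracklet):
    # Keep the tracklet whose visual content best matches the language reference.
    return max(tracklets, key=lambda tr: score_tracklet(tr, expression))
```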
YouRefIt: Embodied Reference Understanding with Language and Gesture
[article]
2021
arXiv
pre-print
Of note, this new visual task requires understanding multimodal cues with perspective-taking to identify which object is being referred to. ...
We study the understanding of embodied reference: One agent uses both language and gesture to refer to an object to another agent in a shared physical environment. ...
Videos are segmented into short clips, with each clip containing exactly one reference instance. For each clip, we annotate the reference target (object) with a bounding box. ...
arXiv:2109.03413v2
fatcat:32j2f7ea3vbwlfr2wi5z2d6vna
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
[article]
2022
arXiv
pre-print
Further, our Locater-based solution achieved 1st place in the Referring Video Object Segmentation Track of the 3rd Large-scale Video Object Segmentation Challenge. ...
We explore the task of language-guided video segmentation (LVS). ...
INTRODUCTION: Language-guided video segmentation (LVS) [1], also known as language-queried video actor segmentation [2], aims to segment a specific object/actor in a video referred by a linguistic phrase ...
arXiv:2203.09773v1
fatcat:6u5mrlvg7rbithmv3xsdwfgqvi
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
[article]
2021
arXiv
pre-print
We further elaborate a new video dataset, GVG, that consists of challenging referring cases with far-ranging videos. ...
This work, on the other hand, investigates a more general setting, generic visual grounding, aiming to mine all the objects satisfying the given expression, which is more challenging yet practical ...
The visual grounding task [46, 24, 41] was first put forward to refer to objects in an image with natural language expressions. ...
arXiv:2103.10191v1
fatcat:yeuulvtpvzax3itpbkbac3rz64
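The entry above contrasts classic referring (exactly one target) with generic visual grounding, which must return every object satisfying the expression. The difference largely reduces to how candidate scores are consumed; a toy sketch with a hypothetical `score` function:

```python
# Classic referring keeps only the argmax region; generic grounding keeps all
# regions whose match score clears a threshold (zero, one, or many objects).

def ground_classic(regions, expression, score):
    return max(regions, key=lambda r: score(r, expression))

def ground_generic(regions, expression, score, threshold=0.5):
    return [r for r in regions if score(r, expression) >= threshold]
```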
Regia: a metadata editor for audiovisual documents
2007
Multimedia tools and applications
Regia allows the user to manually edit textual metadata and to hierarchically organize the segmentation of the audiovisual content. ...
An important feature of this metadata editor is that it is not hard-wired with a particular metadata attributes set; for this purpose the XML schema of the metadata model is used by the editor as configuration ...
By double-clicking on a keyframe associated with a segment (or a segment on the timeline) it is possible to follow the corresponding sub-Expression. ...
doi:10.1007/s11042-007-0129-4
fatcat:6dzhycszajak7mvtekjryo3ore
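The Regia entry above notes that the editor is not hard-wired to a fixed attribute set but instead reads the XML schema of the metadata model as its configuration. A small illustration of that schema-driven idea (the schema path and layout here are invented for the example):

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

def editable_attributes(schema_path):
    """List the element names declared in an XML Schema; a real editor would
    also read types, cardinalities, and nesting to build its forms and the
    segmentation hierarchy."""
    root = ET.parse(schema_path).getroot()
    return [el.get("name") for el in root.iter(f"{XSD_NS}element") if el.get("name")]
```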
The Study of Subtitle Translation Based on Multi-Hierarchy Semantic Segmentation and Extraction in Digital Video
2017
Humanities and Social Sciences
of video objects, taking into account the synchronization of each video object as well as issues related to temporal-spatial constraints. ...
This paper established a reasonable and effective multi-hierarchy semantic information descriptive model based on video segmentation and extraction technology to realize the mapping of video semantic information ...
This thesis is part of the achievements of the 985 key construction disciplines of the School of Foreign Languages of Xi'an Jiaotong University. ...
doi:10.11648/j.hss.20170502.17
fatcat:oqmeelzcn5cebkdd5s2oh6uaua
Weak Supervision and Referring Attention for Temporal-Textual Association Learning
[article]
2020
arXiv
pre-print
The principle in our designed mechanism is to fully exploit 1) the weak supervision by considering informative and discriminative cues from intra-video segments anchored with the textual query, 2) multiple ...
The weak supervision is simply a textual expression (e.g., short phrases or sentences) at video level, indicating this video contains relevant frames. ...
[31] or object retrieval using language [16]. ...
arXiv:2006.11747v2
fatcat:bpqa6chthfgjhatmsgqq5t2dym
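The entry above describes weak supervision at the video level: only a textual expression indicating that the video contains relevant frames is available, with no segment-level labels. A common way to realize this is multiple-instance-style pooling of segment scores before the loss; the max-pooling and binary cross-entropy below are assumptions for illustration, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def video_level_loss(segment_scores, video_label):
    # segment_scores: (B, N) relevance of N candidate segments to the query
    # video_label:    (B,)   1 if the video matches the textual query, else 0
    video_score = segment_scores.max(dim=1).values          # MIL-style max-pooling
    return F.binary_cross_entropy_with_logits(video_score, video_label.float())

loss = video_level_loss(torch.randn(4, 16), torch.tensor([1, 0, 1, 1]))
print(loss.item())
```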
Cross-Modal Progressive Comprehension for Referring Segmentation
[article]
2021
arXiv
pre-print
Given a natural language expression and an image/video, the goal of referring segmentation is to produce the pixel-level masks of the entities described by the subject of the expression. ...
Combining CMPC-I or CMPC-V with TGFE forms our image- or video-version referring segmentation frameworks, and our frameworks achieve new state-of-the-art performance on four referring image segmentation ...
Given a natural language expression and an image/video as inputs, the goal of referring segmentation is to segment the entities referred by the subject of the input expression. ...
arXiv:2105.07175v1
fatcat:z34rf37pnzgtbgcbcranimaqvy