225,501 Hits in 3.1 sec

Grounded Video Description [article]

Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach
2019 arXiv   pre-print
We achieve state-of-the-art performance on video description, video paragraph description, and image description and demonstrate our generated sentences are better grounded in the video.  ...  This allows training video description models with this data, and importantly, evaluating how grounded or "true" such models are to the video they describe.  ...  in generating grounded video descriptions.  ... 
arXiv:1812.06587v2 fatcat:ppozhsavcbdsdhdxkfufh5wn7u

Grounding Action Descriptions in Videos

Michaela Regneri, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, Manfred Pinkal
2013 Transactions of the Association for Computational Linguistics  
In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos.  ...  We present a general purpose corpus that aligns high quality videos with multiple natural language descriptions of the actions portrayed in the videos, together with an annotation of how similar the action  ...  We're indebted to Carl Vondrick and Marco Antonio Valenzuela-Escárcega for their extensive support with the video annotation tool.  ... 
doi:10.1162/tacl_a_00207 fatcat:y5oskxq6l5bhnc37rkasy3wjrq

Video Object Grounding using Semantic Roles in Language Description [article]

Arka Sadhu, Kan Chen, Ram Nevatia
2020 arXiv   pre-print
We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions.  ...  and grounding datasets.  ...  In this work, we address the task of Video Object Grounding (VOG): given a video and its natural language description we aim to localize each referred object.  ... 
arXiv:2003.10606v1 fatcat:fcxy2cvm4bczrf7l32v7m5dryu

Relational Graph Learning for Grounded Video Description Generation

Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haochen Shi, Jun Xiao, Yueting Zhuang, William Yang Wang
2020 Proceedings of the 28th ACM International Conference on Multimedia  
Grounded video description (GVD) encourages captioning models to attend to appropriate video regions (e.g., objects) dynamically and generate a description.  ...  Moreover, relational words (e.g., "jump left or right") are usually spatio-temporal inference results, i.e., these words cannot be grounded on certain spatial regions.  ...  Therefore, grounded video description (GVD) [53] , which tries to improve the grounding performance of captioning models, has been proposed.  ... 
doi:10.1145/3394171.3413746 dblp:conf/mm/ZhangWTSSXZW20 fatcat:z5srtrou7fe37ncnnwf666aazq

Video Object Grounding Using Semantic Roles in Language Description

Arka Sadhu, Kan Chen, Ram Nevatia
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions.  ...  and grounding datasets.  ...  In this work, we address the task of Video Object Grounding (VOG): given a video and its natural language description we aim to localize each referred object.  ... 
doi:10.1109/cvpr42600.2020.01043 dblp:conf/cvpr/SadhuCN20 fatcat:qvqxhdt7pnglfn6r6q53b57ra4

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions [article]

Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem
2022 arXiv   pre-print
MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of videos and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets.  ...  available audio descriptions of mainstream movies.  ...  "untrimmed video" setup where the highly descriptive sentences are grounded in long-form videos.  ... 
arXiv:2112.00431v2 fatcat:gmpn22jdsfb55laxiltelf35wq

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos [article]

Dongliang He, Xiang Zhao, Jizhou Huang, Fu Li, Xiao Liu, Shilei Wen
2019 arXiv   pre-print
The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos.  ...  To alleviate this problem, we formulate this task as a problem of sequential decision making by learning an agent which regulates the temporal grounding boundaries progressively based on its policy.  ... 
arXiv:1901.06829v1 fatcat:xcbgz3gv3zforewsvbykctovv4

Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description

Kai Shen, Lingfei Wu, Fangli Xu, Siliang Tang, Jun Xiao, Yueting Zhuang
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
The task of Grounded Video Description (GVD) is to generate sentences whose objects can be grounded with the bounding boxes in the video frames.  ...  To address these issues, we cast the GVD task as a spatial-temporal Graph-to-Sequence learning problem, where we model video frames as a spatial-temporal sequence graph in order to better capture implicit  ...  Introduction The task of Grounded Video Description (GVD) [Zhou et al., 2019] aims to generate more grounded and accurate descriptions by linking the generated words with the regions in video frames.  ... 
doi:10.24963/ijcai.2020/131 dblp:conf/ijcai/ShenWXT0Z20 fatcat:irwpzeaj3ncihhykuch3unmgxm

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

Dongliang He, Xiang Zhao, Jizhou Huang, Fu Li, Xiao Liu, Shilei Wen
2019 Proceedings of the AAAI Conference on Artificial Intelligence  
The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos.  ...  To alleviate this problem, we formulate this task as a problem of sequential decision making by learning an agent which regulates the temporal grounding boundaries progressively based on its policy.  ... 
doi:10.1609/aaai.v33i01.33018393 fatcat:5urolqtse5dolmy7fplihjnmfq


Grounding spatial prepositions for video search

Stefanie Tellex, Deb Roy
2009 Proceedings of the 2009 international conference on Multimodal interfaces - ICMI-MLMI '09  
This paper describes a framework for grounding the meaning of spatial prepositions in video.  ...  To evaluate these features, we collected a corpus of natural language descriptions about the motion of people in video clips.  ...  Descriptions which resolved to more than one ground object were excluded from the evaluation.  ... 
doi:10.1145/1647314.1647369 dblp:conf/icmi/TellexR09 fatcat:wydadfpqbvcqto7zyiaprfs72q

Human-centric Spatio-Temporal Video Grounding With Visual Transformers [article]

Zongheng Tang, Yue Liao, Si Liu, Guanbin Li, Xiaojie Jin, Hongxu Jiang, Qian Yu, Dong Xu
2021 arXiv   pre-print
description.  ...  In this work, we introduce a novel task - Humancentric Spatio-Temporal Video Grounding (HC-STVG).  ...  video grounding.  ... 
arXiv:2011.05049v2 fatcat:lfgpc7gsxvbbzdwqhhv3qgv4b4

Automated Textual Descriptions for a Wide Range of Video Events with 48 Human Actions [chapter]

Patrick Hanckmann, Klamer Schutte, Gertjan J. Burghouts
2012 Lecture Notes in Computer Science  
Presented is a hybrid method to generate textual descriptions of video based on actions. The method includes an action classifier and a description generator.  ...  The aim for the action classifier is to detect and classify the actions in the video, such that they can be used as verbs for the description generator.  ...  Experimental Setup The description generator is evaluated on 241 short videos (available at [1] ). For all videos ground truth is available.  ... 
doi:10.1007/978-3-642-33863-2_37 fatcat:dj55nmheajfm5jr3qymcvoaehy

Video Object Segmentation with Language Referring Expressions [article]

Anna Khoreva, Anna Rohrbach, Bernt Schiele
2019 arXiv   pre-print
To evaluate our method we augment the popular video object segmentation benchmarks, DAVIS'16 and DAVIS'17, with language descriptions of target objects.  ...  Leveraging recent advances of language grounding models designed for images, we propose an approach to extend them to video data, ensuring temporally coherent predictions.  ...  Table C3 reports the effect of different grounding models, the temporal consistency step for grounding, and employing the first frame versus the full video descriptions on video object segmentation  ... 
arXiv:1803.08006v3 fatcat:qzv4vpl4ojap3lriyexugtycby

Ground Truth Generating Tool for Traffic Video Detector

Szymon Bigaj, Andrzej Głowacz, Jacek Kościow, Zbigniew Mikrut, Piotr Pawlik
2015 Image Processing & Communications  
The paper presents an application for generating ground truth data for the purposes of video detection and justifies its use in systems which analyze road traffic videos.  ...  The usefulness of the described application in the development of video detection software is presented - especially during scene configuration and comparative analysis of video detection results versus ground  ...  Due to the variability of the ground truth observed, e.g. during generation of ground truth for the PETS04 video sequences, three different individuals were involved. The comprehensive description format of  ... 
doi:10.1515/ipc-2015-0031 fatcat:g7wohce42zbf7gubwmi2wdnj4q
Showing results 1 — 15 out of 225,501 results