299,792 Hits in 3.3 sec

Object Level Visual Reasoning in Videos [article]

Fabien Baradel, Natalia Neverova, Christian Wolf, Julien Mille, Greg Mori
2018 arXiv   pre-print
We propose a model capable of learning to reason about semantically meaningful spatio-temporal interactions in videos. ... The key to our approach is the choice of performing this reasoning at the object level, through the integration of state-of-the-art object detection networks. ... No object-level reasoning is present in this baseline. ...
arXiv:1806.06157v3 fatcat:nanxepzterbqhmcnb3oemfndly
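
Baradel et al. describe reasoning over features of detected objects rather than over whole frames. As a rough illustration of that general idea (not a reconstruction of their model), the PyTorch sketch below pairs per-object RoI features within each frame, scores every pair with a small MLP, and pools the result into a clip-level vector; all shapes, dimensions, and module names are assumptions.

```python
# Hypothetical sketch of object-level reasoning over detector features.
# Not the model from Baradel et al.; shapes and names are assumptions.
import torch
import torch.nn as nn


class ObjectPairReasoner(nn.Module):
    """Scores every ordered pair of object features and pools the result."""

    def __init__(self, obj_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.pair_mlp = nn.Sequential(
            nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, objects: torch.Tensor) -> torch.Tensor:
        # objects: (T, N, D) = frames x detected objects x feature dim
        T, N, D = objects.shape
        a = objects.unsqueeze(2).expand(T, N, N, D)   # first member of each pair
        b = objects.unsqueeze(1).expand(T, N, N, D)   # second member of each pair
        pair_feats = self.pair_mlp(torch.cat([a, b], dim=-1))  # (T, N, N, H)
        # Pool over object pairs, then over time -> clip-level vector.
        return pair_feats.flatten(1, 2).mean(dim=1).mean(dim=0)


# Usage: 8 frames, 5 detected objects per frame, 256-d RoI features.
clip_vec = ObjectPairReasoner()(torch.randn(8, 5, 256))
print(clip_vec.shape)  # torch.Size([512])
```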

Visual Relationship Forecasting in Videos [article]

Li Mi, Yangjun Ou, Zhenzhong Chen
2021 arXiv   pre-print
To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. ... In addition, we present a novel Graph Convolutional Transformer (GCT) framework, which captures both object-level and frame-level dependencies with a spatio-temporal Graph Convolution Network and a Transformer ... GCT can be divided into three parts: Feature Representation, Object-level Reasoning, and Frame-level Reasoning. ...
arXiv:2107.01181v1 fatcat:ep2hjklh5zdxpaahmmakn5m3fy
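
The snippet describes GCT as handling object-level dependencies with a spatio-temporal graph convolution and frame-level dependencies with a Transformer. Below is a minimal sketch of that split in PyTorch; the adjacency handling, dimensions, and the mean-pooling used to form frame summaries are assumptions, not the published GCT.

```python
# Rough sketch of an object-level GCN followed by a frame-level Transformer,
# in the spirit of the GCT split described above. All details are assumptions.
import torch
import torch.nn as nn


class ObjectGCNLayer(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (T, N, D) object features; adj: (T, N, N) normalized adjacency.
        return torch.relu(torch.bmm(adj, self.proj(x)))


class ObjectFrameModel(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        self.gcn = ObjectGCNLayer(dim)                     # object-level reasoning
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         batch_first=True)
        self.frame_transformer = nn.TransformerEncoder(enc, num_layers=layers)

    def forward(self, objects: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        obj = self.gcn(objects, adj)           # (T, N, D)
        frames = obj.mean(dim=1)               # (T, D) frame summaries
        return self.frame_transformer(frames.unsqueeze(0))  # (1, T, D)


T, N, D = 10, 6, 256
out = ObjectFrameModel()(torch.randn(T, N, D),
                         torch.softmax(torch.randn(T, N, N), dim=-1))
print(out.shape)  # torch.Size([1, 10, 256])
```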

Tracklet Pair Proposal and Context Reasoning for Video Scene Graph Generation

Gayoung Jung, Jonghun Lee, Incheol Kim
2021 Sensors  
Video scene graph generation (ViDSGG), the creation of video scene graphs to support deeper and better visual scene understanding, is a challenging task. ... To effectively utilize the spatio-temporal context, low-level visual context reasoning is performed using a spatio-temporal context graph and a graph neural network, as well as high-level semantic context ... However, both models only performed low-level visual context reasoning using the visual features of object tracklets. They did not perform the high-level semantic reasoning proposed in this study. ...
doi:10.3390/s21093164 pmid:34063299 pmcid:PMC8124611 fatcat:vtl2wizi5jfjdbqgxngtelwhwy
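
The paper builds a spatio-temporal context graph over object tracklets and then applies a graph neural network. The sketch below illustrates only the first step under simplified assumptions of my own, not the authors' procedure: two tracklets are proposed as an edge of the context graph when they overlap in time and their boxes come close in space.

```python
# Illustrative tracklet-pair proposal for a spatio-temporal context graph.
# A simplification for exposition, not the procedure from Jung et al.
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Tracklet:
    track_id: int
    start: int                  # first frame index
    boxes: list                 # per-frame [x1, y1, x2, y2]

    @property
    def end(self) -> int:
        return self.start + len(self.boxes) - 1


def iou(a, b) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)


def propose_pairs(tracklets, iou_thresh=0.1):
    """Edges of the context graph: temporally co-occurring, spatially close."""
    edges = []
    for t1, t2 in combinations(tracklets, 2):
        lo, hi = max(t1.start, t2.start), min(t1.end, t2.end)
        if lo > hi:
            continue            # no temporal overlap
        overlap = [iou(t1.boxes[f - t1.start], t2.boxes[f - t2.start])
                   for f in range(lo, hi + 1)]
        if max(overlap) >= iou_thresh:
            edges.append((t1.track_id, t2.track_id))
    return edges


person = Tracklet(0, start=0, boxes=[[0, 0, 10, 10], [1, 0, 11, 10]])
ball = Tracklet(1, start=1, boxes=[[8, 2, 14, 8], [20, 2, 26, 8]])
print(propose_pairs([person, ball]))  # [(0, 1)]
```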

Video Semantic Content Analysis based on Ontology

Liang Bai, Songyang Lao, Gareth J.F. Jones, Alan F. Smeaton
2007 International Machine Vision and Image Processing Conference (IMVIP 2007)  
Low-level features (e.g., visual and aural) and video content analysis algorithms are integrated into the ontology to enrich video semantic analysis; OWL is used for the ontology description. ... New multimedia standards, such as MPEG-4 and MPEG-7, provide the basic functionality needed to manipulate and transmit objects and metadata. ... Features and Algorithms: According to the definitions of the objects and sequences in the soccer domain, and extensive observation of soccer video data, we found that the visual objects and sequences in soccer videos ...
doi:10.1109/imvip.2007.44 fatcat:nhjinluu2ncnfp2ebd4kr6jga4
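
Because the entry centers on encoding domain knowledge (soccer objects and sequences) together with low-level features in an OWL ontology, here is a toy rdflib sketch of what such a description could look like. The class and property names are invented for illustration and do not reproduce the authors' ontology.

```python
# Hypothetical mini-ontology for soccer video content, serialized as OWL.
# Class and property names are invented; not the ontology from Bai et al.
from rdflib import Graph, Namespace, RDF, RDFS, Literal
from rdflib.namespace import OWL

SOCCER = Namespace("http://example.org/soccer-video#")

g = Graph()
g.bind("soccer", SOCCER)
g.bind("owl", OWL)

# Domain concepts: visual objects and temporal sequences.
for cls in ("VisualObject", "Player", "Ball", "Goalpost",
            "VideoSequence", "GoalSequence"):
    g.add((SOCCER[cls], RDF.type, OWL.Class))
g.add((SOCCER.Player, RDFS.subClassOf, SOCCER.VisualObject))
g.add((SOCCER.Ball, RDFS.subClassOf, SOCCER.VisualObject))
g.add((SOCCER.GoalSequence, RDFS.subClassOf, SOCCER.VideoSequence))

# Link a concept to the low-level feature / algorithm used to detect it.
g.add((SOCCER.detectedBy, RDF.type, OWL.ObjectProperty))
g.add((SOCCER.DominantColorDescriptor, RDF.type, OWL.Class))
g.add((SOCCER.Ball, SOCCER.detectedBy, SOCCER.DominantColorDescriptor))
g.add((SOCCER.Ball, RDFS.comment,
       Literal("Illustrative link between a concept and a visual feature.")))

print(g.serialize(format="xml")[:200])  # RDF/XML, i.e. an OWL-compatible file
```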

Video Semantic Content Analysis Framework Based on Ontology Combined MPEG-7 [chapter]

Liang Bai, Songyang Lao, Weiming Zhang, Gareth J. F. Jones, Alan F. Smeaton
2008 Lecture Notes in Computer Science  
The rapid increase in the available amount of video data is creating a growing demand for efficient methods for understanding and managing it at the semantic level.  ...  Rules in Description Logic are defined to describe how low-level features and algorithms for video analysis should be applied according to different perception content.  ...  In the proposed video semantic content analysis framework, video analysis ontology is developed to formally describe the detection process of the video semantic content, in which the low-level visual and  ... 
doi:10.1007/978-3-540-79860-6_19 fatcat:nvvvs5pnkjfbhixtqorp2cu6nm

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering [article]

Junbin Xiao, Angela Yao, Zhiyuan Liu, Yicong Li, Wei Ji, Tat-Seng Chua
2022 arXiv   pre-print
In this work, we argue that while video is presented as a frame sequence, the visual elements (e.g., objects, actions, activities and events) are not sequential but rather hierarchical in semantic space. ... in a level-wise manner, with the guidance of corresponding textual cues. ... To capture this insight, we propose to model video as a conditional graph hierarchy which, level by level, reasons over and aggregates low-level visual resources into high-level video elements, in which the language ...
arXiv:2112.06197v2 fatcat:t3ltekww4vcmjdrvkwhlkhm7cy
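
To make the level-wise idea concrete, the sketch below aggregates object features into frame summaries and frame summaries into a video vector, with each pooling step conditioned on a textual cue. It is an assumed simplification, not the published conditional graph hierarchy of Xiao et al.

```python
# Sketch of level-wise aggregation from objects to frames to video,
# conditioned on a textual cue. An assumed simplification for illustration.
import torch
import torch.nn as nn


class ConditionedPool(nn.Module):
    """Attention-pool a set of features using a query/text condition."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feats: (..., K, D); cond: (D,) broadcast onto every element.
        cond_exp = cond.expand(*feats.shape[:-1], cond.shape[-1])
        weights = torch.softmax(
            self.score(torch.cat([feats, cond_exp], dim=-1)), dim=-2)
        return (weights * feats).sum(dim=-2)


class LevelWiseAggregator(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.obj_to_frame = ConditionedPool(dim)    # low level  -> mid level
        self.frame_to_video = ConditionedPool(dim)  # mid level  -> high level

    def forward(self, objects: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # objects: (T, N, D) per-frame object features; text: (D,) cue.
        frames = self.obj_to_frame(objects, text)   # (T, D)
        return self.frame_to_video(frames, text)    # (D,)


video_vec = LevelWiseAggregator()(torch.randn(12, 5, 256), torch.randn(256))
print(video_vec.shape)  # torch.Size([256])
```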

Video Question Answering: Datasets, Algorithms and Challenges [article]

Yaoyao Zhong, Wei Ji, Junbin Xiao, Yicong Li, Weihong Deng, Tat-Seng Chua
2022 arXiv   pre-print
Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.  ...  It has earned increasing attention with recent research trends in joint vision and language understanding. Yet, compared with ImageQA, VideoQA is largely underexplored and progresses slowly.  ...  ., 2021a] propose a graph memory mechanism (HAIR), to perform relational vision-semantic reasoning from object level to frame level; [Peng et al., 2021] concatenate differentlevel graphs, that is, object-level  ... 
arXiv:2203.01225v1 fatcat:dn4sz5pomnfb7igvmxofangzsa

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering [article]

Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran
2021 arXiv   pre-print
This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. ... High-level reasoning demands lifting from associative visual pattern recognition to symbol-like manipulation over objects, their behavior and interactions. ... At the video level, we have a single OSTR unit that takes in the object sequence Y_clip, query q, and video-level context c_vid. ...
arXiv:2106.13432v2 fatcat:b4upxdws3ra6dftan7cy7igacm
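
The quoted sentence fixes the interface of a video-level unit: an object sequence Y_clip, a query q, and a video-level context c_vid go in, and a reasoning output comes out. The PyTorch sketch below mimics that interface with a single cross-attention step; it is my simplified reading, not the actual OSTR unit from Dang et al.

```python
# Simplified sketch of a unit that consumes an object sequence, a query
# embedding, and a context vector, loosely following the interface described
# above. Not the actual OSTR unit; dimensions and fusion are assumptions.
import torch
import torch.nn as nn


class ObjectSequenceUnit(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.query_proj = nn.Linear(2 * dim, dim)   # fuse query q and context c
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, y_clip: torch.Tensor, q: torch.Tensor,
                c_vid: torch.Tensor) -> torch.Tensor:
        # y_clip: (L, D) object sequence; q, c_vid: (D,) vectors.
        probe = self.query_proj(torch.cat([q, c_vid], dim=-1))   # (D,)
        attended, _ = self.attn(probe.view(1, 1, -1),             # query
                                y_clip.unsqueeze(0),              # keys
                                y_clip.unsqueeze(0))              # values
        return self.out(attended.squeeze(0).squeeze(0))           # (D,)


unit = ObjectSequenceUnit()
answer_feat = unit(torch.randn(20, 256), torch.randn(256), torch.randn(256))
print(answer_feat.shape)  # torch.Size([256])
```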

Advanced Video-Based Surveillance

Luigi Di Stefano, Carlo Regazzoni, Dan Schonfeld
2011 EURASIP Journal on Image and Video Processing  
As also witnessed by recent advances in object/category recognition in the related field of computer vision, we believe that significant progress in low-level processing will be required to foster major ... This is an indication of the fact that, within the surveillance community, improvement in the effectiveness and robustness of the computations devoted to extracting elementary visual cues, upon which higher-level ... Finally, we wish to convey our deep gratitude to the Editor-in-Chief of the EURASIP Journal on Image and Video Processing, Professor Jean-Luc Dugelay, for his encouragement and support of this special ...
doi:10.1155/2011/857084 fatcat:p4zl3mspw5h6njvy6sduu6fuya

Character Matters: Video Story Understanding with Character-Aware Relations [article]

Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo
2020 arXiv   pre-print
This model specifically considers the characters in a video story, as well as the relations connecting different characters and objects.  ...  Video Story Question Answering (VSQA) offers an effective way to benchmark higher-level comprehension abilities of a model.  ...  Through visual relations, we capture two levels of visual semantics: the entity level and the relation level.  ... 
arXiv:2005.08646v1 fatcat:xuhhr4t6d5a2xly4muu4hc3mlm

Discriminative Latent Semantic Graph for Video Captioning [article]

Yang Bai, Junyan Wang, Yang Long, Bingzhang Hu, Yang Song, Maurice Pagnucco, Yu Guan
2021 arXiv   pre-print
Our main contribution is to identify three key problems in a joint framework for future video summarization tasks. 1) Enhanced Object Proposal: we propose a novel Conditional Graph that can fuse spatio-temporal information into latent object proposal. 2) Visual Knowledge: Latent Proposal Aggregation is proposed to dynamically extract visual words with higher semantic levels. 3) Sentence Validation: A novel Discriminative ... [22] proposed a visual reasoning approach on videos over both space and time. Zhang et al. ...
arXiv:2108.03662v1 fatcat:6okzuqntjngcvl7cndxybjnjje

Hierarchical Conditional Relation Networks for Video Question Answering [article]

Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
2020 arXiv   pre-print
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts. ... A stack of two sub-video CRNs now produces an output array of size (Q−4)(T−4)F, serving as an input object in an array of length M for the video-level CRNs. ...
arXiv:2002.10698v3 fatcat:nnagfm5fjrd4bjij2dyujmkete
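
For context, a conditional relation unit of the general kind referenced here maps an array of input objects plus a conditioning feature to an array of relation features, and stacking such units shrinks the array (hence sizes like (Q−4)(T−4)F). Below is a heavily simplified sketch under assumed dimensions and subset sizes, not the published CRN implementation.

```python
# Heavily simplified conditional-relation unit in the spirit of a CRN:
# it maps an array of n input objects plus a conditioning feature to an
# array of relation features. Dimensions and subset sizes are assumptions.
from itertools import combinations

import torch
import torch.nn as nn


class SimpleCRN(nn.Module):
    def __init__(self, dim: int = 128, subset_size: int = 2):
        super().__init__()
        self.k = subset_size
        self.relate = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.condition = nn.Linear(2 * dim, dim)

    def forward(self, objects: list, cond: torch.Tensor) -> list:
        # objects: list of (D,) tensors; cond: (D,) conditioning feature.
        outputs = []
        for subset in combinations(objects, self.k):
            rel = self.relate(torch.stack(subset).mean(dim=0))    # relation summary
            outputs.append(self.condition(torch.cat([rel, cond])))
        return outputs                                            # len = C(n, k)


crn = SimpleCRN()
objs = [torch.randn(128) for _ in range(4)]    # e.g. 4 clip-level features
out = crn(objs, torch.randn(128))              # conditioned on a query feature
print(len(out), out[0].shape)                  # 6 torch.Size([128])
```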

Accommodating hybrid retrieval in a comprehensive video database management system

S.S.M. Chan, Qing Li, Yi Wu, Yueting Zhuang
2002 IEEE transactions on multimedia  
A comprehensive video retrieval system should be able to accommodate and utilize various (complementary) description data to facilitate effective retrieval. ... We also describe an experimental prototype being developed based on a commercial object-oriented toolkit using VC++ and Java. ... In VideoMAP+, the segment level is chosen as the direct bridge for reasons of simplicity and efficiency, because we regard video segments as the basic unit of retrieval. ...
doi:10.1109/tmm.2002.1017730 fatcat:il6of2keszelffo7k35bpfn4ne

Combining Deep Learning and Ontology to Reveal Video Sequences Semantics

Jemai Bornia, Frihida Ali
2021 Revue d'intelligence artificielle : Revue des Sciences et Technologies de l'Information  
The OWL file will be used afterward to exploit video documents in many ways, such as modeling, indexing, querying, and feeding ontology editors to visualize, elucidate and reason over the semantic structure ... It is based on deep learning technology to identify objects and movements in video scenes. All extracted features are stored in an OWL file. ... This method eliminates the redundant and inessential images in a video without affecting the visual content or semantic details. Object detection: ...
doi:10.18280/ria.350204 fatcat:laf7ldsjnbeynacqmxmz5nipeu
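
The entry describes detecting objects with deep learning and storing everything in an OWL file. The toy rdflib sketch below covers only the storage step, with invented vocabulary and hard-coded detections standing in for a real detector's output; the authors' actual schema is not reproduced.

```python
# Toy sketch of storing (fake) detector output in an OWL/RDF file, as the
# entry above describes. Vocabulary and detections are invented; no real
# detector or the authors' schema is used here.
from rdflib import Graph, Namespace, RDF, Literal
from rdflib.namespace import OWL, XSD

VID = Namespace("http://example.org/video-annotations#")
g = Graph()
g.bind("vid", VID)

g.add((VID.DetectedObject, RDF.type, OWL.Class))
g.add((VID.appearsInFrame, RDF.type, OWL.DatatypeProperty))
g.add((VID.hasLabel, RDF.type, OWL.DatatypeProperty))

# Pretend these came from an object detector run over the key frames.
fake_detections = [(0, "person"), (0, "car"), (42, "dog")]

for i, (frame, label) in enumerate(fake_detections):
    obj = VID[f"object_{i}"]
    g.add((obj, RDF.type, VID.DetectedObject))
    g.add((obj, VID.hasLabel, Literal(label)))
    g.add((obj, VID.appearsInFrame, Literal(frame, datatype=XSD.integer)))

g.serialize(destination="video_annotations.owl", format="xml")
print(len(g), "triples written to video_annotations.owl")
```
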
Showing results 1–15 of 299,792