6,206 Hits in 5.3 sec

Joint Video and Text Parsing for Understanding Events and Answering Queries [article]

Kewei Tu, Meng Meng, Mun Wai Lee, Tae Eun Choe, Song-Chun Zhu
2014 arXiv   pre-print
Our framework produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events) and causal information (causalities  ...  We propose a framework for parsing video and text jointly for understanding events and answering user queries.  ...  We want to thank Mingtian Zhao, Yibiao Zhao, Ping Wei, Amy Morrow, Mohamed R. Amer, Dan Xie and Sinisa Todorovic for their help in automatic video parsing.  ... 
arXiv:1308.6628v2 fatcat:7y5wjg3irrgodbeal6hmoz32ea

Joint Video and Text Parsing for Understanding Events and Answering Queries

2014 IEEE Multimedia  
We propose a multimedia analysis framework to process video and text jointly for understanding events and answering user queries.  ...  Our framework produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events) and causal information (causalities  ...  We want to thank Mingtian Zhao, Yibiao Zhao, Ping Wei, Amy Morrow, Mohamed R. Amer, Dan Xie and Sinisa Todorovic for their help in automatic video parsing.  ... 
doi:10.1109/mmul.2014.29 fatcat:bunjekcxezhffkinjx2zet2afm
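The two entries above describe a parse graph whose nodes are objects, scenes, actions, and events, connected by spatial, temporal, and causal relations. As a minimal illustrative sketch (not the authors' representation; node and relation names are hypothetical), such a graph can be held as typed edges over a node set:

```python
# A toy joint parse graph: nodes for entities/events, typed edges for
# spatial, temporal, and causal relations (names are illustrative).
from dataclasses import dataclass, field

@dataclass
class ParseGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (src, relation, dst) triples

    def add(self, src, relation, dst):
        """Record a typed relation and register both endpoints as nodes."""
        self.nodes.update({src, dst})
        self.edges.append((src, relation, dst))

    def related(self, node, relation):
        """All targets reachable from `node` via `relation`."""
        return [d for s, r, d in self.edges if s == node and r == relation]

g = ParseGraph()
g.add("person", "agent_of", "open_door")   # spatial/role relation
g.add("open_door", "causes", "door_open")  # causal relation
```

Answering a query then reduces to traversing the typed edges, e.g. `g.related("open_door", "causes")` returns the effects of the event.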

Visual body form and orientation cues do not modulate visuo-tactile temporal integration [article]

Sophie Smit, Anina N Rich, Regine Zopf
2019 bioRxiv   pre-print
This suggests that a Bayesian causal inference model for the integration of bodily signals fits findings in the spatial, but not the temporal domain.  ...  This specifies that the brain integrates spatial and temporal signals coming from different modalities when it infers a common cause for inputs.  ...  Evidence for a Bayesian causal inference framework for the integration of bodily signals therefore comes predominantly from the spatial domain (see Table 1 for findings regarding temporal and spatial  ... 
doi:10.1101/647594 fatcat:js7zjpjaurgjdplvjaih2jztqu

Visual body form and orientation cues do not modulate visuo-tactile temporal integration

Sophie Smit, Anina N. Rich, Regine Zopf, Matthew Longo
2019 PLoS ONE  
This specifies that the brain integrates spatial and temporal signals coming from different modalities when it infers a common cause for inputs.  ...  One approach to model multisensory integration that has been influential in the multisensory literature is Bayesian causal inference.  ...  Bayesian causal inference models propose that evidence for a common source increases the degree of both spatial and temporal integration of multisensory inputs.  ... 
doi:10.1371/journal.pone.0224174 pmid:31841510 pmcid:PMC6913941 fatcat:t2jju63w7jgqfkzoqrfdayjwxe
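The Bayesian causal inference model referenced in this entry weighs the evidence for a common cause of two sensory signals against the evidence for independent causes. A minimal sketch of that posterior, assuming Gaussian likelihoods and illustrative noise parameters (not the model fitted in the paper):

```python
import math

def p_common(disparity, sigma_common=1.0, sigma_indep=5.0, prior=0.5):
    """Posterior probability that two signals share a common cause, given
    their spatio-temporal disparity. Small disparities are far more likely
    under the common-cause hypothesis (narrow Gaussian) than under
    independent causes (wide Gaussian)."""
    def gauss(x, s):
        return math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    like_c = gauss(disparity, sigma_common)   # likelihood, common cause
    like_i = gauss(disparity, sigma_indep)    # likelihood, independent causes
    return like_c * prior / (like_c * prior + like_i * (1 - prior))
```

With these settings, a small visuo-tactile disparity yields a high common-cause posterior and strong integration, while a large disparity pushes the posterior toward segregation.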

Causal Discovery in Physical Systems from Videos [article]

Yunzhu Li, Antonio Torralba, Animashree Anandkumar, Dieter Fox, Animesh Garg
2020 arXiv   pre-print
Our model consists of (a) a perception module that extracts a semantically meaningful and temporally consistent keypoint representation from images, (b) an inference module for determining the graph distribution  ...  We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure.  ...  Inferring the directed edge set of the Causal Summary Graph After we obtain the keypoints from the images, we use an inference module to discover the edge set of the causal summary graph and infer the  ... 
arXiv:2007.00631v3 fatcat:n3b34ebalrhyfazf7fwwug4eyy
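This entry's pipeline first extracts keypoint trajectories, then infers the directed edge set of a causal summary graph. As a crude stand-in for the paper's learned inference module (the scoring rule below is an assumption, chosen only to make the idea concrete), one can score a directed edge by lagged correlation between trajectories:

```python
import numpy as np

def lagged_edge_score(src, dst, lag=1):
    """Score a directed edge src -> dst by correlating src at time t with
    dst at time t+lag (a crude stand-in for a learned inference module)."""
    a, b = src[:-lag], dst[lag:]
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def causal_summary_graph(tracks, threshold=0.8):
    """Keep directed edges whose lagged-correlation score exceeds threshold."""
    n = len(tracks)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and lagged_edge_score(tracks[i], tracks[j]) > threshold]
```

If keypoint 1 simply follows keypoint 0 with a one-step delay, only the edge (0, 1) survives the threshold, giving a directed summary graph.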

A comprehensive study of visual event computing

WeiQi Yan, Declan F. Kieran, Setareh Rafatirad, Ramesh Jain
2010 Multimedia tools and applications  
We start by presenting events and their classifications, and continue by discussing the problem of capturing events in terms of photographs, videos, etc., as well as the methodologies for event storing  ...  Finally, we suggest future research trends in event computing and hope to introduce a comprehensive profile of visual event computing to readers.  ...  This work was partially supported by QUB research project: Unusual event detection in audio-visual surveillance for public transport (NO.D6223EEC).  ... 
doi:10.1007/s11042-010-0560-9 fatcat:ak6u3eefefgjhmbpr7asru3n7u

Video Processing Via Implicit and Mixture Motion Models

Xin Li
2007 IEEE transactions on circuits and systems for video technology (Print)  
Using mixture models, we show how to probabilistically fuse the inference results obtained from virtual cameras in order to achieve spatio-temporal adaptation.  ...  , video coding, and temporal interpolation.  ...  ACKNOWLEDGMENT The author wants to thank Y. Zheng for his help with the implementation on video denoising and error-concealment algorithms.  ... 
doi:10.1109/tcsvt.2007.896656 fatcat:lfveb62mzvb5dkvwbiauckbx6a

Efficient Subframe Video Alignment Using Short Descriptors

G. D. Evangelidis, C. Bauckhage
2013 IEEE Transactions on Pattern Analysis and Machine Intelligence  
In addition, we extend the recently introduced ECC image-alignment algorithm to the temporal dimension, which allows for spatial registration and synchronization refinement with subframe accuracy.  ...  Index Terms-Video synchronization, spatio-temporal alignment, image/video retrieval, short image descriptors  ...  ACKNOWLEDGEMENTS We thank the ADAS Group of the Computer Vision Center (CVC) in Barcelona (Spain) for data sharing and, especially, Ferran Diego for discussions.  ... 
doi:10.1109/tpami.2013.56 pmid:23969383 fatcat:n36uzzpqrne5vff5lwggdkjmya
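This entry refines synchronization to subframe accuracy. One standard way to obtain a subframe temporal offset (shown here as an illustrative refinement step, not the ECC extension itself) is to fit a parabola through the best integer-lag similarity score and its two neighbours:

```python
def subframe_offset(scores):
    """Refine an integer-frame alignment to subframe accuracy by fitting a
    parabola through the peak similarity score and its two neighbours.
    `scores[i]` is the similarity at integer temporal offset i."""
    # Best interior integer offset (peak must have two neighbours).
    k = max(range(1, len(scores) - 1), key=lambda i: scores[i])
    y0, y1, y2 = scores[k - 1], scores[k], scores[k + 1]
    # Vertex of the parabola through the three points, relative to k.
    delta = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)
    return k + delta
```

For a similarity curve that is locally quadratic around its true peak, this recovers the fractional offset exactly.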

Streaming Multiscale Deep Equilibrium Models [article]

Can Ufuk Ertenli, Emre Akbas, Ramazan Gokberk Cinbis
2022 arXiv   pre-print
For this purpose, we leverage the recently emerging implicit layer model which infers the representation of an image by solving a fixed-point problem.  ...  We present StreamDEQ, a method that infers frame-wise representations on videos with minimal per-frame computation.  ...  At each key-frame they process the full input but for intermediate frames, they update the feature maps partially based on temporal consistency.  ... 
arXiv:2204.13492v1 fatcat:vaiyarxqgrbqrpc6jfr6hv2c74
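StreamDEQ's key idea, per the snippet, is that an implicit layer infers a representation by solving a fixed-point problem, and on video the previous frame's solution is a good initial guess for the next frame. A minimal sketch of that warm-starting scheme (the update map below is a toy contraction, not the paper's network):

```python
def fixed_point(f, z0, iters):
    """Approximate the fixed point of f by iterating z <- f(z)."""
    z = z0
    for _ in range(iters):
        z = f(z)
    return z

def stream(frames, make_f, z0, iters_per_frame=3):
    """Streaming inference: reuse the previous frame's equilibrium as the
    next frame's initial guess, so few iterations suffice per frame."""
    z, outs = z0, []
    for x in frames:
        z = fixed_point(make_f(x), z, iters_per_frame)
        outs.append(z)
    return outs
```

With a slowly changing input, each frame starts near the new equilibrium, so the per-frame residual shrinks even though only a few iterations are spent per frame.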

Utilizing Temporal Information in Deep Convolutional Network for Efficient Soccer Ball Detection and Tracking [article]

Anna Kukleva, Mohammad Asif Khan, Hafez Farazi, Sven Behnke
2019 arXiv   pre-print
We first solve the detection task for an image using fully convolutional encoder-decoder architecture, and later, we use it as an input to our temporal models and jointly learn the detection task in sequences  ...  In contrast to the existing methods where only the current frame or an image is used for the detection, we make use of the history of frames.  ...  spatial domain -hence making FCN architecture a natural choice for pixelwise problems like object localization or image segmentation.  ... 
arXiv:1909.02406v2 fatcat:ea2gtaesjzc5dd4eb2oef4cavm

Robot learning with a spatial, temporal, and causal and-or graph

Caiming Xiong, Nishant Shukla, Wenlong Xiong, Song-Chun Zhu
2016 2016 IEEE International Conference on Robotics and Automation (ICRA)  
It unifies both knowledge representation and action planning in the same hierarchical data structure, allowing a robot to expand its spatial, temporal, and causal knowledge at varying levels of abstraction  ...  We propose a stochastic graph-based framework for a robot to understand tasks from human demonstrations and perform them with feedback control.  ...  In addition, we would like to thank SRI International and OSRF for their support.  ... 
doi:10.1109/icra.2016.7487364 dblp:conf/icra/XiongSXZ16 fatcat:agdxx2bwbvcf5l43dkldvmt42m
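The spatial, temporal, and causal and-or graph in this entry represents a task hierarchically: "and" nodes decompose into all of their parts, "or" nodes choose among alternatives. A toy sketch of sampling one task parse from such a grammar (the task names and grammar are hypothetical, for illustration only):

```python
import random

def sample_parse(node, grammar, rng):
    """Sample one parse (a leaf sequence) from a toy and-or graph:
    'and' nodes expand all children in order, 'or' nodes pick one branch."""
    kind, children = grammar[node]
    if kind == "leaf":
        return [node]
    if kind == "and":
        return [a for c in children for a in sample_parse(c, grammar, rng)]
    return sample_parse(rng.choice(children), grammar, rng)  # 'or' node

# Hypothetical task grammar: opening a door requires approaching and then
# operating the handle, which can be done by pushing or pulling.
grammar = {
    "open_door": ("and", ["approach", "handle"]),
    "approach":  ("leaf", []),
    "handle":    ("or", ["push", "pull"]),
    "push":      ("leaf", []),
    "pull":      ("leaf", []),
}
```

Each sampled parse is one concrete action sequence the robot could execute; learning from demonstration amounts to adjusting which branches exist and how likely each "or" choice is.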

Video-Based Human Behavior Understanding: A Survey

Paulo Vinicius Koerich Borges, Nicola Conci, Andrea Cavallaro
2013 IEEE transactions on circuits and systems for video technology (Print)  
The advantages and the drawbacks of the methods are critically discussed, providing a comprehensive coverage of key aspects of video-based human behavior understanding, available datasets for experimentation  ...  Human behavior understanding combines image and signal processing, feature extraction, machine learning and 3D geometry.  ...  Full annotation is provided, with information on the type of action and related spatial and temporal position in the video.  ... 
doi:10.1109/tcsvt.2013.2270402 fatcat:ilpqptjrhfacjasyyw6wfug7ia

Massively Parallel Video Networks [chapter]

João Carreira, Viorica Pătrăucean, Laurent Mazare, Andrew Zisserman, Simon Osindero
2018 Lecture Notes in Computer Science  
We introduce a class of causal video understanding models that aims to improve efficiency of video processing by maximising throughput, minimising latency, and reducing the number of clock cycles.  ...  We illustrate the proposed principles by applying them to existing image architectures and analyse their behaviour on two video tasks: action recognition and human keypoint localisation.  ...  Acknowledgements: We thank Carl Doersch, Relja Arandjelovic, Evan Shelhamer, and Dominic Grewe for valuable discussions and feedback on this work, and Tom Runia for finding typos in our architecture specification  ... 
doi:10.1007/978-3-030-01225-0_40 fatcat:bxpmllapyrc7divjz45bkplucq

Nested Event Model for Multimedia Narratives

Ricardo Rios M. do Carmo, Luiz F.G. Soares, Marco Antonio Casanova
2013 2013 IEEE International Symposium on Multimedia  
To address this issue, a strategy is to offer users efficient search mechanisms, sometimes based on ontologies.  ...  The proliferation of multimedia narratives has contributed to what is known as the "crisis of choice", which demands a much more active participation on the part of the user to consume multimedia content  ...  In addition to temporal and spatial relationships, the causal aspect covers other causal relationships between events.  ... 
doi:10.1109/ism.2013.26 dblp:conf/ism/CarmoSC13 fatcat:ia2ptrvhvvamvporznngurq5dq

Video based activity recognition in trauma resuscitation

Ishani Chakraborty, Ahmed Elgammal, Randall S. Burd
2013 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)  
The method is thus adaptable to many activity recognition problems. In this paper, we show our approach using videos of simulated trauma resuscitations.  ...  Inference on this network determines the most consistent sequence of procedures over time. Our activity model is modular and extendible to a multitude of sensor inputs and detection methods.  ...  In contrast, dynamic activities are based on temporally and spatially extended associations among attributes, e.g., Listen to breath sounds.  ... 
doi:10.1109/fg.2013.6553758 dblp:conf/fgr/ChakrabortyEB13 fatcat:2ug24mlhkbggddsrbago2sqfwi
Showing results 1 — 15 out of 6,206 results