399 Hits in 8.6 sec

RGB-D-based Human Motion Recognition with Deep Learning: A Survey [article]

Pichao Wang and Wanqing Li and Philip Ogunbona and Jun Wan and Sergio Escalera
2018 arXiv   pre-print
In particular, we highlight the methods for encoding the spatial-temporal-structural information inherent in video sequences, and discuss potential directions for future research.  ...  Specifically, deep learning methods based on CNN and RNN architectures have been adopted for motion recognition using RGB-D data.  ...  Spatio-temporal-structural information has been mined by Jain et al. [59] by combining the strengths of spatio-temporal graphs and RNNs for action recognition.  ... 
arXiv:1711.08362v2 fatcat:cugugpqeffcshnwwto4z2aw4ti
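
The Jain et al. [59] approach the survey cites pairs a spatio-temporal graph with recurrent units. Below is a minimal sketch of that combination, assuming a toy skeleton graph, illustrative layer sizes, and PyTorch (none of which come from the survey itself): neighbor features are averaged through a normalized adjacency matrix at every frame, then a GRU models the temporal axis.

```python
import torch
import torch.nn as nn

class GraphRNN(nn.Module):
    """Toy spatio-temporal graph + RNN, not Jain et al.'s exact Structural-RNN."""
    def __init__(self, num_nodes, feat_dim, hidden_dim, num_classes, adj):
        super().__init__()
        adj = adj + torch.eye(num_nodes)                  # add self-loops
        self.register_buffer("adj", adj / adj.sum(dim=1, keepdim=True))
        self.gru = nn.GRU(num_nodes * feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                                 # x: (batch, time, nodes, feat)
        x = torch.einsum("ij,btjf->btif", self.adj, x)    # average neighbors per frame
        b, t, n, f = x.shape
        _, h = self.gru(x.reshape(b, t, n * f))           # run the GRU over time
        return self.head(h[-1])                           # per-clip action logits

adj = torch.zeros(15, 15)                                 # toy skeleton: a chain of joints
for i in range(14):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
model = GraphRNN(num_nodes=15, feat_dim=3, hidden_dim=128, num_classes=10, adj=adj)
clips = torch.randn(2, 30, 15, 3)                         # 2 clips, 30 frames, 15 joints, xyz
print(model(clips).shape)                                 # torch.Size([2, 10])
```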

Driving Behavior Explanation with Multi-level Fusion [article]

Hédi Ben-Younes and Éloi Zablocki and Patrick Pérez and Matthieu Cord
2020 arXiv   pre-print
We present BEEF, for BEhavior Explanation with Fusion, a deep architecture which explains the behavior of a trajectory prediction model.  ...  The flexibility and efficiency of our approach are validated with extensive experiments on the HDD and BDD-X datasets.  ...  Overview: For a given residual block $L$ of the 3D CNN, we extract spatio-temporal activation maps $V_L^t \in \mathbb{R}^{t_L \times h_L \times w_L \times d_L}$, which contain localized information about the input frame sequence.  ... 
arXiv:2012.04983v1 fatcat:d47h46kdqbhizc3mgw5jha66fu
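
The snippet's V_L^t tensors are activation maps taken from a residual block of a 3D CNN. A minimal sketch of how such maps can be extracted with a forward hook, assuming a torchvision r3d_18 backbone and the layer3 block purely for illustration (the paper's actual backbone and block choice may differ):

```python
import torch
from torchvision.models.video import r3d_18

model = r3d_18(weights=None).eval()
activations = {}

def hook(module, inputs, output):
    # output: (batch, d_L, t_L, h_L, w_L) -- torchvision is channels-first,
    # unlike the (t_L, h_L, w_L, d_L) ordering written in the abstract.
    activations["V_L"] = output.detach()

model.layer3.register_forward_hook(hook)

clip = torch.randn(1, 3, 16, 112, 112)        # (batch, rgb, frames, height, width)
with torch.no_grad():
    model(clip)
print(activations["V_L"].shape)               # e.g. torch.Size([1, 256, 4, 14, 14])
```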

High-level event recognition in unconstrained videos

Yu-Gang Jiang, Subhabrata Bhattacharya, Shih-Fu Chang, Mubarak Shah
2012 International Journal of Multimedia Information Retrieval  
While the existing solutions vary, we identify common key modules and provide detailed descriptions along with some insights for each of them, including extraction and representation of low-level features  ...  In this paper, we review current technologies for complex event recognition in unconstrained videos.  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.  ... 
doi:10.1007/s13735-012-0024-2 fatcat:mfzttic3svb4tho2xb6aczgp4y

The MediaMill TRECVID 2012 Semantic Video Search Engine

Cees G. M. Snoek, Koen E. A. van de Sande, AmirHossein Habibian, Svetlana Kordumova, Zhenyang Li, Masoud Mazloom, Silvia L. Pintea, Ran Tao, Dennis C. Koelma, Arnold W. M. Smeulders
2012 TREC Video Retrieval Evaluation  
The starting point for the MediaMill detection approach is our top-performing bag-of-words system of TRECVID 2008-2011, which uses multiple color SIFT descriptors, averaged and difference coded into codebooks  ...  Our event detection and recounting experiments focus on representations using concept detectors. For instance search we study the influence of spatial verification and color invariance.  ...  Acknowledgments The authors are grateful to NIST and the TRECVID coordinators for the benchmark organization effort.  ... 
dblp:conf/trecvid/SnoekSHKLMP0KS12 fatcat:tkgsy56yiremddcgodbajxppde
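
As a rough illustration of the bag-of-words pipeline named above (the paper additionally averages and difference-codes descriptors into codebooks, which this sketch omits), here is a minimal hard-assignment codebook encoder; the random stand-in descriptors, the codebook size, and the use of scikit-learn k-means are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_desc = rng.normal(size=(5000, 128))    # stand-ins for color SIFT descriptors
codebook = KMeans(n_clusters=256, n_init=4, random_state=0).fit(train_desc)

def encode(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword; return an L1-normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

shot_desc = rng.normal(size=(300, 128))      # descriptors sampled from one shot
print(encode(shot_desc, codebook).shape)     # (256,)
```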

A Comprehensive Study of Deep Video Action Recognition [article]

Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
2020 arXiv   pre-print
But we also encountered new challenges, including modeling long-range temporal information in videos, high computation costs, and results that are not directly comparable because datasets and evaluation protocols vary.  ...  Video action recognition is one of the representative tasks for video understanding.  ...  Acknowledgement: We would like to thank Peter Gehler, Linchao Zhu and Thomas Brady for constructive feedback and fruitful discussions.  ... 
arXiv:2012.06567v1 fatcat:plqytbfck5bcndiceshix5unpa

Deep Lip Reading: A Comparison of Models and an Online Application

Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
2018 Interspeech 2018  
The goal of this paper is to develop state-of-the-art models for lip reading - visual speech recognition.  ...  As a further contribution we investigate the fully convolutional model when used for online (real-time) lip reading of continuous speech, and show that it achieves high performance with low latency.  ...  Funding for this research is provided by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems, the Oxford-Google DeepMind Graduate Scholarship, and by the EPSRC Programme Grant Seebibyte EP/  ... 
doi:10.21437/interspeech.2018-1943 dblp:conf/interspeech/AfourasCZ18a fatcat:m6fonkixkzam5aila4nmkzbwru

Video Super-Resolution With Temporal Group Attention

Takashi Isobe, Songjiang Li, Xu Jia, Shanxin Yuan, Gregory Slabaugh, Chunjing Xu, Ya-Li Li, Shengjin Wang, Qi Tian
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
These groups provide complementary information to recover missing details in the reference frame, which is further integrated with an attention module and a deep intra-group fusion module.  ...  In this work, we propose a novel method that can effectively incorporate temporal information in a hierarchical way.  ...  Red text indicates the best and blue text indicates the second best performance.  ... 
doi:10.1109/cvpr42600.2020.00803 dblp:conf/cvpr/IsobeLJYSXLW020 fatcat:e5u6c2az6zak5aonaapxavxtmu
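
A minimal sketch of the grouping-plus-attention idea described in the snippet, assuming a simple softmax attention over pre-computed per-group feature maps; the shapes, the 1x1-conv scoring, and the grouping itself are illustrative assumptions, not the authors' exact modules:

```python
import torch
import torch.nn as nn

class GroupAttentionFusion(nn.Module):
    """Weights G temporal groups of features and sums them into one fused map."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)   # per-group logits

    def forward(self, groups):                 # groups: (batch, G, C, H, W)
        b, g, c, h, w = groups.shape
        logits = self.score(groups.reshape(b * g, c, h, w)).reshape(b, g, 1, h, w)
        weights = torch.softmax(logits, dim=1) # normalize across the G groups
        return (weights * groups).sum(dim=1)   # fused feature map: (batch, C, H, W)

feats = torch.randn(2, 3, 64, 32, 32)          # 3 temporal groups of frame features
fused = GroupAttentionFusion(64)(feats)
print(fused.shape)                             # torch.Size([2, 64, 32, 32])
```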

A Survey of Content-Based Video Retrieval

P. Geetha, Vasumathi Narayanan
2008 Journal of Computer Science  
This work was done with the aim of helping upcoming researchers in the field of video retrieval learn about the techniques and methods available for video retrieval.  ...  and relevance feedback.  ...  The video retrieval process has two phases: an online phase and an offline phase. During the offline phase, broadcast videos in multiple languages are stored in a video database.  ... 
doi:10.3844/jcssp.2008.474.486 fatcat:ntongqzelvdbtanlz4tuerzreq
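
The offline/online split the survey describes can be made concrete with a small sketch: the offline phase indexes one feature vector per stored video, and the online phase ranks the database against a query by cosine similarity. The random features and the brute-force index are assumptions for illustration:

```python
import numpy as np

# --- offline phase: extract and store features for the video database ---
rng = np.random.default_rng(0)
db_features = rng.normal(size=(1000, 512))                 # one vector per video
db_features /= np.linalg.norm(db_features, axis=1, keepdims=True)

# --- online phase: embed the query and rank videos by cosine similarity ---
def search(query_vec, db, top_k=5):
    q = query_vec / np.linalg.norm(query_vec)
    scores = db @ q                                        # cosine, since rows are unit-norm
    best = np.argsort(scores)[::-1][:top_k]
    return list(zip(best.tolist(), scores[best].round(3).tolist()))

print(search(rng.normal(size=512), db_features))           # [(video_id, score), ...]
```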

AXES at TRECVID 2012: KIS, INS, and MED

Robin Aly, Kevin McGuinness, Shu Chen, Noel E. O'Connor, Ken Chatfield, Omkar M. Parkhi, Relja Arandjelovic, Andrew Zisserman, Basura Fernando, Tinne Tuytelaars, Dan Oneata, Matthijs Douze (+7 others)
2012 TREC Video Retrieval Evaluation  
As in our TRECVid 2011 system, we used nearly identical search systems and user interfaces for both INS and KIS.  ...  This paper describes in detail our KIS, INS, and MED systems and the results and findings of our experiments.  ...  This work was funded by the EU FP7 Project AXES ICT-269980 and the QUAERO project supported by OSEO. Furthermore, we are grateful to the UK EPSRC and ERC grant VisRec no. 228180 for financial support.  ... 
dblp:conf/trecvid/AlyMCOCPAZFTODR12 fatcat:yjoxl5qju5hpjl2ml65gdvrsgq

A Review on Methods and Applications in Multimodal Deep Learning [article]

Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul
2022 arXiv   pre-print
This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals.  ...  The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities.  ...  In this method, a combination of C3D and DBN architectures is used to model spatio-temporal information and representations of video and audio streams. D. Nguyen et al.  ... 
arXiv:2202.09195v1 fatcat:wwxrmrwmerfabbenleylwmmj7y

Deep Lip Reading: a comparison of models and an online application [article]

Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
2018 arXiv   pre-print
The goal of this paper is to develop state-of-the-art models for lip reading -- visual speech recognition.  ...  As a further contribution we investigate the fully convolutional model when used for online (real time) lip reading of continuous speech, and show that it achieves high performance with low latency.  ...  Funding for this research is provided by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems, the Oxford-Google DeepMind Graduate Scholarship, and by the EPSRC Programme Grant Seebibyte EP/  ... 
arXiv:1806.06053v1 fatcat:3zqae7cbvngehas32yel3kxxom

The AXES submissions at TRECVID 2013

Robin Aly, Relja Arandjelovic, Ken Chatfield, Matthijs Douze, Basura Fernando, Zaïd Harchaoui, Kevin McGuinness, Noel E. O'Connor, Dan Oneata, Omkar M. Parkhi, Danila Potapov, Jérôme Revaud (+7 others)
2013 TREC Video Retrieval Evaluation  
For SIN, MED and MER, we use state-of-the-art low-level descriptors for motion, image and sound as well as high-level features for speech and text.  ...  Given these features we train linear classifiers, and use early and late fusion to combine the different features.  ...  Furthermore, we are grateful to the UK EPSRC and ERC grant VisRec no. 228180 for financial support.  ... 
dblp:conf/trecvid/AlyACDFHMOOPPRS13 fatcat:hykqk3q56zeyjhvbrmuuhgggje
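
A minimal sketch of the early- vs. late-fusion strategies mentioned above, run on two synthetic "modalities"; logistic regression stands in for the paper's linear classifiers, and all data and sizes are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
motion = rng.normal(size=(200, 64)) + y[:, None]   # toy "motion" features
audio = rng.normal(size=(200, 32)) + y[:, None]    # toy "audio" features

# Early fusion: concatenate modality features, train one linear classifier.
early = LogisticRegression(max_iter=1000).fit(np.hstack([motion, audio]), y)

# Late fusion: train one classifier per modality, then average their scores.
clf_m = LogisticRegression(max_iter=1000).fit(motion, y)
clf_a = LogisticRegression(max_iter=1000).fit(audio, y)
late_scores = (clf_m.predict_proba(motion) + clf_a.predict_proba(audio)) / 2
late_pred = late_scores.argmax(axis=1)

print("early-fusion accuracy:", early.score(np.hstack([motion, audio]), y))
print("late-fusion accuracy:", (late_pred == y).mean())
```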

Benchmarking Online Sequence-to-Sequence and Character-based Handwriting Recognition from IMU-Enhanced Pens [article]

Felix Ott and David Rügamer and Lucas Heublein and Tim Hamann and Jens Barth and Bernd Bischl and Christopher Mutschler
2022 arXiv   pre-print
In contrast to offline HWR, which uses only spatial information (i.e., images), online HWR (OnHWR) uses richer spatio-temporal information (i.e., trajectory data or inertial data).  ...  We propose a variety of datasets including equations and words for both the writer-dependent and writer-independent tasks.  ...  Primarstufe (DID) of the Saarland University, Machine Learning and Data Analytics Lab of the Friedrich-Alexander University (FAU) and Fraunhofer Institute for Integrated Circuits (IIS) for their help  ... 
arXiv:2202.07036v2 fatcat:no3ca77id5hh5dgb4gy76lvjba

MovieBase

Tat-Seng Chua, Sheng Tang, Remi Trichet, Hung Khoon Tan, Yan Song
2009 Proceedings of the 1st workshop on Web-scale multimedia corpus - WSMC '09  
The corpus is designed for research in event detection and action recognition. It offers over 71 hours of videos with a total of 69,129 shots.  ...  However, in many cases, especially for event detection and action recognition, the research efforts were hampered by the lack of large scale publicly available benchmarks.  ...  First, we should include text features from movie transcripts and captions available online, as well as social tagging available on YouTube.  ... 
doi:10.1145/1631135.1631143 fatcat:5ofkty7jtrhwlilubhcoogphpi

Research Challenges in Ubiquitous Knowledge Discovery [chapter]

Michael May, Bettina Berendt, Antoine Cornuéjols, João Gama, Fosca Giannotti, Andreas Hotho, Donato Malerba, Ernestina Menesalvas, Katharina Morik, Rasmus Pedersen, Lorenza Saitta, Yücel Saygin (+2 others)
2008 Chapman & Hall/CRC Data Mining and Knowledge Discovery Series  
This chapter is based on the discussions in the network, and contributions from project partners are gratefully acknowledged.  ...  Spatio-Temporal Mining: Case studies 1 and 2 highlighted the central role of spatio-temporal data mining, especially GPS track data.  ...  Track data and frequency estimates are combined in a data fusion step. Other tasks are related to spatial clustering.  ... 
doi:10.1201/9781420085877.ch7 fatcat:4mhq33nalnabhg2epkw5y3fxqu
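
For the spatial-clustering task the chapter names for GPS track data, here is a minimal sketch using DBSCAN with a haversine metric; the ~100 m neighborhood radius and the toy coordinates are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000
points = np.array([                      # (lat, lon) fixes from a toy track
    [52.5200, 13.4050], [52.5201, 13.4052], [52.5199, 13.4049],
    [48.8566, 2.3522], [48.8567, 2.3524],
])
labels = DBSCAN(
    eps=100 / EARTH_RADIUS_M,            # 100 m expressed in radians
    min_samples=2,
    metric="haversine",                  # expects lat/lon in radians
).fit_predict(np.radians(points))
print(labels)                            # e.g. [0 0 0 1 1]
```
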
Showing results 1 — 15 out of 399 results