
A C3D-Based Convolutional Neural Network for Frame Dropping Detection in a Single Video Shot

Chengjiang Long, Eric Smith, Arslan Basharat, Anthony Hoogs
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
In this paper, we propose to adapt the Convolutional 3D Neural Network (C3D) for frame drop detection.  ...  Automatically detecting dropped frames across a large archive of videos while maintaining a low false alarm rate is a challenging task in digital video forensics.  ...  In this paper we consider only videos with a single shot to avoid the confusion between frame drops and shot breaks.  ... 
doi:10.1109/cvprw.2017.237 dblp:conf/cvpr/LongSBH17 fatcat:glik6bdlvjesrnrlgmrhroldbi

A Coarse-to-fine Deep Convolutional Neural Network Framework for Frame Duplication Detection and Localization in Forged Videos [article]

Chengjiang Long, Arslan Basharat, Anthony Hoogs
2019 arXiv   pre-print
In this paper, we propose a novel coarse-to-fine framework based on deep Convolutional Neural Networks to automatically detect and localize such frame duplication.  ...  Videos can be manipulated by duplicating a sequence of consecutive frames with the goal of concealing or imitating a specific content in the same video.  ...  For this purpose, we build upon the work by Long et al. [17], who proposed a C3D-based network for frame-drop detection that only works for single-shot videos.  ... 
arXiv:1811.10762v2 fatcat:hkbijwcgwbchnil7ws55rmxxdy

S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks [article]

Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang
2018 arXiv   pre-print
In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network  ...  Unlike many state-of-the-art systems that require a separate proposal and classification stage, our S3D is intrinsically simple and dedicatedly designed for single-shot, end-to-end temporal activity detection  ...  Introduction Advances in deep Convolutional Neural Network (CNN) have led to significant progress in video analysis over the past few years.  ... 
arXiv:1807.08069v2 fatcat:i3lkhqkwmfb43f7rhi42elve4m

Single Shot Temporal Action Detection

Tianwei Lin, Xu Zhao, Zheng Shou
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
To address this issue, we propose a novel Single Shot Action Detector (SSAD) network based on 1D temporal convolutional layers to skip the proposal generation step via directly detecting action instances  ...  in untrimmed video.  ...  CONCLUSION In this paper, we propose the Single Shot Action Detector (SSAD) network for temporal action detection task.  ... 
doi:10.1145/3123266.3123343 dblp:conf/mm/LinZS17 fatcat:2zrkg326ebbnncr5icyl6k4toa

Impoved RPN for Single Targets Detection based on the Anchor Mask Net [article]

Mingjie Li, Youqian Feng, Zhonghai Yin, Cheng Zhou, Fanghao Dong
2019 arXiv   pre-print
Common target detection is usually based on single-frame images, which is vulnerable to interference from similar targets in the image and is not applicable to video.  ...  In this paper, an anchor mask is proposed to add prior knowledge for target detection, and an anchor mask net is designed to improve the RPN performance for single-target detection.  ...  3D convolutional neural networks (3D CNNs) were first used for video analysis in C3D [13].  ... 
arXiv:1906.07527v1 fatcat:wzuh7ff5orberkpr2ssfvg2ane

Learning Person Trajectory Representations for Team Activity Analysis [article]

Nazanin Mehrasa, Yatao Zhong, Frederick Tung, Luke Bornn, Greg Mori
2017 arXiv   pre-print
Activity analysis in which multiple people interact across a large space is challenging due to the interplay of individual actions and collective group dynamics.  ...  We develop our deep learning approach in the context of team sports, which provide well-defined sets of events (e.g. pass, shot) and groups of people (teams).  ...  For example, by adding two fully connected layers after the 5conv model in Table 3, we obtain only a slight increase in possession-based accuracy and a drop in game-based accuracy.  ... 
arXiv:1706.00893v1 fatcat:4hzrwhix7faivkivssdr35s2qe

Learning Pixel-Level Distinctions for Video Highlight Detection [article]

Fanyue Wei, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan
2022 arXiv   pre-print
We design an encoder-decoder network to estimate the pixel-level distinction, in which we leverage the 3D convolutional neural networks to exploit the temporal context information, and further take advantage  ...  First, it allows us to exploit the temporal and spatial relations of the content in one video, since the distinction of a pixel in one frame is highly dependent on both the content before this frame and  ...  Acknowledgements This work is supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, the National Natural Science Foundation of China (Grant No. 62176047), Beijing Natural  ... 
arXiv:2204.04615v1 fatcat:h35saplykjgn7cvts6nodm4lvy

Late Fusion of Bayesian and Convolutional Models for Action Recognition

Camille Maurice, Francisco Madrigal, Frederic Lerasle
2021 2020 25th International Conference on Pattern Recognition (ICPR)  
Data-driven approaches based on convolutional neural networks (CNN) adapted to the video domain with 3D convolutions allow the recognition of actions in video streams.  ...  In this paper, we propose a hybrid approach resulting from the fusion of a deep learning neural network with a Bayesian-based approach.  ...  We invite the reader to consult the paper [9] for more in-depth details. C3D [3] is a deep learning network that takes into account, in addition to images  ... 
doi:10.1109/icpr48806.2021.9412510 fatcat:rc47n7c5tzb5pi7t2tgmonzxu4

Review of dynamic gesture recognition

Yuanyuan Shi, Yunan Li, Xiaolong Fu, Kaibin Miao, Qiguang Miao
2021 Virtual Reality & Intelligent Hardware  
The reviewed methods are broadly categorized into three groups based on the type of neural networks used for recognition: two-stream convolutional neural networks, 3D convolutional neural networks, and  ...  Keywords Video-based gesture recognition; Deep learning; Convolutional neural networks; Human-computer interaction Introduction Human-computer interaction (HCI) technology [1] is used for the natural interaction  ...  Methods based on 3D convolutional neural networks initially completed a dynamic gesture recognition algorithm directly based on ResC3D [59].  ... 
doi:10.1016/j.vrih.2021.05.001 fatcat:jpddnlf2xbfufnyuf3s6fbxgty

ActivityNet Challenge 2017 Summary [article]

Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Ranjay Krishna, Victor Escorcia, Kenji Hata, Shyamal Buch
2017 arXiv   pre-print
We would like to thank the authors of the Kinetics dataset for their kind support; and Joao Carreira and Brian Zhang for helpful discussions.  ...  Approach We propose a Region Convolutional 3D Network (R-C3D), a novel convolutional neural network for activity detection in continuous video streams.  ...  We propose a fast end-to-end Region Convolutional 3D Network (R-C3D) for activity detection in continuous video streams.  ... 
arXiv:1710.08011v1 fatcat:bc5qhp2cungrdj4j3lebxeoane

Multimodal Gesture Recognition Based on the ResC3D Network

Qiguang Miao, Yunan Li, Wanli Ouyang, Zhenxin Ma, Xin Xu, Weikang Shi, Xiaochun Cao
2017 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)  
In this paper, we propose a multimodal gesture recognition method based on a ResC3D network. One key idea is to find a compact and effective representation of video sequences.  ...  Upon these representations, a ResC3D network, which leverages the advantages of both residual and C3D model, is developed to extract features, together with a canonical correlation analysis based fusion  ...  [23] combine C3D and RNN to form a recurrent 3D CNN for video-based detection and classification issues. Donahue et al.  ... 
doi:10.1109/iccvw.2017.360 dblp:conf/iccvw/MiaoLOMXSC17 fatcat:m4wjwwtpmzgfxi6e5d45jxfpjq

AENet: Learning Deep Audio Features for Video Analysis [article]

Naoya Takahashi, Michael Gygli, Luc Van Gool
2017 arXiv   pre-print
In order to incorporate this long-time frequency structure of audio events, we introduce a convolutional neural network (CNN) operating on a large temporal input.  ...  The combination of our network architecture and a novel data augmentation outperforms previous methods for audio event detection by 16%.  ...  Karpathy et al. introduced a large-scale dataset for sports classification in videos [26]. They investigated ways to improve single-frame CNNs by fusing spatial features over multiple frames in time.  ... 
arXiv:1701.00599v2 fatcat:xuiseg6lhjecrhbcix4p6hv33e

Temporal Action Detection in Untrimmed Videos from Fine to Coarse Granularity

Guangle Yao, Tao Lei, Xianyuan Liu, Ping Jiang
2018 Applied Sciences  
In recent years, artificial neural networks such as the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) have significantly improved performance in various computer vision tasks, including  ...  Temporal action detection in long, untrimmed videos is an important yet challenging task that requires not only recognizing the categories of actions in videos, but also localizing the start and end times  ...  [2] used the C3D [21], a type of 3D Convolutional Neural Network architecture, to extract features of video segments and trained an LSTM to classify the segments.  ... 
doi:10.3390/app8101924 fatcat:fnkew4ohkbbxdarbynmpgn6b6u

Initialization Strategies of Spatio-Temporal Convolutional Neural Networks [article]

Elman Mansimov, Nitish Srivastava, Ruslan Salakhutdinov
2015 arXiv   pre-print
We propose a new way of incorporating temporal information present in videos into Spatial Convolutional Neural Networks (ConvNets) trained on images, that avoids training Spatio-Temporal ConvNets from  ...  We show that it is important to initialize 3D Convolutional Weights judiciously in order to learn temporal representations of videos.  ...  Table 2: Comparisons with other state-of-the-art neural-network-based approaches on RGB data.  ... 
arXiv:1503.07274v1 fatcat:wspquikxfnbcze6dbyegoazgke

Few-shot Action Recognition with Permutation-invariant Attention [article]

Hongguang Zhang, Li Zhang, Xiaojuan Qi, Hongdong Li, Philip H. S. Torr, Piotr Koniusz
2020 arXiv   pre-print
We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns.  ...  Many few-shot learning models focus on recognising images. In contrast, we tackle a challenging task of few-shot action recognition from videos.  ...  This research is supported in part by the Australian Research Council through Australian Centre for Robotic Vision (CE140100016), Australian Research Council grants (DE140100180), the China Scholarship  ... 
arXiv:2001.03905v3 fatcat:bfj2xhgavvgete5m347gja6ney
Showing results 1 — 15 out of 149 results