6,779 Hits in 7.7 sec

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild [article]

Zhaoyuan Yin, Jia Zheng, Weixin Luo, Shenhan Qian, Hanling Zhang, Shenghua Gao
2021 arXiv   pre-print
This paper proposes a framework for interactive video object segmentation (VOS) in the wild, where users can iteratively choose frames for annotation.  ...  Thus, we formulate the frame selection problem in interactive VOS as a Markov Decision Process, where an agent is trained to recommend the frame under a deep reinforcement learning framework.  ...  Acknowledgements This work was supported by the Special Funds for the Construction of Innovative Provinces in Hunan (2019NK2022), NSFC (61672222, 61932020), National Key R&D Program of China (2018AAA0100704  ... 
arXiv:2103.10391v2 fatcat:g4rzmfwnvna6ngcicl3lhwplse
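The snippet above frames interactive frame selection as a Markov Decision Process solved with deep reinforcement learning. A minimal illustrative sketch of that formulation, assuming a greedy stand-in policy instead of the paper's learned agent (the names `recommend_frame`, `interactive_vos_round`, and the per-frame quality state are hypothetical, not taken from the paper):

```python
def recommend_frame(quality_estimates):
    """Greedy stand-in for the learned agent's policy (hypothetical):
    recommend the frame with the lowest estimated segmentation quality."""
    return min(range(len(quality_estimates)), key=lambda i: quality_estimates[i])

def interactive_vos_round(quality_estimates, gain=0.5):
    """One interaction round of the MDP sketch: the state is a list of
    per-frame quality estimates, the action is the recommended frame,
    and the reward is the total quality improvement after annotation."""
    frame = recommend_frame(quality_estimates)
    updated = list(quality_estimates)
    updated[frame] = min(1.0, updated[frame] + gain)  # assumed annotation effect
    reward = sum(updated) - sum(quality_estimates)    # reward signal for the agent
    return frame, updated, reward
```

For example, `interactive_vos_round([0.9, 0.2, 0.7])` recommends frame 1, the worst-segmented frame; in the paper this choice would come from a trained policy rather than a greedy heuristic.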

A Framework For Large-Scale Analysis Of Video "In The Wild" To Assist Digital Forensic Examination

Apostolos Axenopoulos, Volker Eiselein, Antonio Penta, Eugenia Koblents, Ernesto La Mattina, Petros Daras
2017 Zenodo  
tools for image and video analysis, object detection and tracking, and event detection.  ...  These tools exploit the latest advances in machine learning, including deep neural networks, to handle the challenges in processing content from real-world data sources.  ...  The authors would like to thank the London Metropolitan Police, UK, for providing the CCTV video footage and for giving permission to process the dataset for research purposes.  ... 
doi:10.5281/zenodo.1071749 fatcat:fdq74cdiurh3biqnsewscbjmdy

A Survey on Machine Learning Techniques for Auto Labeling of Video, Audio, and Text Data [article]

Shikun Zhang, Omid Jafari, Parth Nagarkar
2021 arXiv   pre-print
Machine learning has been utilized to perform tasks in many different domains such as classification, object detection, image segmentation and natural language analysis.  ...  In this survey paper, we provide a review of previous techniques that focus on optimized data annotation and labeling for video, audio, and text data.  ...  In [7], the authors present a recursive, semi-automatic annotation approach that proposes initial annotations for all frames in a video based on manually segmenting only a few objects.  ... 
arXiv:2109.03784v1 fatcat:uu55zfmtajcvdjekxeaue76izy

Action Recognition Using a Spatial-Temporal Network for Wild Felines

Liqi Feng, Yaqin Zhao, Yichao Sun, Wenxuan Zhao, Jiaxi Tang
2021 Animals  
The temporal part presents a novel skeleton-based action recognition model based on the bending angle fluctuation amplitude of the knee joints in a video clip.  ...  Behavior analysis of wild felines is significant for protecting grassland ecological environments.  ...  The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.  ... 
doi:10.3390/ani11020485 pmid:33673162 fatcat:447boqrki5htjo7zgt7tty72bu

Castle in the Sky: Dynamic Sky Replacement and Harmonization in Videos [article]

Zhengxia Zou
2020 arXiv   pre-print
Experiments are conducted on videos diversely captured in the wild by handheld smartphones and dash cameras, and show high fidelity and good generalization of our method in both visual quality and lighting  ...  This paper proposes a vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles.  ...  The processing is "one click and go" and no user interactions are needed. Our method consists of multiple components, including a sky matting network for detecting sky regions in video frames.  ... 
arXiv:2010.11800v1 fatcat:dh3kpnrqp5bebeoelb4ip7lswa
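The entry above describes sky replacement via a matting network that predicts per-pixel sky masks, which are then used to composite a new sky into each frame. A minimal per-pixel compositing sketch under standard alpha-blending assumptions (the function name `composite_sky` is illustrative; the paper's actual blending additionally handles relighting and harmonization):

```python
def composite_sky(frame_px, sky_px, alpha):
    """Alpha-blend one RGB pixel: alpha=1 means the matting network is
    confident this pixel is sky (use the replacement sky), alpha=0 keeps
    the original frame content unchanged."""
    return tuple(round(alpha * s + (1 - alpha) * f)
                 for f, s in zip(frame_px, sky_px))
```

Applying this over every pixel of a frame, with `alpha` taken from the predicted sky matte, yields the composited result; harmonizing foreground lighting with the new sky is a separate step in the paper's pipeline.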

Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction [article]

He Zhao, Richard P. Wildes
2021 arXiv   pre-print
In this survey, we start by introducing the major sub-areas of the broad area of video predictive understanding, which have recently received intensive attention and proven to have practical value.  ...  Video predictive understanding encompasses a wide range of efforts that are concerned with the anticipation of the unobserved future from the current as well as historical video observations.  ...  Acknowledgements The authors thank Michael S. Brown and Kosta G. Derpanis for the insightful comments they provided on this review.  ... 
arXiv:2107.05140v2 fatcat:f23pi3i5fzhqxlirv3slgkl3wu

Video Action Understanding: A Tutorial [article]

Matthew Hutchinson, Vijay Gadepally
2020 arXiv   pre-print
Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding.  ...  Finding, identifying, and predicting actions are a few of the most salient tasks in video action understanding.  ...  The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.  ... 
arXiv:2010.06647v1 fatcat:hprgdsdtbfezvcbnwwxpr2n2gu

"You Tube and I Find"—Personalizing Multimedia Content Access

S. Venkatesh, B. Adams, Dinh Phung, C. Dorai, R.G. Farrell, L. Agnihotri, N. Dimitrova
2008 Proceedings of the IEEE  
Acknowledgment The authors would like to thank S. Gates, A. Katriel, G. Kofman, Y. Li, Y. Park, Y. Ravin, and W. Teiken at IBM Research for their collaboration and active participation in the MAGIC project.  ...  Fig. 6. Browsing results of video segmentation. Fig. 4. Viewing a learning object.  ... 
doi:10.1109/jproc.2008.916378 fatcat:3ilsibo5qjaudovr5euid56z3e

Long Term Object Detection and Tracking in Collaborative Learning Environments [article]

Sravani Teeparthi
2021 arXiv   pre-print
My thesis is focused on the development of accurate methods for detecting and tracking objects in long videos.  ...  AOLME project provides a collaborative learning environment for middle school students to explore mathematics, computer science, and engineering by processing digital images and videos.  ...  Acknowledgments I would first like to thank my advisor Prof. Marios Pattichis for all his patience in advising me. His guidance and ideas were crucial for my development as a graduate student.  ... 
arXiv:2106.07556v1 fatcat:hxniywyqi5cqrhg5n4evcjmxam

DAiSEE: Towards User Engagement Recognition in the Wild [article]

Abhay Gupta, Arjun D'Cunha, Kamal Awasthi, Vineeth Balasubramanian
2018 arXiv   pre-print
, and frustration in the wild.  ...  We believe that DAiSEE will provide the research community with challenges in feature extraction, context-based inference, and development of suitable machine learning methods for related tasks, thus providing  ...  E-learning environments provide one of the best use cases for studying user engagement in settings where a user interacts with a computer screen.  ... 
arXiv:1609.01885v6 fatcat:dgbz4gsrovcelhabajawysgm4e

Are You Watching Closely? Content-based Retrieval of Hand Gestures

Mahnaz Amiri Parian, Luca Rossetto, Heiko Schuldt, Stéphane Dupont
2020 Proceedings of the 2020 International Conference on Multimedia Retrieval  
In this paper, we explore the problem of identifying and retrieving gestures in a large-scale video dataset provided by the computer vision community and based on queries recorded in-the-wild.  ...  Gestures play an important role in our daily communications. However, recognizing and retrieving gestures in-the-wild is a challenging task which is not explored thoroughly in the literature.  ...  These methods vary between those labeling the existing objects in video key frames [26] and those generating an action label for a sequence of frames [4, 9, 26].  ... 
doi:10.1145/3372278.3390723 dblp:conf/mir/ParianRSD20 fatcat:2jsdhqhg7jazbjfdpu54zfifq4

A Survey of Content-Aware Video Analysis for Sports

Huang-Chia Shih
2018 IEEE transactions on circuits and systems for video technology (Print)  
Finally, the paper summarizes the future trends and challenges for sports video analysis.  ...  Content-aware analysis methods are discussed with respect to object-, event-, and context-oriented groups.  ...  The object class layer consists of object frames, which represent the key objects in the video. In each object frame, an object tag and pointers link each key object to the corresponding video clips.  ... 
doi:10.1109/tcsvt.2017.2655624 fatcat:rwqzu46sgfb7tpkcav4ysmh6ae

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities [article]

Fadime Sener and Dibyadip Chatterjee and Daniel Shelepov and Kun He and Dipika Singhania and Robert Wang and Angela Yao
2022 arXiv   pre-print
Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections.  ...  The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance.  ...  Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.  ... 
arXiv:2203.14712v2 fatcat:ruqwsqjwuramhdso3b7byfszum

Storyline Representation of Egocentric Videos with an Application to Story-Based Search

Bo Xiong, Gunhee Kim, Leonid Sigal
2015 2015 IEEE International Conference on Computer Vision (ICCV)  
It is an agonizing task, for example, to manually search for the moment when your daughter first met Mickey Mouse from hours-long egocentric videos taken at Disneyland.  ...  Although many summarization methods have been successfully proposed to create concise representations of videos, in practice, the value of the subshots to users may change according to their immediate  ...  Given a novel egocentric video, we can then apply the trained object detectors to each frame to detect the most salient supporting object in that frame.  ... 
doi:10.1109/iccv.2015.514 dblp:conf/iccv/XiongKS15 fatcat:ebbhrf7f3zai5efmtrtd7e34fq

A Survey on Temporal Sentence Grounding in Videos [article]

Xiaohan Lan, Yitian Yuan, Xin Wang, Zhi Wang, Wenwu Zhu
2021 arXiv   pre-print
Temporal sentence grounding in videos (TSGV), which aims to localize one target segment from an untrimmed video with respect to a given sentence query, has drawn increasing attention in the research community  ...  ) to be used in TSGV, and iii) discusses in depth potential problems of current benchmarking designs and research directions for further investigations.  ...  Objects in each video frame and words in the sentence query are considered as the graph nodes.  ... 
arXiv:2109.08039v2 fatcat:6ja4csssjzflhj426eggaf77tu
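The TSGV task described above is to pick the one segment of an untrimmed video best matching a sentence query. A minimal sliding-window sketch, assuming per-clip relevance scores have already been computed by some video-language model (the function name `ground_sentence` and the mean-score criterion are illustrative simplifications, not a method from the survey):

```python
def ground_sentence(clip_scores, max_len):
    """Exhaustive proposal-based stand-in for TSGV: score every candidate
    segment [s, e) up to max_len clips by its mean per-clip relevance to
    the query, and return the highest-scoring segment boundaries."""
    best, best_score = (0, 1), float('-inf')
    n = len(clip_scores)
    for s in range(n):
        for e in range(s + 1, min(n, s + max_len) + 1):
            score = sum(clip_scores[s:e]) / (e - s)  # mean relevance of segment
            if score > best_score:
                best, best_score = (s, e), score
    return best
```

For instance, with scores `[0.1, 0.9, 0.8, 0.2]` and `max_len=2`, the best segment is `(1, 2)`. Proposal-based TSGV methods surveyed in the paper refine this brute-force idea with learned proposal scoring and boundary regression.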
Showing results 1 — 15 out of 6,779 results