Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
[article]
2021
arXiv
pre-print
This paper proposes a framework for interactive video object segmentation (VOS) in the wild, where users can iteratively choose frames to annotate. ...
Thus, we formulate the frame selection problem in interactive VOS as a Markov Decision Process, where an agent learns to recommend frames under a deep reinforcement learning framework. ...
Acknowledgements This work was supported by the Special Funds for the Construction of Innovative Provinces in Hunan (2019NK2022), NSFC (61672222, 61932020), National Key R&D Program of China (2018AAA0100704 ...
arXiv:2103.10391v2
fatcat:g4rzmfwnvna6ngcicl3lhwplse
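A minimal sketch of the frame-recommendation loop this entry formulates as a Markov Decision Process (the per-frame quality scores, epsilon value, and update rule below are illustrative stand-ins, not the paper's trained agent):

    import numpy as np

    rng = np.random.default_rng(0)

    def recommend_frame(quality, annotated, epsilon=0.1):
        """Pick the next frame to annotate: explore with probability epsilon,
        otherwise exploit by choosing the lowest-quality unannotated frame."""
        candidates = [i for i in range(len(quality)) if i not in annotated]
        if rng.random() < epsilon:
            return int(rng.choice(candidates))
        return min(candidates, key=lambda i: quality[i])

    # One interaction round: recommend, "annotate", then update the state.
    quality = rng.uniform(0.3, 0.9, size=20)  # stand-in for predicted per-frame IoU
    annotated = set()
    for step in range(3):
        f = recommend_frame(quality, annotated)
        annotated.add(f)
        quality[f] = 1.0                      # the annotated frame is now ground truth
        print(f"round {step}: annotate frame {f}")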
A Framework For Large-Scale Analysis Of Video "In The Wild" To Assist Digital Forensic Examination
2017
Zenodo
tools for image and video analysis, object detection and tracking and event detection. ...
These tools exploit the latest advances in machine learning, including deep neural networks, to handle the challenges in processing content from real-world data sources. ...
The authors would like to thank the London Metropolitan Police, UK, for providing the CCTV video footage and for giving permission to process the dataset for research purposes. ...
doi:10.5281/zenodo.1071749
fatcat:fdq74cdiurh3biqnsewscbjmdy
A Survey on Machine Learning Techniques for Auto Labeling of Video, Audio, and Text Data
[article]
2021
arXiv
pre-print
Machine learning has been utilized to perform tasks in many different domains such as classification, object detection, image segmentation and natural language analysis. ...
In this survey paper, we provide a review of previous techniques that focuses on optimized data annotation and labeling for video, audio, and text data. ...
In [7], the authors present a recursive, semi-automatic annotation approach that proposes initial annotations for all frames in a video based on manually segmenting only a few objects. ...
arXiv:2109.03784v1
fatcat:uu55zfmtajcvdjekxeaue76izy
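A toy rendering of the recursive, semi-automatic scheme attributed to [7] above; the nearest-annotated-frame copying rule is an assumed baseline for illustration, not the cited method itself:

    def propose_annotations(num_frames, manual):
        """manual: dict frame_index -> mask; returns an initial proposal per frame
        by copying the mask of the nearest manually annotated frame."""
        proposals = {}
        for f in range(num_frames):
            nearest = min(manual, key=lambda m: abs(m - f))
            proposals[f] = manual[nearest]
        return proposals

    manual = {0: "mask_A", 50: "mask_B"}      # two manually segmented frames
    proposals = propose_annotations(100, manual)
    print(proposals[10], proposals[80])       # mask_A mask_B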
Action Recognition Using a Spatial-Temporal Network for Wild Felines
2021
Animals
The temporal part presents a novel skeleton-based action recognition model based on the bending angle fluctuation amplitude of the knee joints in a video clip. ...
Behavior analysis of wild felines is significant for protecting the grassland ecological environment. ...
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. ...
doi:10.3390/ani11020485
pmid:33673162
fatcat:447boqrki5htjo7zgt7tty72bu
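The knee bending-angle feature named in this entry can be pictured with a short computation; the toy keypoints and the max-minus-min amplitude definition are assumptions for illustration:

    import numpy as np

    def knee_angle(hip, knee, ankle):
        """Angle in degrees at the knee between the thigh and shank vectors."""
        v1 = np.asarray(hip) - np.asarray(knee)
        v2 = np.asarray(ankle) - np.asarray(knee)
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Fluctuation amplitude over a clip: range of the per-frame angle.
    clip = [((0, 1), (0, 0), (t * 0.05, -1)) for t in range(30)]  # toy 2D keypoints
    angles = [knee_angle(*frame) for frame in clip]
    print(f"fluctuation amplitude: {max(angles) - min(angles):.1f} degrees")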
Castle in the Sky: Dynamic Sky Replacement and Harmonization in Videos
[article]
2020
arXiv
pre-print
Experiments are conducted on videos diversely captured in the wild by handheld smartphones and dash cameras, and show high fidelity and good generalization of our method in both visual quality and lighting ...
This paper proposes a vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles. ...
The processing is "one click and go" and no user interactions are needed. Our method consists of multiple components: a sky matting network for detecting sky regions in video frames. ...
arXiv:2010.11800v1
fatcat:dh3kpnrqp5bebeoelb4ip7lswa
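At its simplest, the replacement step this entry describes is alpha compositing with the predicted sky matte; this sketch omits the paper's harmonization and motion components and uses toy arrays:

    import numpy as np

    def replace_sky(frame, sky_plate, alpha):
        """frame, sky_plate: HxWx3 float arrays; alpha: HxW sky matte in [0, 1]."""
        a = alpha[..., None]                  # broadcast the matte over channels
        return a * sky_plate + (1.0 - a) * frame

    frame = np.zeros((4, 6, 3))               # toy video frame
    sky = np.ones((4, 6, 3))                  # toy replacement sky plate
    alpha = np.zeros((4, 6)); alpha[:2] = 1.0 # top half predicted as sky
    out = replace_sky(frame, sky, alpha)
    print(out[:, 0, 0])                       # [1. 1. 0. 0.]: sky composited on top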
Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction
[article]
2021
arXiv
pre-print
In this survey, we start by introducing the major sub-areas of the broad area of video predictive understanding, which have recently received intensive attention and proven to have practical value. ...
Video predictive understanding encompasses a wide range of efforts that are concerned with the anticipation of the unobserved future from the current as well as historical video observations. ...
Acknowledgements The authors thank Michael S. Brown and Kosta G. Derpanis for the insightful comments they provided on this review. ...
arXiv:2107.05140v2
fatcat:f23pi3i5fzhqxlirv3slgkl3wu
Video Action Understanding: A Tutorial
[article]
2020
arXiv
pre-print
Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding. ...
Finding, identifying, and predicting actions are a few of the most salient tasks in video action understanding. ...
The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. ...
arXiv:2010.06647v1
fatcat:hprgdsdtbfezvcbnwwxpr2n2gu
"You Tube and I Find"—Personalizing Multimedia Content Access
2008
Proceedings of the IEEE
Acknowledgment The authors would like to thank S. Gates, A. Katriel, G. Kofman, Y. Li, Y. Park, Y. Ravin, and W. Teiken at IBM Research for their collaboration and active participation in the MAGIC project. ...
Fig. 6. Browsing results of video segmentation.
Fig. 4. Viewing a learning object. ...
doi:10.1109/jproc.2008.916378
fatcat:3ilsibo5qjaudovr5euid56z3e
Long Term Object Detection and Tracking in Collaborative Learning Environments
[article]
2021
arXiv
pre-print
My thesis is focused on the development of accurate methods for detecting and tracking objects in long videos. ...
The AOLME project provides a collaborative learning environment for middle school students to explore mathematics, computer science, and engineering by processing digital images and videos. ...
Acknowledgments I would first like to thank my advisor Prof. Marios Pattichis for all his patience in advising me. His guidance and ideas were crucial for my development as a graduate student. ...
arXiv:2106.07556v1
fatcat:hxniywyqi5cqrhg5n4evcjmxam
DAiSEE: Towards User Engagement Recognition in the Wild
[article]
2018
arXiv
pre-print
..., and frustration in the wild. ...
We believe that DAiSEE will provide the research community with challenges in feature extraction, context-based inference, and development of suitable machine learning methods for related tasks, thus providing ...
E-learning environments provide one of the best use cases for studying user engagement in settings where a user interacts with a computer screen. ...
arXiv:1609.01885v6
fatcat:dgbz4gsrovcelhabajawysgm4e
Are You Watching Closely? Content-based Retrieval of Hand Gestures
2020
Proceedings of the 2020 International Conference on Multimedia Retrieval
In this paper, we explore the problem of identifying and retrieving gestures in a large-scale video dataset provided by the computer vision community and based on queries recorded in-the-wild. ...
Gestures play an important role in our daily communications. However, recognizing and retrieving gestures in-the-wild is a challenging task which is not explored thoroughly in literature. ...
These methods vary between those labeling the existing objects in video key frames [26] and those generating an action label for a sequence of frames [4, 9, 26]. ...
doi:10.1145/3372278.3390723
dblp:conf/mir/ParianRSD20
fatcat:2jsdhqhg7jazbjfdpu54zfifq4
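One plausible backbone for retrieval setups like this entry's is nearest-neighbor search over clip embeddings; the random embeddings and cosine similarity here are assumptions, not the paper's pipeline:

    import numpy as np

    def rank_by_cosine(query, gallery):
        """Return gallery indices sorted by cosine similarity to the query."""
        q = query / np.linalg.norm(query)
        g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        return np.argsort(-(g @ q))           # best match first

    rng = np.random.default_rng(1)
    gallery = rng.normal(size=(100, 64))      # stand-in gesture clip embeddings
    query = gallery[42] + 0.05 * rng.normal(size=64)  # noisy in-the-wild query
    print(rank_by_cosine(query, gallery)[0])  # 42: retrieves the matching clip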
A Survey of Content-Aware Video Analysis for Sports
2018
IEEE transactions on circuits and systems for video technology (Print)
Finally, the paper summarizes the future trends and challenges for sports video analysis. ...
Content-aware analysis methods are discussed with respect to object-, event-, and context-oriented groups. ...
The object class layer consists of object frames, which represent the key objects in the video. In each object frame, an object tag and pointers link each key object to the corresponding video clips. ...
doi:10.1109/tcsvt.2017.2655624
fatcat:rwqzu46sgfb7tpkcav4ysmh6ae
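The object class layer described in this entry is essentially an index from object tags to clip pointers; a hypothetical rendering of that data structure:

    from dataclasses import dataclass, field

    @dataclass
    class ObjectFrame:
        tag: str                                           # e.g. "ball", "player"
        clip_ids: list[int] = field(default_factory=list)  # pointers to video clips

    index: dict[str, ObjectFrame] = {}

    def link(tag: str, clip_id: int) -> None:
        """Link a key object to one of the video clips it appears in."""
        index.setdefault(tag, ObjectFrame(tag)).clip_ids.append(clip_id)

    link("ball", 3); link("ball", 7); link("goalkeeper", 7)
    print(index["ball"].clip_ids)                          # [3, 7]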
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
[article]
2022
arXiv
pre-print
Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. ...
The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. ...
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore. ...
arXiv:2203.14712v2
fatcat:ruqwsqjwuramhdso3b7byfszum
Storyline Representation of Egocentric Videos with an Applications to Story-Based Search
2015
2015 IEEE International Conference on Computer Vision (ICCV)
It is an agonizing task, for example, to manually search for the moment when your daughter first met Mickey Mouse from hours-long egocentric videos taken at Disneyland. ...
Although many summarization methods have been successfully proposed to create concise representations of videos, in practice, the value of the subshots to users may change according to their immediate ...
Given a novel egocentric video, we can then apply the trained object detectors to each frame to detect the most salient supporting object in that frame. ...
doi:10.1109/iccv.2015.514
dblp:conf/iccv/XiongKS15
fatcat:ebbhrf7f3zai5efmtrtd7e34fq
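The per-frame step quoted in this entry (run the trained detectors, keep the most salient supporting object) can be sketched directly; the detections below are stand-ins for real detector outputs:

    def most_salient_object(detections):
        """detections: list of (label, score) pairs for one frame."""
        return max(detections, key=lambda d: d[1]) if detections else None

    video = [
        [("mickey_mouse", 0.92), ("balloon", 0.40)],
        [("castle", 0.75)],
        [],                                   # frame with no detections
    ]
    storyline = [most_salient_object(f) for f in video]
    print(storyline)  # [('mickey_mouse', 0.92), ('castle', 0.75), None]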
A Survey on Temporal Sentence Grounding in Videos
[article]
2021
arXiv
pre-print
Temporal sentence grounding in videos (TSGV), which aims to localize one target segment from an untrimmed video with respect to a given sentence query, has drawn increasing attention in the research community ...
...) to be used in TSGV, and iii) discusses in depth potential problems of current benchmarking designs and research directions for further investigation. ...
Objects in each video frame and words in the sentence query are considered as the graph nodes. ...
arXiv:2109.08039v2
fatcat:6ja4csssjzflhj426eggaf77tu
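The graph construction this entry mentions (frame objects and query words as nodes) can be illustrated with a fully connected cross-modal edge set; this is an assumed simplification of the surveyed models:

    objects = ["person", "ball"]              # nodes from one video frame
    words = ["who", "kicks", "the", "ball"]   # nodes from the sentence query

    edges = [(f"obj:{o}", f"word:{w}") for o in objects for w in words]
    print(len(edges), "cross-modal edges")    # 8
    print(edges[0])                           # ('obj:person', 'word:who')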
Showing results 1 — 15 out of 6,779 results