Learning Semantic-Aware Dynamics for Video Prediction
[article]
2021
arXiv
pre-print
The result is a predictive model that explicitly represents objects and learns their class-specific motion, which we evaluate on video prediction benchmarks. ...
We propose an architecture and training scheme to predict video frames by explicitly modeling dis-occlusions and capturing the evolution of semantically consistent regions in the video. ...
Our video prediction architecture with learned semantic-aware dynamics. ...
arXiv:2104.09762v1
fatcat:rzbewbus4zftpn6cu4e6asfoi4
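A minimal sketch of the per-class motion idea this snippet describes, assuming each semantic class moves with its own 2D translation; the paper learns class-specific dynamics, so the masks, shifts, and frame shapes below are purely illustrative:

```python
import numpy as np

def warp_by_class(frame, seg, class_shifts):
    """Compose the next frame by shifting each semantic region with its
    own (dy, dx) motion vector. Dis-occluded pixels simply keep the
    previous frame's values in this toy version."""
    out = frame.copy()
    for cls, (dy, dx) in class_shifts.items():
        mask = seg == cls
        shifted = np.roll(np.where(mask[..., None], frame, 0), (dy, dx), axis=(0, 1))
        shifted_mask = np.roll(mask, (dy, dx), axis=(0, 1))
        out[shifted_mask] = shifted[shifted_mask]
    return out

frame = np.random.rand(64, 64, 3)    # toy RGB frame
seg = np.zeros((64, 64), dtype=int)  # toy segmentation: class 1 = a "car" block
seg[20:30, 10:20] = 1
next_frame = warp_by_class(frame, seg, {1: (0, 3)})  # the car moves 3 px right
```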
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation
[article]
2022
arXiv
pre-print
Specifically, we first propose a Semantic-Aware Dynamic Ray Sampling module with an additional parsing branch that facilitates audio-driven volume rendering. ...
Animating high-fidelity video portrait with speech audio is crucial for virtual reality and digital entertainment. ...
We then introduce the Semantic-Aware Dynamic Ray Sampling module, which facilitates fine-grained appearance and dynamics modeling for each portrait part with semantic information (Sec. 3.2). ...
arXiv:2201.07786v1
fatcat:77kaocrzqrbjjcy4sqqq3osruy
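The ray-sampling idea in this snippet can be pictured with a small sketch: bias which pixels get rays toward semantically important portrait parts, using the parsing map as a sampling distribution. The per-part weights and region labels below are assumptions for illustration, not the paper's actual scheme:

```python
import numpy as np

def sample_rays(parsing, weights, n_rays, rng=np.random.default_rng(0)):
    """Draw pixel locations for ray casting with probability proportional
    to a per-part importance weight (e.g. the mouth region is sampled
    more often than the background)."""
    prob = np.vectorize(weights.get)(parsing).astype(float)
    prob /= prob.sum()
    flat = rng.choice(parsing.size, size=n_rays, replace=False, p=prob.ravel())
    return np.unravel_index(flat, parsing.shape)  # (rows, cols) of sampled rays

parsing = np.zeros((128, 128), dtype=int)  # toy parsing map: 0 = background
parsing[80:100, 50:78] = 1                 # 1 = mouth region (hypothetical label)
ys, xs = sample_rays(parsing, {0: 1.0, 1: 10.0}, n_rays=1024)
```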
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction
[article]
2022
arXiv
pre-print
Motivated by this, an audio-visual collaborative representation learning method is proposed for the DSP task, which explores the audio modality to better predict the dynamic saliency map by assisting vision ...
The Dynamic Saliency Prediction (DSP) task simulates the human selective attention mechanism to perceive the dynamic scene, which is significant and imperative in many vision tasks. ...
In view of practical applications, this paper aims to investigate the saliency prediction for the dynamic video. ...
arXiv:2109.08371v3
fatcat:4uqa4l25mjaztppresigt6jd6a
Visual-aware Attention Dual-stream Decoder for Video Captioning
[article]
2021
arXiv
pre-print
The attention mechanism in current video captioning methods learns to assign a weight to each frame, guiding the decoder dynamically. ...
Video captioning is a challenging task that captures different visual parts and describes them in sentences, for it requires visual and linguistic coherence. ...
The visual-aware attention mechanism is used to select the fused visual feature dynamically. ...
arXiv:2110.08578v1
fatcat:gsa6o75oqrgo3b3c2gxdzgt5ti
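A minimal sketch of the frame-level attention this snippet mentions, assuming a dot-product score between the decoder state and per-frame features; the paper's exact scoring function may differ:

```python
import numpy as np

def attend(frame_feats, dec_state):
    """Weight each frame feature by its relevance to the current decoder
    state and return the fused visual context vector."""
    scores = frame_feats @ dec_state      # (T,) relevance per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over frames
    return weights @ frame_feats          # (D,) fused context

frame_feats = np.random.randn(16, 256)  # 16 frames, 256-d features
dec_state = np.random.randn(256)
context = attend(frame_feats, dec_state)
```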
Explanation-Guided Fairness Testing through Genetic Algorithm
[article]
2022
arXiv
pre-print
A plethora of research has proposed diverse methods for individual fairness testing. ...
Moreover, ExpGA only requires prediction probabilities of the tested model, resulting in a better generalization capability to various models. ...
This work proposes ExpGA, an explanation-guided fairness testing method based on a genetic algorithm (GA). ...
arXiv:2205.08335v1
fatcat:kwcxbsoif5ct3cq4m4i77rwee4
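Since ExpGA only needs prediction probabilities, its black-box search loop can be sketched roughly as follows. The seed selection, mutation operator, and the 0.1 gap threshold are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def model_proba(x):
    """Stand-in for any black-box classifier's probability output."""
    return 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))

def ga_fairness_search(seeds, protected_idx, generations=50, pop=100):
    """Evolve inputs; flag individuals whose prediction probability shifts
    sharply when only the protected attribute is flipped."""
    population = seeds[rng.integers(len(seeds), size=pop)].copy()
    discriminatory = []
    for _ in range(generations):
        twins = population.copy()
        twins[:, protected_idx] = 1 - twins[:, protected_idx]   # flip protected attr
        gap = np.abs(model_proba(population) - model_proba(twins))
        discriminatory.extend(population[gap > 0.1])            # assumed threshold
        parents = population[np.argsort(-gap)[: pop // 2]]      # fitness = gap
        children = parents + rng.normal(0, 0.05, parents.shape) # mutation
        population = np.vstack([parents, children])
    return np.array(discriminatory)

seeds = rng.random((20, 5))  # toy dataset, column 2 = protected attribute
found = ga_fairness_search(seeds, protected_idx=2)
```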
Personalized Cinemagraphs using Semantic Understanding and Collaborative Learning
[article]
2017
arXiv
pre-print
Creating a high-quality, aesthetically pleasing cinemagraph requires isolating objects in a semantically meaningful way and then selecting good start times and looping periods for those objects to minimize ...
To achieve this, we present a new technique that uses object recognition and semantic segmentation as part of an optimization method to automatically create cinemagraphs from videos that are both visually ...
The best performance of Joint shows that learning the user feature in a context-aware manner can improve the quality of preference prediction for cinemagraphs. ...
arXiv:1708.02970v1
fatcat:4btr42ilk5ekbopyhgczsgcsea
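The "good start times and looping periods" mentioned in this snippet can be illustrated with a brute-force sketch that picks, per region, the loop whose endpoint frames match best. The paper solves this inside a larger optimization; the L2 endpoint cost here is an assumption:

```python
import numpy as np

def best_loop(region_frames, min_period=8):
    """Pick (start, period) minimizing the mismatch between the loop's
    first and last frame, so the region loops seamlessly."""
    T = len(region_frames)
    best = (0, min_period, np.inf)
    for s in range(T - min_period):
        for p in range(min_period, T - s):
            cost = np.mean((region_frames[s] - region_frames[s + p]) ** 2)
            if cost < best[2]:
                best = (s, p, cost)
    return best[:2]

clip = np.random.rand(30, 32, 32, 3)  # toy region crop: 30 frames
start, period = best_loop(clip)
```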
High-Quality Video Generation from Static Structural Annotations
2020
International Journal of Computer Vision
The second image-to-video (I2V) generation task applies the synthesized starting frame and the associated structural annotation map to animate the scene dynamics for the generation of a photorealistic ...
Integrating structural annotations into the flow prediction also improves the structural awareness in the I2V generation process. ...
For video generation with dynamic scene modeling, a number of works are trained to predict raw pixels in future frames by learning from historical motion patterns (Mathieu et al. 2015; ...
doi:10.1007/s11263-020-01334-x
fatcat:yedge4qmcbd2jpyz6bo3n5fbqe
Cross-Modal Graph with Meta Concepts for Video Captioning
[article]
2021
arXiv
pre-print
Specifically, to cover the useful semantic concepts in video captions, we weakly learn the corresponding visual regions for text descriptions, where the associated visual regions and textual words are ...
We further build meta concept graphs dynamically with the learned cross-modal meta concepts. ...
Concept Prediction: Learning semantic concepts from visual input has been validated to be useful in the captioning task [22]-[24], where they mainly use multi-label classification to predict the ...
arXiv:2108.06458v2
fatcat:ud6awh36iba67gacokl25uccim
Music Gesture for Visual Sound Separation
[article]
2020
arXiv
pre-print
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio ...
Recent deep learning approaches have achieved impressive performance on visual sound separation tasks. ...
Once the visual semantic feature and keypoints are extracted from the raw video, we adopt a context-aware Graph CNN (CT-GCN) to fuse the semantic context of instruments and human body dynamics. ...
arXiv:2004.09476v1
fatcat:jl3ujfazkfgcncdfqqfaieebl4
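One way to picture the fusion of instrument semantics with body dynamics described in this snippet is a tiny graph layer in which keypoint nodes exchange messages with an extra semantic-context node. The fully connected adjacency and single shared linear map are simplifying assumptions:

```python
import numpy as np

def graph_layer(nodes, adj, W):
    """One round of message passing: each node averages its neighbors'
    features and applies a shared linear map + ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    msgs = (adj @ nodes) / np.maximum(deg, 1)
    return np.maximum(msgs @ W, 0)

n_kp, d = 17, 64                      # 17 body keypoints, 64-d features
nodes = np.random.randn(n_kp + 1, d)  # last node = instrument semantic context
adj = np.ones((n_kp + 1, n_kp + 1)) - np.eye(n_kp + 1)  # fully connected, no self-loops
W = np.random.randn(d, d) * 0.1
fused = graph_layer(nodes, adj, W)    # keypoints now carry semantic context
```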
Dual-Level Decoupled Transformer for Video Captioning
[article]
2022
arXiv
pre-print
(ii) for sentence generation, we propose Syntax-Aware Decoder to dynamically measure the contribution of visual semantic and syntax-related words. ...
Video captioning aims to understand the spatio-temporal semantic concept of the video and generate descriptive sentences. ...
Li [22] adopts two layers of spatio-temporal dynamic attention for video captioning. ...
arXiv:2205.03039v1
fatcat:omrzfavtlngotbf27d43nwe4k4
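The snippet's idea of dynamically measuring the contribution of visual-semantic versus syntax-related words can be sketched as a scalar gate mixing two word distributions; the sigmoid gate on the decoder state is an assumption about the mechanism:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_word_dist(dec_state, vis_logits, syn_logits, w_gate):
    """Mix a visually grounded distribution (content words) with a
    syntax-driven one (function words) via a learned scalar gate."""
    g = 1 / (1 + np.exp(-(w_gate @ dec_state)))  # gate in (0, 1)
    return g * softmax(vis_logits) + (1 - g) * softmax(syn_logits)

vocab, d = 1000, 256
dec_state = np.random.randn(d)
p = gated_word_dist(dec_state, np.random.randn(vocab),
                    np.random.randn(vocab), np.random.randn(d))
```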
Probabilistic Future Prediction for Video Scene Understanding
[article]
2020
arXiv
pre-print
We present a novel deep learning architecture for probabilistic future prediction from video. ...
Our model learns a representation from RGB video with a spatio-temporal convolutional module. ...
We also thank Przemyslaw Mazur, Nikolay Nikolov and Roberto Cipolla for the many insightful research discussions. ...
arXiv:2003.06409v2
fatcat:mf56dimeh5hgjpijm2yyeibhzu
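The "probabilistic" aspect this snippet refers to can be illustrated by drawing several futures from a learned latent distribution; the diagonal Gaussian and single latent vector are assumptions for the sketch, not the paper's architecture:

```python
import numpy as np

def sample_futures(mu, log_var, n=5, rng=np.random.default_rng(0)):
    """Draw several latent samples z ~ N(mu, sigma^2); a decoder
    (omitted here) would turn each z into one plausible future."""
    std = np.exp(0.5 * log_var)
    return mu + std * rng.standard_normal((n, mu.size))

mu, log_var = np.zeros(32), np.zeros(32)  # toy present-distribution parameters
futures_z = sample_futures(mu, log_var)   # 5 diverse future latents
```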
Future-Supervised Retrieval of Unseen Queries for Live Video
2017
Proceedings of the 2017 ACM on Multimedia Conference - MM '17
We introduce the use of future frame representations as a supervision signal for learning temporally aware semantic representations on unlabeled video data. ...
We investigate retrieval of previously unseen queries for live video content. Drawing from existing whole-video techniques, we focus on adapting image-trained semantic models to the video domain. ...
We enrich per-frame semantics with temporal awareness by using future representations for supervision. ...
doi:10.1145/3123266.3123437
dblp:conf/mm/CappalloS17
fatcat:t3wgpjthpfhnnbq6eeu56flsyu
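The future-supervision signal in this snippet can be sketched as a regression objective: predict the semantic embedding of a frame k steps ahead from the current frame's embedding. The linear predictor and squared loss are illustrative choices:

```python
import numpy as np

def future_supervision_loss(emb, W, k=5):
    """Train a predictor so that emb[t], mapped through W, matches
    emb[t+k]; unlabeled video supplies the targets for free."""
    pred = emb[:-k] @ W   # predicted future embeddings
    target = emb[k:]      # actual embeddings k frames ahead
    return np.mean((pred - target) ** 2)

emb = np.random.randn(100, 128)  # toy per-frame semantic embeddings
W = np.eye(128)                  # to be learned in practice
loss = future_supervision_loss(emb, W)
```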
Position-aware Location Regression Network for Temporal Video Grounding
[article]
2022
arXiv
pre-print
The key to successful grounding for video surveillance is to understand a semantic phrase corresponding to important actors and objects. ...
To understand comprehensive contexts with only one semantic phrase, we propose Position-aware Location Regression Network (PLRN) which exploits position-aware features of a query and a video. ...
Also, a reinforcement learning (RL)-based approach [11, 28] is introduced for temporal video grounding, where the RL agent adjusts the predicted grounding boundary according to the learned policy. ...
arXiv:2204.05499v1
fatcat:wo73va53pnekrox5lf7d4u53ee
Toward Cost-Effective Mobile Video Streaming through Environment-Aware Watching State Prediction
2019
Sensors
Mobile video applications are becoming increasingly prevalent and enriching the way people learn and are entertained. ...
First, the watching state is predicted by machine learning based on user behavior and the physical environment during a given time window. ...
It provides a cost-effective data download strategy through environment-aware watching state prediction and provides a generalized strategy that can be used for many other video delivery technologies, ...
doi:10.3390/s19173654
fatcat:s4oemgoiiraqthtvgmq2gsg7t4
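The "watching state predicted by machine learning over a time window" in this snippet can be sketched by aggregating per-second behavior and environment readings into window features and applying a logistic scorer; the feature set and window length are assumptions:

```python
import numpy as np

def window_features(readings, win=10):
    """Aggregate raw per-second readings (e.g. screen state, motion,
    brightness) into mean + std features per time window."""
    n = len(readings) // win
    windows = readings[: n * win].reshape(n, win, -1)
    return np.hstack([windows.mean(axis=1), windows.std(axis=1)])

def predict_watching(feats, w, b):
    """Logistic score: probability the user is in a 'watching' state,
    used to decide whether to prefetch video segments."""
    return 1 / (1 + np.exp(-(feats @ w + b)))

readings = np.random.rand(600, 3)  # 10 minutes of 3 toy signals
feats = window_features(readings)  # (60, 6) window features
probs = predict_watching(feats, np.random.randn(6), 0.0)
```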
2021 Index IEEE Transactions on Multimedia Vol. 23
2021
IEEE transactions on multimedia
The Author Index contains the primary entry for each item, listed under the first author's name. ...
Yang, H., +, TMM 2021 572-583 ...
Dynamics
Dynamic Motion Estimation and Evolution Video Prediction Network. Gu, L., +, TMM 2021 939-954 ...
doi:10.1109/tmm.2022.3141947
fatcat:lil2nf3vd5ehbfgtslulu7y3lq
Showing results 1 — 15 out of 25,752 results