A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf
.
Filters
Weakly-supervised Action Localization via Hierarchical Mining
[article]
2022
arXiv
pre-print
In this work, we propose a hierarchical mining strategy under video-level and snippet-level manners, i.e., hierarchical supervision and hierarchical consistency mining, to maximize the usage of the given ...
Thus, the crucial issue of existing weakly-supervised action localization methods is the limited supervision from the weak annotations for precise predictions. ...
In contrast, we mine the hierarchical supervision and consistency for better action localization without bells and whistles. ...
arXiv:2206.11011v1
fatcat:2qbjft2y3bfzjlflq223mvpqwy
Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning
[article]
2020
arXiv
pre-print
To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework, which decomposes the whole task into several subtasks to enhance the summarization quality. ...
With the guide of the subgoal, the worker predicts the importance scores for video frames in the subtask by policy gradient according to both global reward and innovative defined sub-rewards to overcome ...
[21] used long short-term memory (LSTM) to predict the importance score with a Determinantal Point Process (DPP) module. ...
arXiv:2001.05864v2
fatcat:mdkkrc75zjex3jnvk7blnqemqe
Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
[article]
2021
arXiv
pre-print
In this work, we revisit hierarchical models in video prediction. ...
Learning to predict the long-term future of video frames is notoriously challenging due to inherent ambiguities in the distant future and dramatic amplifications of prediction error through time. ...
Hierarchical long-term video
prediction without supervision. In ICML. 2018. ...
arXiv:2104.06697v1
fatcat:thkaq2a53fhyzedof52lzw7xtm
Unsupervised Hierarchical Concept Learning
[article]
2020
arXiv
pre-print
However, recent work to discover such concepts without access to any environment does not discover relationships (or a hierarchy) between these discovered concepts. ...
Organizing these discovered concepts hierarchically at different levels of abstraction is useful in discovering patterns, building ontologies, and generating tutorials from demonstration data. ...
We extend the architecture in Shankar et al. (2019) to simultaneously discover concepts along with their hierarchical organization without any supervision. ...
arXiv:2010.02556v1
fatcat:4nhaaajxh5efpngtlrzrvoy2ru
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
[article]
2021
arXiv
pre-print
Further, we present a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences. ...
This paper focuses on task recognition and action segmentation in weakly-labeled instructional videos, where only the ordered sequence of video-level actions is available during training. ...
Without loss of generality, we used the output of our two-stream hierarchical networkf c , so that p(τ ) = 1 for the predicted task τ = argmax(f c ) and p(τ ) = 0 otherwise. ...
arXiv:2110.05697v1
fatcat:nrxass56fvapfovylsyeatd72e
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization
[article]
2021
arXiv
pre-print
To address this limitation, we propose Video Joint Modelling based on Hierarchical Transformer (VJMHT) for co-summarization, which takes into consideration the semantic dependencies across videos. ...
Extensive experiments are conducted to verify the effectiveness of the proposed modules and the superiority of VJMHT in terms of F-measure and rank-based evaluation. ...
dealing with long videos. ...
arXiv:2112.13478v1
fatcat:evdrt2a2mff4dgfy27eiyrhcci
Hierarchical Self-supervised Representation Learning for Movie Understanding
[article]
2022
arXiv
pre-print
In contrast, in this paper we focus on self-supervised video learning for movie understanding and propose a novel hierarchical self-supervised pretraining strategy that separately pretrains each level ...
Most self-supervised video representation learning approaches focus on action recognition. ...
In [54] , the authors propose a long-term temporal model called Object Transformer (OT). ...
arXiv:2204.03101v1
fatcat:kl2xwoczfzedvd5tx452ecg2le
ReMOTS: Self-Supervised Refining Multi-Object Tracking and Segmentation
[article]
2021
arXiv
pre-print
to form short-term tracklets. (3) Training the appearance encoder using short-term tracklets as reliable pseudo labels. (4) Merging short-term tracklets to long-term tracklets utilizing adopted appearance ...
To tackle this issue, we propose a self-supervised refining MOTS (i.e., ReMOTS) framework. ...
Merging Short-term Tracklets With better appearance features and more robust spatiotemporal information of short-term tracklets, we are able to merge short-term tracklets into long-term ones. ...
arXiv:2007.03200v3
fatcat:ijgzbckgjzbn5a47hk3f4djzeu
A Perceptual Prediction Framework for Self Supervised Event Segmentation
[article]
2019
arXiv
pre-print
We introduce a self-supervised, predictive learning framework that draws inspiration from cognitive psychology to segment long, visually complex videos into individual, stable segments that share the same ...
In this paper, we tackle the problem of self-supervised temporal segmentation of long videos that alleviate the need for any supervision. ...
Related Work Fully supervised approaches treat event segmentation as a supervised learning problem and assign the semantics to the video in terms of labels and try to segment the video into its semantically ...
arXiv:1811.04869v3
fatcat:suoguja4dzbvrjsg3m5lkcdtnq
Action-Agnostic Human Pose Forecasting
[article]
2018
arXiv
pre-print
For instance, previous work either focused only on short-term or long-term predictions, while sacrificing one or the other. ...
In this paper, we propose a new action-agnostic method for short- and long-term human pose forecasting. ...
predict future sequences without the demand for the supervising signal from action labels. ...
arXiv:1810.09676v1
fatcat:pms3wo6iyvbsrh2vkcdqjfdgza
A Perceptual Prediction Framework for Self Supervised Event Segmentation
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We introduce a self-supervised, predictive learning framework that draws inspiration from cognitive psychology to segment long, visually complex videos into constituent events. ...
Temporal segmentation of long videos is an important problem, that has largely been tackled through supervised learning, often requiring large amounts of annotated training data. ...
Related Work Fully supervised approaches treat event segmentation as a supervised learning problem and assign the semantics to the video in terms of labels and try to segment the video into its semantically ...
doi:10.1109/cvpr.2019.00129
dblp:conf/cvpr/AakurS19
fatcat:z7p22thdqnf5xffzrhxvxvbc3q
Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning
[article]
2022
arXiv
pre-print
(Hi-TRS), to explicitly capture spatial, short-term, and long-term temporal dependencies at frame, clip, and video levels, respectively. ...
Different from such superficial supervision at the video level, we propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder ...
Video Transformer (V-TRS). The V-TRS model summarizes the long-term abstracted video level information. ...
arXiv:2207.09644v2
fatcat:ffyjgbxzevf3pkjqp25ulrs3my
A Neurally-Inspired Hierarchical Prediction Network for Spatiotemporal Sequence Learning and Prediction
[article]
2021
arXiv
pre-print
This facilitates the learning of relationships among movement patterns, yielding state-of-the-art performance in long range video sequence predictions in the benchmark datasets. ...
in the visual cortical hierarchy for predicting future video frames. ...
This model learns a LSTM (long short-term memory) model at each level to predict the errors made in an earlier level of the hierarchical visual system. ...
arXiv:1901.09002v2
fatcat:3yz3fenx2zaoje4gpi5ii7x4yi
Reconstructive Sequence-Graph Network for Video Summarization
2021
IEEE Transactions on Pattern Analysis and Machine Intelligence
Long Short-Term Memory (LSTM), and the shot-level dependencies are captured by the Graph Convolutional Network (GCN). ...
Motivated by this point, we propose a Reconstructive Sequence-Graph Network (RSGN) to encode the frames and shots as sequence and graph hierarchically, where the frame-level dependencies are encoded by ...
This point has also been proved by the results of our unsupervised variant RSGN uns , since it performs even better than those supervised baselines without the video reconstructor. ...
doi:10.1109/tpami.2021.3072117
pmid:33835915
fatcat:x5ql4fld5barnndnv2uizejeau
Associating Objects with Transformers for Video Object Segmentation
[article]
2021
arXiv
pre-print
For sufficiently modeling multi-object association, a Long Short-Term Transformer is designed for constructing hierarchical matching and propagation. ...
This paper investigates how to realize better and more efficient embedding learning to tackle the semi-supervised video object segmentation under challenging multi-object scenarios. ...
Related Work Semi-supervised Video Object Segmentation. ...
arXiv:2106.02638v3
fatcat:mwhmxpp2u5dmplxyquvkin4y7i
« Previous
Showing results 1 — 15 out of 23,693 results