2,912 Hits in 4.8 sec

Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera [article]

Hongrui Cai, Wanquan Feng, Xuetao Feng, Yan Wang, Juyong Zhang
2022 arXiv   pre-print
We propose Neural-DynamicReconstruction (NDR), a template-free method to recover high-fidelity geometry and motions of a dynamic scene from a monocular RGB-D camera.  ...  Experiments on public datasets and our collected dataset demonstrate that NDR outperforms existing monocular dynamic reconstruction methods.  ...  We thank the authors of OcclusionFusion for sharing the fusion results of several RGB-D sequences. We also thank the authors of BANMo for their suggestions on experimental parameter settings.  ... 
arXiv:2206.15258v1 fatcat:7vrri4jy7jdrxc5puyuklg3oee

Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features [article]

MyeongAh Cho, Taeoh Kim, Woo Jin Kim, Suhwan Cho, Sangyoun Lee
2022 arXiv   pre-print
Most existing methods use an autoencoder (AE) to learn to reconstruct normal videos; they then detect anomalies based on their failure to reconstruct the appearance of abnormal scenes.  ...  As anomalies occur rarely, most training data consists of unlabeled videos without anomalous events, which makes the task challenging.  ...  Acknowledgement This research was supported by Multi-Ministry Collaborative R&D Program (R&D program for complex cognitive technology) through the National Research Foundation of Korea (NRF) funded by  ... 
arXiv:2010.07524v3 fatcat:vpsebog6dncmtnrjzbkb47bao4
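The reconstruction-based anomaly scoring this abstract describes can be illustrated in miniature. The sketch below uses a PCA projection as a stand-in for the learned autoencoder, and toy synthetic data; none of the names or numbers come from the paper — the point is only that frames far from the "normal" subspace reconstruct poorly and thus score high.

```python
import numpy as np

def fit_pca_autoencoder(normal_frames, k=2):
    """Fit a linear 'autoencoder' (top-k PCA basis) on normal data only."""
    mean = normal_frames.mean(axis=0)
    _, _, vt = np.linalg.svd(normal_frames - mean, full_matrices=False)
    return mean, vt[:k]          # rows span the 'normal' subspace

def anomaly_score(frame, mean, basis):
    """Reconstruction error: large when the frame leaves the normal subspace."""
    centered = frame - mean
    recon = (centered @ basis.T) @ basis
    return float(np.linalg.norm(centered - recon))

rng = np.random.default_rng(0)
# 'normal' frames live in a 2-D subspace of a 10-D feature space
latents = rng.normal(size=(200, 2))
proj = rng.normal(size=(2, 10))
normal = latents @ proj
mean, basis = fit_pca_autoencoder(normal, k=2)

normal_err = anomaly_score(normal[0], mean, basis)
abnormal_err = anomaly_score(rng.normal(size=10) * 5.0, mean, basis)
assert abnormal_err > normal_err   # out-of-subspace frame scores higher
```

The paper's actual model adds normalizing flows and implicit latent features on top of this basic "train on normal, flag poor reconstructions" principle.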

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer [article]

Songwei Ge, Thomas Hayes, Harry Yang, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh
2022 arXiv   pre-print
In this paper, we present a method that builds on 3D-VQGAN and transformers to generate videos with thousands of frames.  ...  We also showcase conditional extensions of our approach for generating meaningful long videos by incorporating temporal information with text and audio.  ...  or RGB frames. [5] uses a low-resolution long video generator and short-video super-resolution network to generate videos of dynamic scenes.  ... 
arXiv:2204.03638v2 fatcat:edgk3fwe6ffq3frlc4qclnhjra

Learning monocular 3D reconstruction of articulated categories from motion [article]

Filippos Kokkinos, Iasonas Kokkinos
2021 arXiv   pre-print
Monocular 3D reconstruction of articulated object categories is challenging due to the lack of training data and the inherent ill-posedness of the problem.  ...  In this work we use video self-supervision, forcing the consistency of consecutive 3D reconstructions by a motion-based cycle loss.  ...  We anticipate further improvements in the future by combining diverse images from static and strong, motion-based supervision from dynamic datasets.  ... 
arXiv:2103.16352v2 fatcat:nebqw2kclva6rd4vctwnevcmre
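The motion-based cycle loss mentioned above can be sketched as a toy forward/backward consistency check: mapping points forward by an estimated motion and then back should return them to where they started. This is a simplified illustration, not the paper's formulation (which enforces consistency between full 3D mesh reconstructions of consecutive frames).

```python
import numpy as np

def cycle_loss(points_t, flow_fwd, flow_bwd):
    """Mean squared cycle error: forward motion followed by backward
    motion should be the identity for consistent reconstructions."""
    cycled = (points_t + flow_fwd) + flow_bwd
    return float(np.mean(np.sum((cycled - points_t) ** 2, axis=-1)))

pts = np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 3.0]])
fwd = np.array([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
assert cycle_loss(pts, fwd, -fwd) == 0.0   # perfectly consistent motion
assert cycle_loss(pts, fwd, fwd) > 0.0     # inconsistent motion is penalized
```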

Wireless H.264 Video Quality Enhancement Through Optimal Prioritized Packet Fragmentation

K. K. R. Kambhatla, S. Kumar, S. Paluri, P. C. Cosman
2012 IEEE Transactions on Multimedia  
The slices of a priority class in each frame are aggregated into video packets of corresponding priority.  ...  We derive the optimal fragment size for each priority class which achieves the maximum expected weighted goodput at different encoded video bit rates, slice sizes and bit error rates.  ...  An important problem affecting video quality is error propagation, in which an error in a reference frame propagates at the decoder to future reconstructed frames predicted from that reference  ... 
doi:10.1109/tmm.2012.2196508 fatcat:wbrikyyajzacnc5bniacpybaqy
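The fragment-size trade-off behind this abstract can be illustrated with a standard back-of-the-envelope model: with i.i.d. bit errors at rate `ber`, a fragment of `payload + header` bits survives with probability (1 - ber)^(payload + header), so large fragments amortize header overhead but fail more often. The sketch below brute-forces the optimum under that toy model; it ignores the paper's priority weighting and slice structure, and the header size is an arbitrary illustrative value.

```python
def goodput_efficiency(payload, header, ber):
    """Fraction of channel bits delivered as payload, assuming i.i.d. bit
    errors and that a single bit error destroys the whole fragment."""
    total = payload + header
    return (payload / total) * (1.0 - ber) ** total

def optimal_payload(header, ber, max_payload=20000):
    """Brute-force the payload size (in bits) maximizing expected goodput."""
    return max(range(1, max_payload),
               key=lambda n: goodput_efficiency(n, header, ber))

# A noisier channel pushes the optimum toward smaller fragments.
small = optimal_payload(header=320, ber=1e-3)
large = optimal_payload(header=320, ber=1e-5)
assert 0 < small < large
```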

Efficient Articulated Trajectory Reconstruction Using Dynamic Programming and Filters [chapter]

Jack Valmadre, Yingying Zhu, Sridha Sridharan, Simon Lucey
2012 Lecture Notes in Computer Science  
This paper considers the problem of reconstructing the motion of a 3D articulated tree from 2D point correspondences subject to some temporal prior.  ...  Inspired by recent work which reconstructs general trajectories using compact high-pass filters, we develop a dynamic programming approach which scales linearly in the number of frames, leveraging the  ...  Simon Lucey is the recipient of an Australian Research Council Future Fellowship (project FT0991969).  ... 
doi:10.1007/978-3-642-33718-5_6 fatcat:asgrdz6z75ashnww7efe77vzt4
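A dynamic program that is linear in the number of frames can be sketched as follows. This toy uses a first-order smoothness prior over scalar candidates (the paper's temporal prior is a compact high-pass filter over 3D joint trajectories), but it shows the shape of the computation: per-frame data cost plus a pairwise transition cost, solved Viterbi-style in O(T·K²) for T frames and K candidates per frame.

```python
import numpy as np

def viterbi_smooth(candidates, observations, lam=1.0):
    """Pick one candidate per frame minimizing data cost plus a squared
    first-order temporal prior; DP cost is linear in the number of frames."""
    T, K = candidates.shape
    cost = (candidates - observations[:, None]) ** 2       # data term
    back = np.zeros((T, K), dtype=int)
    acc = cost[0].copy()
    for t in range(1, T):
        trans = lam * (candidates[t][None, :] - candidates[t - 1][:, None]) ** 2
        total = acc[:, None] + trans                       # K x K
        back[t] = total.argmin(axis=0)
        acc = total.min(axis=0) + cost[t]
    path = [int(acc.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# The prior suppresses a single-frame outlier observation.
candidates = np.tile([0.0, 1.0], (5, 1))
observations = np.array([0.0, 0.0, 0.9, 0.0, 0.0])
assert viterbi_smooth(candidates, observations) == [0, 0, 0, 0, 0]
```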

Disentangled Sequential Autoencoder [article]

Yingzhen Li, Stephan Mandt
2018 arXiv   pre-print
Our deep generative model learns a latent representation of the data which is split into a static and dynamic part, allowing us to approximately disentangle latent time-dependent features (dynamics) from  ...  This architecture gives us partial control over generating content and dynamics by conditioning on either one of these sets of features.  ...  In the example of video sequence modelling, an ideal disentangled representation would be able to separate time-independent concepts (e.g. the identity of the object in the scene) from dynamical information  ... 
arXiv:1803.02991v2 fatcat:i7572ao3d5aebcynqw3uaujapi
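A crude analogue of the static/dynamic split described above: treat the time-average of a sequence as its content and the per-frame residual as its dynamics. This is far simpler than the paper's variational model, but it demonstrates the "partial control" idea — generating a new sequence from one clip's content and another's motion.

```python
import numpy as np

def split_static_dynamic(seq):
    """Time-average = content (static); per-frame residual = motion (dynamic)."""
    static = seq.mean(axis=0)
    return static, seq - static

def swap_content(seq_a, seq_b):
    """Compose seq_a's content with seq_b's dynamics."""
    static_a, _ = split_static_dynamic(seq_a)
    _, dyn_b = split_static_dynamic(seq_b)
    return static_a + dyn_b

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 4)) + 5.0     # sequence with 'content' offset +5
b = rng.normal(size=(8, 4)) - 5.0
swapped = swap_content(a, b)
# swapped keeps a's content (time-average) but b's frame-to-frame motion
assert np.allclose(swapped.mean(axis=0), a.mean(axis=0))
```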

4D Generic Video Object Proposals [article]

Aljosa Osep, Paul Voigtlaender, Mark Weber, Jonathon Luiten, Bastian Leibe
2020 arXiv   pre-print
We propose an approach that can reliably extract spatio-temporal object proposals for both known and unknown object categories from stereo video.  ...  Many high-level video understanding methods require input in the form of object proposals.  ...  We achieve that by training the network in the category-agnostic setting, i.e. by merging all 80 COCO classes into one "object" class.  ... 
arXiv:1901.09260v3 fatcat:itjnypqps5g5fo7iep54ifujni
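The category-agnostic training trick mentioned in the snippet — merging all 80 COCO classes into one "object" class — amounts to a label remap before training, so the network learns generic objectness rather than category identity. A minimal sketch (the annotation dict layout here is illustrative, not the paper's data format):

```python
def to_category_agnostic(annotations, object_id=1):
    """Merge all class labels into one 'object' class, keeping instance ids,
    so the detector learns objectness rather than category."""
    return [{**ann, "category_id": object_id} for ann in annotations]

anns = [{"id": 1, "category_id": 17}, {"id": 2, "category_id": 3}]
merged = to_category_agnostic(anns)
assert {a["category_id"] for a in merged} == {1}
assert [a["id"] for a in merged] == [1, 2]   # instance identity preserved
```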

MAST: A Memory-Augmented Self-supervised Tracker [article]

Zihang Lai, Erika Lu, Weidi Xie
2020 arXiv   pre-print
Third, we benchmark on large-scale semi-supervised video object segmentation (aka. dense tracking), and propose a new metric: generalizability.  ...  Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.  ...  We conjecture that our model gains from higher quality videos and larger object classes in these datasets. Image feature alignment.  ... 
arXiv:2002.07793v2 fatcat:hn6fof2ganfuldzbxkuvouckoq

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction [article]

Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim
2019 arXiv   pre-print
Moreover, the detected keypoints of the original videos are used as pseudo-labels to learn the motion of objects.  ...  We propose a deep video prediction model conditioned on a single image and an action class.  ...  Since our keypoint detector works in a body-orientation-agnostic way, the object moves in the opposite direction from our expectations in some cases.  ... 
arXiv:1910.02027v1 fatcat:yku2je7ik5hr3hcbkmb2dq6o2y

MAST: A Memory-Augmented Self-Supervised Tracker

Zihang Lai, Erika Lu, Weidi Xie
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Third, we benchmark on large-scale semi-supervised video object segmentation (aka. dense tracking), and propose a new metric: generalizability.  ...  Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.  ...  Even learning from only a subset of all classes, our model generalizes well to unseen classes, with a generalization gap (i.e. the performance difference between seen and unseen objects) near zero (0.4  ... 
doi:10.1109/cvpr42600.2020.00651 dblp:conf/cvpr/LaiLX20 fatcat:n33ady2blvhtjl3hrth7wivbbe
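The generalizability metric described in this snippet — the gap between performance on classes seen during training and on unseen ones — is simple to compute once per-class scores are available. A sketch with made-up class names and scores (the near-zero gap mirrors the 0.4 figure the snippet reports, but the numbers are illustrative):

```python
def generalization_gap(scores, seen_classes):
    """Mean score on seen classes minus mean score on unseen classes;
    a value near zero indicates the tracker generalizes beyond training."""
    seen = [s for c, s in scores.items() if c in seen_classes]
    unseen = [s for c, s in scores.items() if c not in seen_classes]
    return sum(seen) / len(seen) - sum(unseen) / len(unseen)

scores = {"dog": 0.72, "car": 0.70, "camel": 0.69, "drone": 0.71}
gap = generalization_gap(scores, seen_classes={"dog", "car"})
assert abs(gap) < 0.05   # small gap: seen and unseen perform similarly
```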

Deformable Capsules for Object Detection [article]

Rodney Lalonde, Naji Khosravan, Ulas Bagci
2022 arXiv   pre-print
need for modeling a large number of objects and classes, which has never been achieved with capsule networks before.  ...  , generalizing to unusual poses/viewpoints of objects.  ...  to further investigation; (3) the choice of dimensions to model the class-agnostic instantiation parameters of objects was chosen semi-arbitrarily and could likely be improved by a finer search; and (4) the  ... 
arXiv:2104.05031v2 fatcat:mugmaoq4hfaa5edgyoj7qdb4s4

Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation [article]

Peter R. Florence, Lucas Manuelli, Russ Tedrake
2018 arXiv   pre-print
We would like robots to visually perceive scenes and learn an understanding of the objects in them that (i) is task-agnostic and can be used as a building block for a variety of manipulation tasks, (ii) is generally applicable to both rigid and non-rigid objects, (iii) takes advantage of the strong priors provided by 3D vision, and (iv) is entirely learned from self-supervision.  ...  In contrast we sought to use only static reconstruction but seek consistency for dynamic objects.  ... 
arXiv:1806.08756v2 fatcat:fa7265v2vvbfznijifddccbc5i

Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition [article]

Yunhao Ge, Jiaping Zhao, Laurent Itti
2021 arXiv   pre-print
Object pose increases intraclass object variance which makes object recognition from 2D images harder.  ...  Here, we propose a different approach: a class-agnostic object pose transformation network (OPT-Net) can transform an image along 3D yaw and pitch axes to synthesize additional poses continuously.  ...  To further explore the class-agnostic property of OPT-Net, we design experiments that generalize OPT-Net's ability for object pose transformation from one dataset to other datasets. 15 categories of objects  ... 
arXiv:2003.08526v4 fatcat:e755jl62wvgndaywcnq2bgxkra

VA-RED^2: Video Adaptive Redundancy Reduction [article]

Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris
2021 arXiv   pre-print
The type of redundant features depends on the dynamics and type of events in the video: static videos have more temporal redundancy while videos focusing on objects tend to have more channel redundancy  ...  To keep the capacity of the original model, after fully computing the necessary features, we reconstruct the remaining redundant features from those using cheap linear operations.  ...  ., 2019) is a recent collection of one million labeled videos, involving actions from people, animals, objects or natural phenomena.  ... 
arXiv:2102.07887v2 fatcat:2t3xepgfsba5tiycygwpu42kki
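The "reconstruct the remaining redundant features from those using cheap linear operations" idea in this snippet can be sketched numerically: compute only a subset of channels, then recover the rest as linear combinations. Here a least-squares fit stands in for the learned linear map (e.g. a 1x1 convolution), and the data is constructed so the skipped channels really are redundant; everything is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=(16, 32))          # channels computed in full
mix = rng.normal(size=(48, 16))
full = np.vstack([base, mix @ base])      # redundant channels: linear combos

computed = full[:16]
# 'cheap linear operation': least-squares map standing in for the learned
# linear layer that reconstructs the skipped channels from the computed ones
W, *_ = np.linalg.lstsq(computed.T, full[16:].T, rcond=None)
recon = (computed.T @ W).T
out = np.vstack([computed, recon])
assert out.shape == full.shape
assert np.allclose(recon, full[16:], atol=1e-8)   # redundancy fully recovered
```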
Showing results 1 — 15 out of 2,912 results