2,912 Hits in 4.8 sec

Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera [article]

Hongrui Cai, Wanquan Feng, Xuetao Feng, Yan Wang, Juyong Zhang
2022 arXiv   pre-print
We propose Neural-DynamicReconstruction (NDR), a template-free method to recover high-fidelity geometry and motions of a dynamic scene from a monocular RGB-D camera.  ...  Experiments on public datasets and our collected dataset demonstrate that NDR outperforms existing monocular dynamic reconstruction methods.  ...  We thank the authors of OcclusionFusion for sharing the fusion results of several RGB-D sequences. We also thank the authors of BANMo for their suggestions on experimental parameter settings.  ... 
arXiv:2206.15258v1 fatcat:7vrri4jy7jdrxc5puyuklg3oee

Unsupervised Video Anomaly Detection via Normalizing Flows with Implicit Latent Features [article]

MyeongAh Cho, Taeoh Kim, Woo Jin Kim, Suhwan Cho, Sangyoun Lee
2022 arXiv   pre-print
Most existing methods use an autoencoder (AE) to learn to reconstruct normal videos; they then detect anomalies based on their failure to reconstruct the appearance of abnormal scenes.  ...  As anomalies occur rarely, most training data consists of unlabeled videos without anomalous events, which makes the task challenging.  ...  Acknowledgement This research was supported by Multi-Ministry Collaborative R&D Program (R&D program for complex cognitive technology) through the National Research Foundation of Korea (NRF) funded by  ... 
arXiv:2010.07524v3 fatcat:vpsebog6dncmtnrjzbkb47bao4

Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer [article]

Songwei Ge, Thomas Hayes, Harry Yang, Xi Yin, Guan Pang, David Jacobs, Jia-Bin Huang, Devi Parikh
2022 arXiv   pre-print
In this paper, we present a method that builds on 3D-VQGAN and transformers to generate videos with thousands of frames.  ...  We also showcase conditional extensions of our approach for generating meaningful long videos by incorporating temporal information with text and audio.  ...  or RGB frames. [5] uses a low-resolution long video generator and short-video super-resolution network to generate videos of dynamic scenes.  ... 
arXiv:2204.03638v2 fatcat:edgk3fwe6ffq3frlc4qclnhjra

Learning monocular 3D reconstruction of articulated categories from motion [article]

Filippos Kokkinos, Iasonas Kokkinos
2021 arXiv   pre-print
Monocular 3D reconstruction of articulated object categories is challenging due to the lack of training data and the inherent ill-posedness of the problem.  ...  In this work we use video self-supervision, forcing the consistency of consecutive 3D reconstructions by a motion-based cycle loss.  ...  We anticipate further improvements in the future by combining diverse images from static and strong, motion-based supervision from dynamic datasets.  ... 
arXiv:2103.16352v2 fatcat:nebqw2kclva6rd4vctwnevcmre

Wireless H.264 Video Quality Enhancement Through Optimal Prioritized Packet Fragmentation

K. K. R. Kambhatla, S. Kumar, S. Paluri, P. C. Cosman
2012 IEEE transactions on multimedia  
The slices of a priority class in each frame are aggregated into video packets of corresponding priority.  ...  We derive the optimal fragment size for each priority class which achieves the maximum expected weighted goodput at different encoded video bit rates, slice sizes and bit error rates.  ...  An important problem which affects video quality is error propagation when an error in a reference frame propagates in the decoder to future reconstructed frames which are predicted from that reference  ... 
doi:10.1109/tmm.2012.2196508 fatcat:wbrikyyajzacnc5bniacpybaqy

Efficient Articulated Trajectory Reconstruction Using Dynamic Programming and Filters [chapter]

Jack Valmadre, Yingying Zhu, Sridha Sridharan, Simon Lucey
2012 Lecture Notes in Computer Science  
This paper considers the problem of reconstructing the motion of a 3D articulated tree from 2D point correspondences subject to some temporal prior.  ...  Inspired by recent work which reconstructs general trajectories using compact high-pass filters, we develop a dynamic programming approach which scales linearly in the number of frames, leveraging the  ...  Simon Lucey is the recipient of an Australian Research Council Future Fellowship (project FT0991969).  ... 
doi:10.1007/978-3-642-33718-5_6 fatcat:asgrdz6z75ashnww7efe77vzt4

Disentangled Sequential Autoencoder [article]

Yingzhen Li, Stephan Mandt
2018 arXiv   pre-print
Our deep generative model learns a latent representation of the data which is split into a static and dynamic part, allowing us to approximately disentangle latent time-dependent features (dynamics) from  ...  This architecture gives us partial control over generating content and dynamics by conditioning on either one of these sets of features.  ...  In the example of video sequence modelling, an ideal disentangled representation would be able to separate time-independent concepts (e.g. the identity of the object in the scene) from dynamical information  ... 
arXiv:1803.02991v2 fatcat:i7572ao3d5aebcynqw3uaujapi

4D Generic Video Object Proposals [article]

Aljosa Osep, Paul Voigtlaender, Mark Weber, Jonathon Luiten, Bastian Leibe
2020 arXiv   pre-print
We propose an approach that can reliably extract spatio-temporal object proposals for both known and unknown object categories from stereo video.  ...  Many high-level video understanding methods require input in the form of object proposals.  ...  We achieve that by training the network in the category-agnostic setting, i.e. by merging all 80 COCO classes into one "object" class.  ... 
arXiv:1901.09260v3 fatcat:itjnypqps5g5fo7iep54ifujni

MAST: A Memory-Augmented Self-supervised Tracker [article]

Zihang Lai, Erika Lu, Weidi Xie
2020 arXiv   pre-print
Third, we benchmark on large-scale semi-supervised video object segmentation (aka. dense tracking), and propose a new metric: generalizability.  ...  Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.  ...  We conjecture that our model gains from higher quality videos and larger object classes in these datasets. Image feature alignment.  ... 
arXiv:2002.07793v2 fatcat:hn6fof2ganfuldzbxkuvouckoq

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction [article]

Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim
2019 arXiv   pre-print
Moreover, the detected keypoints of the original videos are used as pseudo-labels to learn the motion of objects.  ...  We propose a deep video prediction model conditioned on a single image and an action class.  ...  Since our keypoints detector works in a body orientation agnostic way, object moves in the opposite direction from our expectations in some cases.  ... 
arXiv:1910.02027v1 fatcat:yku2je7ik5hr3hcbkmb2dq6o2y

MAST: A Memory-Augmented Self-Supervised Tracker

Zihang Lai, Erika Lu, Weidi Xie
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Third, we benchmark on large-scale semi-supervised video object segmentation (aka. dense tracking), and propose a new metric: generalizability.  ...  Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.  ...  Even learning from only a subset of all classes, our model generalizes well to unseen classes, with a generalization gap (i.e. the performance difference between seen and unseen objects) near zero (0.4  ... 
doi:10.1109/cvpr42600.2020.00651 dblp:conf/cvpr/LaiLX20 fatcat:n33ady2blvhtjl3hrth7wivbbe

Deformable Capsules for Object Detection [article]

Rodney Lalonde, Naji Khosravan, Ulas Bagci
2022 arXiv   pre-print
need for modeling a large number of objects and classes, which have never been achieved with capsule networks before.  ...  , generalizing to unusual poses/viewpoints of objects.  ...  to further investigation; (3) the choice of dimensions to model the class-agnostic instantiation parameters of objects was chosen semi-arbitrarily and could likely improve from fine-search; and (4) the  ... 
arXiv:2104.05031v2 fatcat:mugmaoq4hfaa5edgyoj7qdb4s4

Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation [article]

Peter R. Florence, Lucas Manuelli, Russ Tedrake
2018 arXiv   pre-print
We would like robots to visually perceive scenes and learn an understanding of the objects in them that (i) is task-agnostic and can be used as a building block for a variety of manipulation tasks, (ii  ...  ) is generally applicable to both rigid and non-rigid objects, (iii) takes advantage of the strong priors provided by 3D vision, and (iv) is entirely learned from self-supervision.  ...  In contrast we sought to use only static reconstruction but seek consistency for dynamic objects.  ... 
arXiv:1806.08756v2 fatcat:fa7265v2vvbfznijifddccbc5i

Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition [article]

Yunhao Ge, Jiaping Zhao, Laurent Itti
2021 arXiv   pre-print
Object pose increases intraclass object variance which makes object recognition from 2D images harder.  ...  Here, we propose a different approach: a class-agnostic object pose transformation network (OPT-Net) can transform an image along 3D yaw and pitch axes to synthesize additional poses continuously.  ...  To further explore the class-agnostic property of OPT-Net, we design experiments that generalize OPT-Net's ability for object pose transformation from one dataset to other datasets. 15 categories of objects  ... 
arXiv:2003.08526v4 fatcat:e755jl62wvgndaywcnq2bgxkra

VA-RED^2: Video Adaptive Redundancy Reduction [article]

Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris
2021 arXiv   pre-print
The type of redundant features depends on the dynamics and type of events in the video: static videos have more temporal redundancy while videos focusing on objects tend to have more channel redundancy  ...  To keep the capacity of the original model, after fully computing the necessary features, we reconstruct the remaining redundant features from those using cheap linear operations.  ...  ., 2019) is a recent collection of one million labeled videos, involving actions from people, animals, objects or natural phenomena.  ... 
arXiv:2102.07887v2 fatcat:2t3xepgfsba5tiycygwpu42kki