Leveraging Motion Priors in Videos for Improving Human Segmentation [article]

Yu-Ting Chen, Wen-Yen Chang, Hai-Lun Lu, Tingfan Wu, Min Sun
2018 arXiv   pre-print
In this work, we propose to leverage "motion prior" in videos for improving human segmentation in a weakly-supervised active learning setting.  ...  By extracting motion information using optical flow in videos, we can extract candidate foreground motion segments (referred to as motion prior) potentially corresponding to human segments.  ...  In the rest of the paper, we refer to this motion information in a video as "motion prior". In this work, we propose to leverage motion prior in videos for improving human segmentation accuracy.  ... 
arXiv:1807.11436v1 fatcat:jdiwvoqas5hkhohburdpw2wegu

Estimating Egocentric 3D Human Pose in Global Space [article]

Jian Wang and Lingjie Liu and Weipeng Xu and Kripasindhu Sarkar and Christian Theobalt
2021 arXiv   pre-print
traditional outside-in motion capture with external cameras.  ...  Egocentric 3D human pose estimation using a single fisheye camera has become popular recently as it allows capturing a wide range of daily activities in unconstrained environments, which is difficult for  ...  Our method takes an egocentric video as input and processes it in segments.  ... 
arXiv:2104.13454v3 fatcat:6kaczumgd5hwpnfseh3jyi4b5a

End-to-End Joint Semantic Segmentation of Actors and Actions in Video [chapter]

Jingwei Ji, Shyamal Buch, Alvaro Soto, Juan Carlos Niebles
2018 Lecture Notes in Computer Science  
Traditional video understanding tasks include human action recognition and actor/object semantic segmentation.  ...  Our model effectively leverages multiple input modalities, contextual information, and multitask learning in the video to directly output semantic segmentations in a single unified framework.  ...  This work is also partially funded by the Millennium Institute for Foundational Research on Data. We also thank NVIDIA for their DGX-1 donation.  ... 
doi:10.1007/978-3-030-01225-0_43 fatcat:ysv6tp64x5ajhjey66ds7zss2q

Future Segmentation Using 3D Structure [article]

Suhani Vora, Reza Mahjourian, Soeren Pirk, Anelia Angelova
2018 arXiv   pre-print
Ultimately, we observe that leveraging 3D structure in the model facilitates successful prediction, achieving state of the art accuracy in future semantic segmentation.  ...  Working towards this capability, we address the task of predicting future frame segmentation from a stream of monocular video by leveraging the 3D structure of the scene.  ...  We further plan to extend this work and demonstrate its effectiveness, by predicting future events for better motion planning, e.g. in the context of human-robot interaction.  ... 
arXiv:1811.11358v1 fatcat:tbjpzays6fchznx6kekjcd3gyu

Geometric Context from Videos

S. Hussain Raza, Matthias Grundmann, Irfan Essa
2013 2013 IEEE Conference on Computer Vision and Pattern Recognition  
Leveraging spatio-temporal video segmentation, we decompose a dynamic scene captured by a video into geometric classes, based on predictions made by region-classifiers that are trained on appearance and  ...  motion features.  ...  . , mostly static with limited foreground motion, and requires basic camera priors.  ... 
doi:10.1109/cvpr.2013.396 dblp:conf/cvpr/RazaGE13 fatcat:p5qbj4mkivbzjayzuchwvusohi

Human action segmentation with hierarchical supervoxel consistency

Jiasen Lu, Ran Xu, Jason J. Corso
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
In this paper, we take a step in that direction and propose a hierarchical MRF model to bridge low-level video fragments with high-level human motion and appearance; novel higher-order potentials connect  ...  Our single layer model significantly outperforms the current state-of-the-art on actionness, and our full model improves upon the single layer baselines in action segmentation.  ...  Human Motion Saliency for Human Action Segmentation Our approach inputs a video clip containing human action and outputs a space-time segmentation that labels all the human-action pixels as foreground  ... 
doi:10.1109/cvpr.2015.7299000 dblp:conf/cvpr/LuXC15 fatcat:slg3lhiiabeubdk6rky52zv6ha

Procedural Generation of Videos to Train Deep Action Recognition Networks

Cesar Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel Lopez
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections.  ...  We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for "Procedural Human Action Videos".  ...  Instead of leveraging prior structural knowledge about physics and human actions, the authors view videos as tensors of pixel values and learn a two-stream GAN on 5,000 hours of unlabeled Flickr videos  ... 
doi:10.1109/cvpr.2017.278 dblp:conf/cvpr/SouzaGCP17 fatcat:w3frbltuh5ecbn67gsrhw33vji

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [article]

Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo
2021 arXiv   pre-print
We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures.  ...  We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation.  ...  This work was supported, in part, by the DARPA Machine Common Sense grant.  ... 
arXiv:2007.12287v3 fatcat:vgwbijprbvenjjnura3yvohtgu

FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos

Suyog Dutt Jain, Bo Xiong, Kristen Grauman
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We propose an end-to-end learning framework for segmenting generic objects in videos.  ...  Through experiments on three challenging video segmentation benchmarks, our method substantially improves the state-of-the-art results for segmenting generic (unseen) objects.  ...  Acknowledgements: This research is supported in part by ONR YIP N00014-12-1-0754.  ... 
doi:10.1109/cvpr.2017.228 dblp:conf/cvpr/JainXG17 fatcat:zbvjxxwj65abldg5bnodcu4cle

TexturePose: Supervising Human Mesh Estimation with Texture Consistency [article]

Georgios Pavlakos, Nikos Kolotouros, Kostas Daniilidis
2019 arXiv   pre-print
In this work, we advocate that there are more cues we can leverage, which are available for free in natural images, i.e., without getting more annotations, or modifying the network architecture.  ...  This makes our proposed supervision applicable in a variety of settings, ranging from monocular video, to multi-view images.  ...  Finally, to put our work in a greater context, the idea of appearance constancy is popular also beyond human pose estimation, e.g., in approaches for unsupervised learning of depth, ego-motion and optical  ... 
arXiv:1910.11322v1 fatcat:d6i2llmnzbfxplv4j7nym3ll54

Learning-based heart rate detection from remote photoplethysmography features

YungChien Hsu, Yen-Liang Lin, Winston Hsu
2014 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we argue for treating this emerging problem in a novel aspect - proposing a learning-based framework to accommodate multiple and temporal feature and yielding significant and robust improvement  ...  With proposed novel multiple feature fusion and multiple segment fusion techniques, we achieved the best estimation result with RMSE 5.48 and CC 0.88.  ...  The experiment achieves significant improvements over consumer videos in the ambient light environment.  ... 
doi:10.1109/icassp.2014.6854440 dblp:conf/icassp/HsuLH14 fatcat:y6mfql7ssrdtxe6qs3vjfbbhly

Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos [article]

Bo Xiong, Suyog Dutt Jain, Kristen Grauman
2018 arXiv   pre-print
We propose an end-to-end learning framework for segmenting generic objects in both images and videos.  ...  Through experiments on multiple challenging image and video segmentation benchmarks, our method offers consistently strong results and improves the state-of-the-art for fully automatic segmentation of  ...  The authors thank the reviewers for their valuable suggestions.  ... 
arXiv:1808.04702v2 fatcat:jvin6gvjwndehjcbz3n336e5f4

Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos

2018 IEEE Transactions on Pattern Analysis and Machine Intelligence  
We propose an end-to-end learning framework for segmenting generic objects in both images and videos.  ...  Through experiments on multiple challenging image and video segmentation benchmarks, our method offers consistently strong results and improves the state-of-the-art for fully automatic segmentation of  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The authors thank the reviewers for their suggestions.  ... 
doi:10.1109/tpami.2018.2865794 pmid:30130176 fatcat:nmx3kdvw7vcslfgb623lwc4ose

One-Shot Learning with Pseudo-Labeling for Cattle Video Segmentation in Smart Livestock Farming

Yongliang Qiao, Tengfei Xue, He Kong, Cameron Clark, Sabrina Lomax, Khalid Rafique, Salah Sukkarieh
2022 Animals  
Then, PL leverages the segmentation results of the Xception-FCN model to fine-tune the model, leading to performance boosts in cattle video segmentation.  ...  In order to reduce the reliance on the number of labeled images, one-shot learning with a pseudo-labeling approach is proposed using only one labeled image frame to segment animals in videos.  ...  Acknowledgments: The authors also express their gratitude to Javier Martinez, Amanda Doughty, Ashraful Islam and Mike Reynolds for their help in experiment organization and data collection.  ... 
doi:10.3390/ani12050558 pmid:35268130 pmcid:PMC8908826 fatcat:it4mllkwp5bv5mzcxv42wf7goe

Joint Learning of Object and Action Detectors

Vicky Kalogeiton, Philippe Weinzaepfel, Vittorio Ferrari, Cordelia Schmid
2017 2017 IEEE International Conference on Computer Vision (ICCV)  
While most existing approaches for detection in videos focus on objects or human actions separately, we aim at jointly detecting objects performing actions, such as cat eating or dog jumping.  ...  In experiments on the A2D dataset [50], we obtain state-of-the-art results on segmentation of object-action pairs.  ...  We gratefully acknowledge the support of NVIDIA with the donation of GPUs used for this research.  ... 
doi:10.1109/iccv.2017.219 dblp:conf/iccv/KalogeitonWFS17 fatcat:zhjxvt6unnfedipyg4b75yjfla