2,759 Hits in 8.1 sec

Two-person interaction recognition via spatial multiple instance embedding

Fadime Sener, Nazli Ikizler-Cinbis
2015 Journal of Visual Communication and Image Representation  
Experimental results on two benchmark datasets validate that using two-person visual descriptors together with spatial multiple instance learning offers an effective way for inferring the type of the interaction  ...  Our method integrates multiple visual features in a weakly supervised manner by utilizing an embedding-based multiple instance learning framework.  ...  Conclusion: In this study, we propose a multiple instance learning (MIL) based approach for two-person interaction recognition in videos.  ... 
doi:10.1016/j.jvcir.2015.07.016 fatcat:hq6dq2hs55gdjkv6hc4onqxzt4
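A minimal sketch of the embedding-based multiple instance learning idea described in the entry above: each video is a bag of instance descriptors, bags are embedded by their similarity to a prototype set, and a standard classifier is trained on the embedded bags. The feature dimensions, prototype choice, and classifier are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy data: each "bag" is a video, each instance a per-frame two-person descriptor.
# Dimensions and counts are placeholder assumptions.
n_bags, max_inst, dim = 40, 12, 16
bags = [rng.normal(size=(rng.integers(4, max_inst), dim)) for _ in range(n_bags)]
labels = rng.integers(0, 2, size=n_bags)          # interaction label per video

# Prototype set: here simply all training instances pooled together.
prototypes = np.vstack(bags)

def embed(bag, prototypes, sigma=2.0):
    """MILES-style embedding: similarity of the bag to each prototype,
    taken as the maximum over the bag's instances."""
    d = np.linalg.norm(bag[:, None, :] - prototypes[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (sigma ** 2)).max(axis=0)

X = np.array([embed(b, prototypes) for b in bags])

# Any linear classifier can then be trained on the embedded bags.
clf = LinearSVC(C=1.0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```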

Domain-Adaptive Discriminative One-Shot Learning of Gestures [chapter]

Tomas Pfister, James Charles, Andrew Zisserman
2014 Lecture Notes in Computer Science  
The objective of this paper is to recognize gestures in videos -both localizing the gesture and classifying it into one of multiple classes.  ...  The domain adaptation and learning methods are evaluated on two large scale challenging gesture datasets: one for sign language, and the other for Italian hand gestures.  ...  Acknowledgements: We are grateful to Patrick Buehler and Sophia Pfister for help and discussions. Financial support was provided by Osk. Huttunen Foundation and EPSRC grant EP/I012001/1.  ... 
doi:10.1007/978-3-319-10599-4_52 fatcat:dosr662razhmxgbxotpuiprgqe

Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos

Jiawei Liu, Zheng-Jun Zha, Xierong Zhu, Na Jiang
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
Video-based person re-identification approaches have gained significant attention recently, expanding image-based approaches by learning features from multiple frames.  ...  In this work, we propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.  ... 
doi:10.24963/ijcai.2020/141 dblp:conf/ijcai/LiuZZJ20 fatcat:v5copcdxlraw5p5jdkstvyzqri

Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [article]

Jiawei Liu, Zheng-Jun Zha, Xierong Zhu, Na Jiang
2020 arXiv   pre-print
Video-based re-identification approaches have gained significant attention recently, expanding image-based approaches by learning features from multiple frames.  ...  In this work, we propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.  ...  Early works on video-based person re-identification focus on hand-crafted video representations and/or distance metric learning. Recent approaches are mostly based on deep learning techniques.  ... 
arXiv:2004.04979v2 fatcat:qs43hpej6jdtzkobtkfgfqia7m

Deep Heterogeneous Feature Fusion for Template-Based Face Recognition [article]

Navaneeth Bodla, Jingxiao Zheng, Hongyu Xu, Jun-Cheng Chen, Carlos Castillo, Rama Chellappa
2017 arXiv   pre-print
template-based face recognition, where a template refers to a set of still face images or video frames from different sources which introduces more blur, pose, illumination and other variations than traditional  ...  Although deep learning has yielded impressive performance for face recognition, many studies have shown that different networks learn different feature maps: while some networks are more receptive to pose  ...  This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2014-14071600012  ... 
arXiv:1702.04471v1 fatcat:at53lndktnamvomrccpj67pgsa
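A minimal sketch of heterogeneous feature fusion in the spirit of the entry above: pool per-image features from two different face networks over a template, L2-normalize, and concatenate into one descriptor. The two "networks" here are random placeholders, and fusion by mean pooling plus concatenation is an assumption about the general idea, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

# Stand-ins for two different face networks producing different embeddings
# of the same template (a set of stills / video frames); dims are illustrative.
n_images = 8
feats_net_a = rng.normal(size=(n_images, 128))   # e.g. a pose-robust network
feats_net_b = rng.normal(size=(n_images, 256))   # e.g. an illumination-robust network

# Pool each network's features over the template, normalize, then concatenate.
template_a = l2_normalize(feats_net_a.mean(axis=0))
template_b = l2_normalize(feats_net_b.mean(axis=0))
fused = l2_normalize(np.concatenate([template_a, template_b]))

print(fused.shape)   # (384,) fused template descriptor, matched by cosine similarity
```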

Image-Set Based Face Recognition Using Boosted Global and Local Principal Angles [chapter]

Xi Li, Kazuhiro Fukui, Nanning Zheng
2010 Lecture Notes in Computer Science  
Inspired by the work of [4, 14], this paper presents a robust framework for image-set based face recognition using boosted global and local principal angles.  ...  The discriminative power of each principal angle for the global and each local sub-pattern is explicitly exploited by learning a strong classifier in a boosting manner.  ...  Generally, for a pair of rank-k subspace bases U_A, U_B, there exist k principal angles.  ... 
doi:10.1007/978-3-642-12307-8_30 fatcat:wbo26vnzpnhora6lhzf5pvhylu
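The snippet above states that two rank-k subspace bases give k principal angles. A minimal numpy sketch of that computation, via the singular values of U_A^T U_B, with random orthonormal bases standing in for image-set subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)

def principal_angles(U_A, U_B):
    """Principal angles between the subspaces spanned by the columns of
    U_A and U_B (assumed orthonormal). For rank-k bases there are k angles,
    given by arccos of the singular values of U_A^T U_B."""
    s = np.linalg.svd(U_A.T @ U_B, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# Toy example: two k-dimensional subspaces of R^d built from image-set features.
d, k = 20, 4
U_A, _ = np.linalg.qr(rng.normal(size=(d, k)))
U_B, _ = np.linalg.qr(rng.normal(size=(d, k)))

print(principal_angles(U_A, U_B))   # k angles in radians, smallest first
```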

High-level event recognition in unconstrained videos

Yu-Gang Jiang, Subhabrata Bhattacharya, Shih-Fu Chang, Mubarak Shah
2012 International Journal of Multimedia Information Retrieval  
In this paper, we review current technologies for complex event recognition in unconstrained videos.  ...  The goal of high-level event recognition is to automatically detect complex high-level events in a given video sequence.  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.  ... 
doi:10.1007/s13735-012-0024-2 fatcat:mfzttic3svb4tho2xb6aczgp4y

Action recognition in real-world videos [article]

Waqas Sultani, Qazi Ammar Arshad, Chen Chen
2020 arXiv   pre-print
The goal of human action recognition is to temporally or spatially localize the human action of interest in video sequences.  ...  In this chapter, we use the terms action, activity, and event interchangeably.  ...  The weakly supervised anomaly detection approach in [31] developed a multiple instance ranking loss for criminal activity detection in surveillance videos.  ... 
arXiv:2004.10774v1 fatcat:asnrp2z6mvfnlh46w5idj4rogm
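A minimal sketch of a multiple instance ranking loss of the kind the snippet above attributes to [31]: the highest-scoring segment of an anomalous video should outrank the highest-scoring segment of a normal video by a margin. The scoring values below are placeholders; practical formulations typically add smoothness and sparsity terms, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mil_ranking_loss(scores_anomalous, scores_normal, margin=1.0):
    """Hinge loss between the maximum segment score of an anomalous bag (video)
    and the maximum segment score of a normal bag."""
    return max(0.0, margin - scores_anomalous.max() + scores_normal.max())

# Placeholder segment anomaly scores in [0, 1] for one anomalous and one normal video.
scores_anomalous = rng.uniform(size=32)
scores_normal = rng.uniform(size=32)

print(mil_ranking_loss(scores_anomalous, scores_normal))
```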

Large-scale multimodal semantic concept detection for consumer video

Shih-Fu Chang, Dan Ellis, Wei Jiang, Keansub Lee, Akira Yanagawa, Alexander C. Loui, Jiebo Luo
2007 Proceedings of the international workshop on Workshop on multimedia information retrieval - MIR '07  
To the best of our knowledge, this is the first work on systematic investigation of multimodal classification using a large-scale ontology and realistic video corpus.  ...  In this paper we present a systematic study of automatic classification of consumer videos into a large set of diverse semantic concept classes, which have been carefully selected based on user studies  ...  Then the ensemble kernel is directly used for learning a one-vs.  ... 
doi:10.1145/1290082.1290118 dblp:conf/mir/ChangEJLYLL07 fatcat:dx5ro37fofgppdlvnxeijygmse
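The snippet above mentions learning a one-vs.-rest classifier directly on an ensemble kernel. A minimal sketch, assuming the ensemble is a simple average of per-modality RBF kernels; the equal weighting, kernel widths, and toy features are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Placeholder features for two modalities (e.g. visual and audio) of 60 videos.
n = 60
y = rng.integers(0, 3, size=n)                  # 3 semantic concept classes
visual = rng.normal(size=(n, 32))
audio = rng.normal(size=(n, 12))

def rbf_kernel(X, gamma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Ensemble kernel: average of the per-modality kernels (equal weights assumed).
K = 0.5 * rbf_kernel(visual, gamma=0.05) + 0.5 * rbf_kernel(audio, gamma=0.1)

# One-vs-rest SVM trained directly on the precomputed ensemble kernel.
clf = OneVsRestClassifier(SVC(kernel="precomputed", C=1.0)).fit(K, y)
print("training accuracy:", clf.score(K, y))
```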

Similar Gesture Recognition using Hierarchical Classification Approach in RGB Videos

Di Wu, Nabin Sharma, Michael Blumenstein
2018 2018 Digital Image Computing: Techniques and Applications (DICTA)  
The challenges and complexity involved in developing a video-based human action recognition system are manifold.  ...  Recognizing human actions from video streams has become one of the most popular research areas in computer vision and deep learning in recent years.  ...  In this paper, we explore the use of CNN models for video-based human action recognition. A simple way to apply a CNN to videos involves the following steps.  ... 
doi:10.1109/dicta.2018.8615804 dblp:conf/dicta/WuSB18 fatcat:tsc32wtxsfhnriiwt3ebqiiicq
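A minimal sketch of the "apply a CNN to videos" recipe alluded to above: run an image CNN on every frame and pool the per-frame features over time. The ResNet-18 backbone and mean pooling are illustrative assumptions, not the paper's pipeline.

```python
import torch
import torchvision.models as models

# Image CNN used as a per-frame feature extractor (untrained weights here to keep
# the sketch self-contained; a pretrained backbone would normally be used).
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()          # drop the classification head
backbone.eval()

# Placeholder video: T frames of size 3x224x224.
video = torch.randn(16, 3, 224, 224)

with torch.no_grad():
    frame_feats = backbone(video)          # (T, 512) per-frame features
video_feat = frame_feats.mean(dim=0)       # temporal mean pooling -> (512,)

print(video_feat.shape)                    # fixed-length clip descriptor
# A standard classifier (linear layer, SVM, ...) is then trained on video_feat.
```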

Rank Pooling for Action Recognition

Basura Fernando, Efstratios Gavves, Jose Oramas M., Amir Ghodrati, Tinne Tuytelaars
2017 IEEE Transactions on Pattern Analysis and Machine Intelligence  
By learning to rank the frame-level features of a video in chronological order, we obtain a new representation that captures the video-wide temporal dynamics of a video, suitable for action recognition  ...  We evaluate our method on various benchmarks for generic action, fine-grained action and gesture recognition.  ...  Leuven DBOF PhD fellowship, the FWO project Monitoring of abnormal activity with camera systems and iMinds High-Tech Visualization project.  ... 
doi:10.1109/tpami.2016.2558148 pmid:28278449 fatcat:x6c5hcmqvjahbawejbz6n7arym
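A minimal sketch of the rank-pooling idea described above: fit a linear function whose scores increase with time over the (cumulatively averaged) frame features, and use its parameter vector as the video descriptor. Approximating the ranker with a linear SVR on frame indices is a common shortcut, not necessarily the authors' exact solver.

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)

# Placeholder frame-level features for one video: T frames, D dimensions.
T, D = 50, 64
frames = rng.normal(size=(T, D))

# Time-varying mean ("smoothed") features, commonly used before ranking.
V = np.cumsum(frames, axis=0) / np.arange(1, T + 1)[:, None]

# Learn a linear ranker whose score u^T v_t increases with the frame index t;
# the weight vector u is the rank-pooled video representation.
ranker = LinearSVR(C=1.0, epsilon=0.1, max_iter=10000).fit(V, np.arange(1, T + 1))
u = ranker.coef_                     # (D,) video descriptor

print(u.shape)
# u can then be fed to any action classifier in place of hand-designed pooling.
```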

Neural Aggregation Network for Video Face Recognition [article]

Jiaolong Yang, Peiran Ren, Dongqing Zhang, Dong Chen, Fang Wen, Hongdong Li, Gang Hua
2017 arXiv   pre-print
This paper presents a Neural Aggregation Network (NAN) for video face recognition.  ...  The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition.  ...  HL's work was supported in part by Australia ARC Centre of Excellence for Robotic Vision (CE140100016) and by CSIRO Data61.  ... 
arXiv:1603.05474v4 fatcat:z2626u6r3ballourbfoxmzwa6q
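A minimal numpy sketch of attention-style aggregation of a variable-length set of face embeddings into one fixed-dimension feature, in the spirit of the NAN description above. The single query vector and plain softmax weighting are simplifying assumptions rather than the network's actual aggregation module.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate(face_feats, query):
    """Weight each face embedding by its affinity to a (learned) query vector
    and return the weighted average: one fixed-size feature per set or video."""
    weights = softmax(face_feats @ query)
    return weights @ face_feats

# Placeholder: a face video with a variable number of frames, 128-D embeddings.
face_feats = rng.normal(size=(rng.integers(5, 40), 128))
query = rng.normal(size=128)          # stands in for a learned attention kernel

print(aggregate(face_feats, query).shape)   # (128,) regardless of frame count
```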

A survey of approaches and trends in person re-identification

Apurva Bedagkar-Gala, Shishir K. Shah
2014 Image and Vision Computing  
Given an image/video of a person taken from one camera, re-identification is the process of identifying the person from images/videos taken from a different camera.  ...  Open issues and challenges of the problem are highlighted with a discussion on potential directions for further research.  ...  Person recognition is based on a nearest neighbor classifier.  ... 
doi:10.1016/j.imavis.2014.02.001 fatcat:3w7ju7pzl5gkbgl5djsbakrr7i
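The survey snippet above notes that person recognition is often a nearest-neighbor match between a probe and a gallery. A minimal sketch with random descriptors as placeholders; a learned metric would simply replace the Euclidean distance below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder appearance descriptors: one per gallery identity, plus one probe.
gallery = rng.normal(size=(100, 64))      # 100 identities seen by camera A
probe = rng.normal(size=64)               # person observed by camera B

# Nearest-neighbor re-identification: rank gallery identities by distance.
dists = np.linalg.norm(gallery - probe, axis=1)
ranking = np.argsort(dists)

print("best match id:", ranking[0])
print("top-5 candidates:", ranking[:5])
# With a learned metric, the distance becomes (x - y)^T M (x - y) for a PSD matrix M.
```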

Neural Aggregation Network for Video Face Recognition

Jiaolong Yang, Peiran Ren, Dongqing Zhang, Dong Chen, Fang Wen, Hongdong Li, Gang Hua
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
This paper presents a Neural Aggregation Network (NAN) for video face recognition.  ...  The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition.  ...  HL's work was supported in part by Australia ARC Centre of Excellence for Robotic Vision (CE140100016) and by CSIRO Data61.  ... 
doi:10.1109/cvpr.2017.554 dblp:conf/cvpr/YangRZCWLH17 fatcat:gqalohdicrdv3ozh3sumzq3wze

2020 Index IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 42

2021 IEEE Transactions on Pattern Analysis and Machine Intelligence  
., +, TPAMI Feb. 2020 371-385 Learning Compact Features for Human Activity Recognition Via Probabilistic First-Take-All.  ...  Yu, T., +, TPAMI Oct. 2020 2523-2539 Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Force from Motion: Decoding Control Force of Activity in a First-Person Video.  ...  ., +, 2581-2593 Open-Ended Learning of Latent Topics for 3D Object Recognition. Kasaei, S.H., +, 2567-2580 Object Detection in Videos by High Quality Object Linking.  ... 
doi:10.1109/tpami.2020.3036557 fatcat:3j6s2l53x5eqxnlsptsgbjeebe
Showing results 1-15 out of 2,759 results