
Multi-modal Egocentric Activity Recognition using Audio-Visual Features [article]

Mehmet Ali Arabacı, Fatih Özkan, Elif Surer, Peter Jančovič, Alptekin Temizel
2019 arXiv   pre-print
In this work, we propose a new framework for the egocentric activity recognition problem based on combining audio-visual features with multi-kernel learning (MKL) and multi-kernel boosting (MKBoost).  ...  The proposed framework was evaluated on a number of egocentric datasets. The results showed that using multi-modal features with MKL outperforms existing methods.  ...  CONCLUSION In this work, we proposed a new framework for the egocentric activity recognition problem based on audio-visual features combined with multi-kernel learning classification.  ...
arXiv:1807.00612v2 fatcat:6bdk35purrfgnlraheavbbuodi
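
To make the kernel-combination idea behind MKL concrete, here is a minimal sketch that feeds a fixed weighted sum of per-modality kernels to a kernel SVM. The feature dimensions, RBF parameters, and kernel weights are illustrative assumptions; a real MKL or MKBoost solver, as in the paper, learns the modality weights rather than fixing them.

```python
# Minimal sketch: combine audio and visual kernels by a weighted sum and train
# a kernel SVM on the result. A non-negative weighted sum of valid kernels is
# itself a valid kernel, which is what makes this kind of fusion possible.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def combined_kernel(audio_feats, visual_feats, w_audio=0.4, w_visual=0.6):
    k_audio = rbf_kernel(audio_feats, gamma=0.1)     # per-modality RBF kernels
    k_visual = rbf_kernel(visual_feats, gamma=0.01)
    return w_audio * k_audio + w_visual * k_visual

# Toy data: 100 clips, 64-d audio and 512-d visual descriptors per clip.
rng = np.random.default_rng(0)
audio = rng.normal(size=(100, 64))
visual = rng.normal(size=(100, 512))
labels = rng.integers(0, 5, size=100)

K = combined_kernel(audio, visual)
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.score(K, labels))   # training accuracy on the toy data
```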

Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition

Sibo Song, Vijay Chandrasekhar, Bappaditya Mandal, Liyuan Li, Joo-Hwee Lim, Giduthuri Sateesh Babu, Phyo Phyo San, Ngai-Man Cheung
2016 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
In this paper, we propose a multimodal multi-stream deep learning framework to tackle the egocentric activity recognition problem, using both the video and sensor data.  ...  First, we experiment with and extend a multi-stream Convolutional Neural Network to learn the spatial and temporal features from egocentric videos.  ...  Furthermore, they also propose to apply the Fisher Kernel framework in order to fuse sensor and video features for multimodal egocentric activity recognition.  ...
doi:10.1109/cvprw.2016.54 dblp:conf/cvpr/SongCMLLBSC16 fatcat:myz44dahlbavvio5syy4nlopby
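
As a rough illustration of the multi-stream idea, the sketch below late-fuses spatial (RGB), temporal (optical-flow), and wearable-sensor embeddings by concatenation before a shared classifier. The encoder shapes and layer sizes are placeholders, not the networks used in the paper.

```python
# Late-fusion sketch: per-stream projections are concatenated and classified.
import torch
import torch.nn as nn

class MultiStreamFusion(nn.Module):
    def __init__(self, video_dim=2048, flow_dim=2048, sensor_dim=128, n_classes=20):
        super().__init__()
        self.video_head = nn.Linear(video_dim, 256)    # spatial (RGB) stream
        self.flow_head = nn.Linear(flow_dim, 256)      # temporal (optical-flow) stream
        self.sensor_head = nn.Linear(sensor_dim, 64)   # wearable-sensor stream
        self.classifier = nn.Sequential(
            nn.ReLU(), nn.Dropout(0.5), nn.Linear(256 + 256 + 64, n_classes))

    def forward(self, video_feat, flow_feat, sensor_feat):
        fused = torch.cat([self.video_head(video_feat),
                           self.flow_head(flow_feat),
                           self.sensor_head(sensor_feat)], dim=-1)
        return self.classifier(fused)

model = MultiStreamFusion()
logits = model(torch.randn(4, 2048), torch.randn(4, 2048), torch.randn(4, 128))
print(logits.shape)   # torch.Size([4, 20])
```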

Robust object recognition in RGB-D egocentric videos based on Sparse Affine Hull Kernel

Shaohua Wan, J.K. Aggarwal
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
In this paper, we propose a novel kernel function for recognizing objects in RGB-D egocentric videos.  ...  Our kernel function also allows convenient integration of heterogeneous data modalities beyond RGB and depth.  ...  The proposed kernel function allows convenient integration of heterogeneous data modalities (RGB, depth, infrared, etc.) under the Multiple Kernel Learning (MKL) [13] framework.  ... 
doi:10.1109/cvprw.2015.7301302 dblp:conf/cvpr/WanA15 fatcat:wyub2j6o55hmhgqk5cehgrpiy4

Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos [article]

Olga Zatsarynna, Yazan Abu Farha, Juergen Gall
2021 arXiv   pre-print
We further introduce a multi-modal fusion mechanism that captures the pairwise interactions between RGB, flow, and object modalities.  ...  In this work, we propose a simple and effective multi-modal architecture based on temporal convolutions.  ...  We use the same starting learning rates for the uni-modal branches, while for the fusion layers we increase the learning rate to 0.00075.  ... 
arXiv:2107.09504v1 fatcat:cqbtqneuvvhjtg2wn26otndepi
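
A hedged sketch of the two ingredients named in the abstract, dilated temporal convolutions per modality and pairwise interactions between RGB, flow, and object features, is given below; the dilation pattern, feature sizes, and elementwise-product fusion are assumptions, not the paper's exact architecture.

```python
# Sketch: per-modality temporal convolutions followed by pairwise modality fusion.
import torch
import torch.nn as nn
from itertools import combinations

class TemporalBranch(nn.Module):
    def __init__(self, in_dim, hid=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, hid, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(hid, hid, kernel_size=3, padding=2, dilation=2), nn.ReLU())

    def forward(self, x):            # x: (batch, in_dim, time)
        return self.net(x)           # (batch, hid, time)

class PairwiseFusion(nn.Module):
    def __init__(self, dims, hid=128, n_classes=97):
        super().__init__()
        self.branches = nn.ModuleDict({m: TemporalBranch(d, hid) for m, d in dims.items()})
        n_pairs = len(list(combinations(dims, 2)))
        self.classifier = nn.Linear(n_pairs * hid, n_classes)

    def forward(self, feats):        # feats: dict of (batch, dim, time) tensors
        enc = {m: b(feats[m]).mean(dim=-1) for m, b in self.branches.items()}
        pair_terms = [enc[a] * enc[b] for a, b in combinations(sorted(enc), 2)]
        return self.classifier(torch.cat(pair_terms, dim=-1))

model = PairwiseFusion({"rgb": 1024, "flow": 1024, "obj": 352})
x = {"rgb": torch.randn(2, 1024, 16), "flow": torch.randn(2, 1024, 16), "obj": torch.randn(2, 352, 16)}
print(model(x).shape)   # torch.Size([2, 97])
```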

Trear: Transformer-based RGB-D Egocentric Action Recognition [article]

Xiangyu Li and Yonghong Hou and Pichao Wang and Zhimin Gao and Mingliang Xu and Wanqing Li
2021 arXiv   pre-print
In this paper, we propose a Transformer-based RGB-D egocentric action recognition framework, called Trear.  ...  Instead of using optical flow or recurrent units, we adopt self-attention mechanism to model the temporal structure of the data from different modalities.  ...  This paper focuses on RGB-D based egocentric action recognition and explores a novel framework to learn a conjoint representation of both modalities.  ... 
arXiv:2101.03904v1 fatcat:7bidz3i5djfrnnvyih2q4ja52u
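
The abstract's core claim, replacing optical flow and recurrent units with self-attention over the temporal dimension, can be sketched roughly as below using a standard Transformer encoder per modality; the backbone features, concatenation fusion, and hyperparameters are assumptions rather than the Trear design.

```python
# Sketch: self-attention over per-frame RGB and depth features, fused by concatenation.
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    def __init__(self, feat_dim=512, n_heads=8, n_layers=2, n_classes=106):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.rgb_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.depth_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, rgb_seq, depth_seq):    # each: (batch, time, feat_dim)
        rgb = self.rgb_encoder(rgb_seq).mean(dim=1)      # temporal pooling
        depth = self.depth_encoder(depth_seq).mean(dim=1)
        return self.classifier(torch.cat([rgb, depth], dim=-1))

model = TemporalSelfAttention()
print(model(torch.randn(2, 16, 512), torch.randn(2, 16, 512)).shape)   # [2, 106]
```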

EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset

Curtis Northcutt, Shengxin Zha, Steven Lovegrove, Richard Newcombe
2020 IEEE Transactions on Pattern Analysis and Machine Intelligence  
EgoCom is a first-of-its-kind natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives.  ...  Multi-modal data can be useful for tasks like multi-party speech recognition and predicting turn-taking by combining the granularity of verbal and non-verbal cues (Stratou and Morency, 2017; Picard, 2000).  ...
doi:10.1109/tpami.2020.3025105 pmid:32946385 fatcat:7nsepwlm6ng5bfuv6zxyiduuku

Vision and Acceleration Modalities: Partners for Recognizing Complex Activities

Alexander Diete, Timo Sztyler, Heiner Stuckenschmidt
2019 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)  
In our work, we aim to overcome this limitation and present a multi-modal egocentric-based activity recognition approach which is able to recognize the critical activities by looking at movement and object  ...  Wearable devices have been used widely for human activity recognition in the field of pervasive computing.  ...  Multi-modal activity recognition Previous work has combined multiple sensors to create and analyze multi-modal datasets [14], [24].  ...
doi:10.1109/percomw.2019.8730690 dblp:conf/percom/DieteSS19 fatcat:5o2ldmtx5fdz3klwlzu7e6hpx4

Egocentric Action Recognition by Video Attention and Temporal Context [article]

Juan-Manuel Perez-Rua, Antoine Toisoul, Brais Martinez, Victor Escorcia, Li Zhang, Xiatian Zhu, Tao Xiang
2020 arXiv   pre-print
Our solution achieves strong performance on the challenge metrics without using object-specific reasoning or extra training data.  ...  We further introduce a simple yet effective contextual learning mechanism to model 'action' class scores directly from long-term temporal behaviour based on the 'verb' and 'noun' prediction scores.  ...  In this attempt, we present a novel egocentric action recognition solution based jointly on video attention learning and temporal contextual learning.  ...
arXiv:2007.01883v1 fatcat:sjne7saj6jemvjnlcn7xqefzke
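
One common way to turn 'verb' and 'noun' prediction scores into 'action' scores, shown here only as a hedged illustration and not necessarily the paper's contextual-learning mechanism, is to score each (verb, noun) action pair by summing the two log-probabilities:

```python
# Sketch: compose action scores from verb and noun scores over valid (verb, noun) pairs.
import torch
import torch.nn.functional as F

def action_scores(verb_logits, noun_logits, action_pairs):
    """verb_logits: (B, V), noun_logits: (B, N), action_pairs: list of (verb_id, noun_id)."""
    verb_logp = F.log_softmax(verb_logits, dim=-1)
    noun_logp = F.log_softmax(noun_logits, dim=-1)
    v_idx = torch.tensor([v for v, _ in action_pairs])
    n_idx = torch.tensor([n for _, n in action_pairs])
    return verb_logp[:, v_idx] + noun_logp[:, n_idx]    # (B, num_actions)

pairs = [(0, 1), (0, 3), (2, 1)]                        # toy (verb, noun) actions
print(action_scores(torch.randn(2, 5), torch.randn(2, 4), pairs).shape)   # [2, 3]
```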

Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals

Katsuyuki Nakamura, Serena Yeung, Alexandre Alahi, Li Fei-Fei
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
We use heart rate signals as privileged self-supervision to derive energy expenditure in a training stage. A multitask objective is used to jointly optimize the two tasks.  ...  Physiological signals such as heart rate can provide valuable information about an individual's state and activity.  ...  Here, we focus on activity recognition using egocentric video and wearable sensors.  ... 
doi:10.1109/cvpr.2017.721 dblp:conf/cvpr/NakamuraYAF17 fatcat:zlqt4iaj2nc2jivjda2nafmkkm
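
The multitask objective described here can be sketched as a shared encoder with an activity-classification head and an energy-expenditure regression head, where the regression target (derived from heart rate) is used only at training time. The encoder, head sizes, and loss weighting below are assumptions, not the paper's implementation.

```python
# Sketch: jointly optimize activity classification and energy-expenditure regression.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, feat_dim=1024, n_activities=23):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.activity_head = nn.Linear(256, n_activities)
        self.energy_head = nn.Linear(256, 1)    # energy expenditure (e.g., METs)

    def forward(self, x):
        h = self.encoder(x)
        return self.activity_head(h), self.energy_head(h).squeeze(-1)

model = MultiTaskNet()
video_feat = torch.randn(8, 1024)
activity_labels = torch.randint(0, 23, (8,))
energy_targets = torch.rand(8) * 10             # derived offline from heart rate

act_logits, energy_pred = model(video_feat)
loss = nn.functional.cross_entropy(act_logits, activity_labels) \
       + 0.5 * nn.functional.mse_loss(energy_pred, energy_targets)
loss.backward()
```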

Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition

Dinghao Fan, Hengjie Lu, Shugong Xu, Shan Cao
2021 IEEE Sensors Journal  
Our framework is trained to learn a representation for multi-task learning: gesture segmentation and gesture recognition.  ...  Existing multi-modal gesture recognition systems take multi-modal data as input to improve accuracy, but such methods require more modality sensors, which will greatly limit their application scenarios  ...  EgoGesture dataset is a large-scale egocentric hand gesture recognition dataset designed for VR/AR use cases.  ... 
doi:10.1109/jsen.2021.3123443 fatcat:4biyoph3xbe6dksji53pzpcc6i

Multitask Learning to Improve Egocentric Action Recognition [article]

Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, Remco Veltkamp
2019 arXiv   pre-print
We employ this idea to tackle action recognition in egocentric videos by introducing additional supervised tasks.  ...  Furthermore, in EGTEA Gaze+ we outperform the state-of-the-art in action recognition by 3.84%.  ...  In our work, we do not use an explicit feature-based representation of modalities for the input, but use data from these modalities as supervision to learn actions.  ... 
arXiv:1909.06761v1 fatcat:3hbhxpzatnb6zlbsxtcaroaofa

Recognition of Activities of Daily Living with Egocentric Vision: A Review

Thi-Hoa-Cuc Nguyen, Jean-Christophe Nebel, Francisco Florez-Revuelta
2016 Sensors  
Video-based recognition of activities of daily living (ADLs) is being used in ambient assisted living systems in order to support the independent living of older people.  ...  This paper presents a review of the state of the art of egocentric vision systems for the recognition of ADLs following a hierarchical structure: motion, action and activity levels, where each level provides  ...  Using unsupervised learning, multi-task clustering, i.e., learning multiple tasks simultaneously, has been demonstrated to give better results for action recognition in egocentric vision with respect to  ... 
doi:10.3390/s16010072 pmid:26751452 pmcid:PMC4732105 fatcat:okm2fswkjrdzleelae46u3nfna

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities [article]

Fadime Sener and Dibyadip Chatterjee and Daniel Shelepov and Kun He and Dipika Singhania and Robert Wang and Angela Yao
2022 arXiv   pre-print
Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings.  ...  We benchmark on three action understanding tasks: recognition, anticipation and temporal segmentation. Additionally, we propose a novel task of detecting mistakes.  ...  But these instructional videos are curated from online sources; they are produced, have multiple shots, and primarily target multi-modal (vision + NLP) learning [40, 50, 52].  ...
arXiv:2203.14712v2 fatcat:ruqwsqjwuramhdso3b7byfszum

LSTA: Long Short-Term Attention for Egocentric Action Recognition

Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires a fine-grained discrimination of small objects and their manipulation.  ...  We demonstrate the effectiveness of LSTA on egocentric activity recognition with an end-to-end trainable two-stream architecture, achieving state-of-the-art performance on four standard benchmarks.  ...  We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research.  ... 
doi:10.1109/cvpr.2019.01019 dblp:conf/cvpr/SudhakaranEL19 fatcat:numtqwnpdjgijhao3s4e7optii