Multi-modal Egocentric Activity Recognition using Audio-Visual Features
[article]
2019
arXiv
pre-print
In this work, we propose a new framework for the egocentric activity recognition problem based on combining audio-visual features with multi-kernel learning (MKL) and multi-kernel boosting (MKBoost). ...
The proposed framework was evaluated on a number of egocentric datasets. The results showed that using multi-modal features with MKL outperforms the existing methods. ...
CONCLUSION: In this work, we proposed a new framework for the egocentric activity recognition problem based on audio-visual features combined with multi-kernel learning classification. ...
arXiv:1807.00612v2
fatcat:6bdk35purrfgnlraheavbbuodi
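As an illustrative aside on the technique named in the entry above: the sketch below combines precomputed audio and visual kernels with fixed weights and trains an SVM on the combined Gram matrix. It is a minimal stand-in, assuming scikit-learn and synthetic features; the paper itself learns the kernel weights via MKL / MKBoost rather than fixing them.

```python
# Minimal sketch: fixed-weight combination of per-modality RBF kernels,
# followed by an SVM on the precomputed Gram matrix. Illustrative only;
# the cited work learns the combination weights (MKL / MKBoost).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(audio_feats, visual_feats, w_audio=0.5, w_visual=0.5):
    """Weighted sum of per-modality kernels is itself a valid kernel."""
    return (w_audio * rbf_kernel(audio_feats, audio_feats)
            + w_visual * rbf_kernel(visual_feats, visual_feats))

# Toy data: 20 clips, 64-dim audio and 128-dim visual descriptors, 4 activities.
rng = np.random.default_rng(0)
audio = rng.normal(size=(20, 64))
visual = rng.normal(size=(20, 128))
labels = rng.integers(0, 4, size=20)

K = combined_kernel(audio, visual)
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.predict(K[:5]))  # predictions for the first 5 clips
```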
Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition
2016
2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
In this paper, we propose a multimodal multi-stream deep learning framework to tackle the egocentric activity recognition problem, using both the video and sensor data. ...
First, we experiment and extend a multi-stream Convolutional Neural Network to learn the spatial and temporal features from egocentric videos. ...
Furthermore, they also propose to apply the Fisher Kernel framework in order to fuse sensor and video features for multimodal egocentric activity recognition. ...
doi:10.1109/cvprw.2016.54
dblp:conf/cvpr/SongCMLLBSC16
fatcat:myz44dahlbavvio5syy4nlopby
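For orientation, the sketch below shows the general two-stream idea referenced above: a spatial branch over an RGB frame and a temporal branch over stacked optical-flow channels, fused by averaging class scores. Layer sizes and the fusion rule are assumptions for illustration, not the paper's architecture, and the Fisher Kernel fusion of sensor features is not shown.

```python
# Minimal two-stream sketch (PyTorch): spatial (RGB) + temporal (stacked flow)
# branches with late fusion by score averaging. Illustrative assumptions only.
import torch
import torch.nn as nn

class SmallStream(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class TwoStream(nn.Module):
    def __init__(self, num_classes=10, flow_stack=10):
        super().__init__()
        self.spatial = SmallStream(3, num_classes)                 # RGB frame
        self.temporal = SmallStream(2 * flow_stack, num_classes)   # stacked flow

    def forward(self, rgb, flow):
        # Late fusion: average the per-stream class scores.
        return (self.spatial(rgb) + self.temporal(flow)) / 2

model = TwoStream()
scores = model(torch.randn(4, 3, 112, 112), torch.randn(4, 20, 112, 112))
print(scores.shape)  # torch.Size([4, 10])
```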
Robust object recognition in RGB-D egocentric videos based on Sparse Affine Hull Kernel
2015
2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
In this paper, we propose a novel kernel function for recognizing objects in RGB-D egocentric videos. ...
Our kernel function also allows convenient integration of heterogeneous data modalities beyond RGB and depth. ...
The proposed kernel function allows convenient integration of heterogeneous data modalities (RGB, depth, infrared, etc.) under the Multiple Kernel Learning (MKL) [13] framework. ...
doi:10.1109/cvprw.2015.7301302
dblp:conf/cvpr/WanA15
fatcat:wyub2j6o55hmhgqk5cehgrpiy4
Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos
[article]
2021
arXiv
pre-print
We further introduce a multi-modal fusion mechanism that captures the pairwise interactions between RGB, flow, and object modalities. ...
In this work, we propose a simple and effective multi-modal architecture based on temporal convolutions. ...
We use the same starting learning rates for the uni-modal branches, while for the fusion layers we increase the learning rate to 0.00075. ...
arXiv:2107.09504v1
fatcat:cqbtqneuvvhjtg2wn26otndepi
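The abstract above quotes a concrete training detail: a higher learning rate, 0.00075, for the fusion layers. The sketch below shows one common way to express such per-branch learning rates with optimizer parameter groups; the module names and the 5e-4 branch rate are assumptions, only the 0.00075 figure comes from the abstract.

```python
# Minimal sketch: separate learning rates for uni-modal branches vs. fusion
# layers via PyTorch optimizer parameter groups. Names and the 5e-4 branch
# rate are illustrative assumptions; 0.00075 is the value quoted above.
import torch
import torch.nn as nn

model = nn.ModuleDict({
    "rgb_branch": nn.Linear(256, 128),
    "flow_branch": nn.Linear(256, 128),
    "obj_branch": nn.Linear(256, 128),
    "fusion": nn.Linear(3 * 128, 64),
})

optimizer = torch.optim.SGD(
    [
        {"params": model["rgb_branch"].parameters(), "lr": 5e-4},
        {"params": model["flow_branch"].parameters(), "lr": 5e-4},
        {"params": model["obj_branch"].parameters(), "lr": 5e-4},
        {"params": model["fusion"].parameters(), "lr": 0.00075},
    ],
    momentum=0.9,
)
```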
Trear: Transformer-based RGB-D Egocentric Action Recognition
[article]
2021
arXiv
pre-print
In this paper, we propose a Transformer-based RGB-D egocentric action recognition framework, called Trear. ...
Instead of using optical flow or recurrent units, we adopt a self-attention mechanism to model the temporal structure of the data from different modalities. ...
This paper focuses on RGB-D based egocentric action recognition and explores a novel framework to learn a conjoint representation of both modalities. ...
arXiv:2101.03904v1
fatcat:7bidz3i5djfrnnvyih2q4ja52u
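To make the self-attention point above concrete, the sketch below applies multi-head self-attention over per-frame features to model temporal structure without recurrence or optical flow. Feature dimensions, pooling, and the classifier head are assumptions, not the Trear architecture.

```python
# Minimal sketch: temporal modelling of a clip with self-attention over
# per-frame features (no optical flow, no recurrent units). Illustrative only.
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    def __init__(self, feat_dim=512, num_heads=8, num_classes=20):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):                  # (batch, time, feat_dim)
        attended, _ = self.attn(frame_feats, frame_feats, frame_feats)
        clip = self.norm(frame_feats + attended).mean(dim=1)  # temporal pooling
        return self.classifier(clip)

clip_feats = torch.randn(2, 16, 512)                 # 2 clips, 16 frames each
print(TemporalSelfAttention()(clip_feats).shape)     # torch.Size([2, 20])
```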
EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset
2020
IEEE Transactions on Pattern Analysis and Machine Intelligence
EgoCom is a first-of-its-kind natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. ...
Multi-modal data can be useful for tasks like multi-party speech recognition and predicting turn-taking by combining granularity of verbal and non-verbal cues (Stratou and Morency, 2017; Picard, 2000) ...
doi:10.1109/tpami.2020.3025105
pmid:32946385
fatcat:7nsepwlm6ng5bfuv6zxyiduuku
Vision and Acceleration Modalities: Partners for Recognizing Complex Activities
2019
2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)
In our work, we aim to overcome this limitation and present a multi-modal egocentric-based activity recognition approach which is able to recognize the critical activities by looking at movement and object ...
Wearable devices have been used widely for human activity recognition in the field of pervasive computing. ...
Multi-modal activity recognition Previous work has combined multiple sensors to create and analyze multi-modal datasets [14] , [24] . ...
doi:10.1109/percomw.2019.8730690
dblp:conf/percom/DieteSS19
fatcat:5o2ldmtx5fdz3klwlzu7e6hpx4
EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset
2020
IEEE Transactions on Pattern Analysis and Machine Intelligence
EgoCom is a first-of-its-kind natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives. ...
Multi-modal data can be useful for tasks like multi-party speech recognition and predicting turn-taking by combining granularity of verbal and non-verbal cues (Stratou and Morency, 2017; Picard, 2000) ...
doi:10.1109/tpami.2020.3025105
fatcat:pxgvb6i3uvc5jlpubqs2ieali4
Egocentric Action Recognition by Video Attention and Temporal Context
[article]
2020
arXiv
pre-print
Our solution achieves strong performance on the challenge metrics without using object-specific reasoning or extra training data. ...
We further introduce a simple yet effective contextual learning mechanism to model 'action' class scores directly from long-term temporal behaviour based on the 'verb' and 'noun' prediction scores. ...
In this attempt, we present a novel egocentric action recognition solution based on video attention learning and temporal contextual learning jointly. ...
arXiv:2007.01883v1
fatcat:sjne7saj6jemvjnlcn7xqefzke
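The entry above mentions deriving 'action' scores from 'verb' and 'noun' prediction scores. A common baseline composition, shown below as an illustration, multiplies the two probability distributions per clip; the class counts are placeholders, and this is not the paper's contextual learning mechanism, which additionally models long-term temporal behaviour.

```python
# Minimal sketch: compose action scores as the outer product of verb and noun
# probabilities, action[b, v, n] = P(verb=v) * P(noun=n). Class counts are
# illustrative placeholders.
import torch

verb_probs = torch.softmax(torch.randn(4, 97), dim=1)    # 4 clips, 97 verbs
noun_probs = torch.softmax(torch.randn(4, 300), dim=1)   # 4 clips, 300 nouns

action_probs = verb_probs.unsqueeze(2) * noun_probs.unsqueeze(1)
top_actions = action_probs.flatten(1).topk(5, dim=1).indices
print(top_actions.shape)  # torch.Size([4, 5]); flat index = v * 300 + n
```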
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals
2017
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We use heart rate signals as privileged self-supervision to derive energy expenditure in a training stage. A multitask objective is used to jointly optimize the two tasks. ...
Physiological signals such as heart rate can provide valuable information about an individual's state and activity. ...
Here, we focus on activity recognition using egocentric video and wearable sensors. ...
doi:10.1109/cvpr.2017.721
dblp:conf/cvpr/NakamuraYAF17
fatcat:zlqt4iaj2nc2jivjda2nafmkkm
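To illustrate the multitask objective described above, the sketch below jointly optimizes activity classification and energy-expenditure regression from a shared feature vector. The head sizes, the MSE regression loss, and the 0.5 weight are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch: one shared feature, two heads (activity classification and
# energy-expenditure regression), optimized with a weighted multitask loss.
import torch
import torch.nn as nn

class MultitaskHead(nn.Module):
    def __init__(self, feat_dim=256, num_activities=15):
        super().__init__()
        self.activity_head = nn.Linear(feat_dim, num_activities)
        self.energy_head = nn.Linear(feat_dim, 1)       # energy expenditure

    def forward(self, feats):
        return self.activity_head(feats), self.energy_head(feats).squeeze(1)

head = MultitaskHead()
feats = torch.randn(8, 256)                    # shared video + sensor features
activity_labels = torch.randint(0, 15, (8,))
energy_targets = torch.rand(8) * 10            # targets derived from heart rate

logits, energy_pred = head(feats)
loss = nn.functional.cross_entropy(logits, activity_labels) \
     + 0.5 * nn.functional.mse_loss(energy_pred, energy_targets)
loss.backward()
```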
Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
2021
IEEE Sensors Journal
Our framework is trained to learn a representation for multi-task learning: gesture segmentation and gesture recognition. ...
Existing multi-modal gesture recognition systems take multi-modal data as input to improve accuracy, but such methods require more modality sensors, which will greatly limit their application scenarios ...
EgoGesture dataset is a large-scale egocentric hand gesture recognition dataset designed for VR/AR use cases. ...
doi:10.1109/jsen.2021.3123443
fatcat:4biyoph3xbe6dksji53pzpcc6i
Multitask Learning to Improve Egocentric Action Recognition
[article]
2019
arXiv
pre-print
We employ this idea to tackle action recognition in egocentric videos by introducing additional supervised tasks. ...
Furthermore, in EGTEA Gaze+ we outperform the state-of-the-art in action recognition by 3.84%. ...
In our work, we do not use an explicit feature-based representation of modalities for the input, but use data from these modalities as supervision to learn actions. ...
arXiv:1909.06761v1
fatcat:3hbhxpzatnb6zlbsxtcaroaofa
Recognition of Activities of Daily Living with Egocentric Vision: A Review
2016
Sensors
Video-based recognition of activities of daily living (ADLs) is being used in ambient assisted living systems in order to support the independent living of older people. ...
This paper presents a review of the state of the art of egocentric vision systems for the recognition of ADLs following a hierarchical structure: motion, action and activity levels, where each level provides ...
Using unsupervised learning, multi-task clustering, i.e., learning multiple tasks simultaneously, has been demonstrated to give better results for action recognition in egocentric vision with respect to ...
doi:10.3390/s16010072
pmid:26751452
pmcid:PMC4732105
fatcat:okm2fswkjrdzleelae46u3nfna
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
[article]
2022
arXiv
pre-print
Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings. ...
We benchmark on three action understanding tasks: recognition, anticipation and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. ...
But these instructional videos are curated from online sources; they are produced, have multiple shots, and primarily target multi-modal (vision + NLP) learning [40, 50, 52]. ...
arXiv:2203.14712v2
fatcat:ruqwsqjwuramhdso3b7byfszum
LSTA: Long Short-Term Attention for Egocentric Action Recognition
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires a fine-grained discrimination of small objects and their manipulation. ...
We demonstrate the effectiveness of LSTA on egocentric activity recognition with an end-to-end trainable two-stream architecture, achieving state-of-the-art performance on four standard benchmarks. ...
We gratefully acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research. ...
doi:10.1109/cvpr.2019.01019
dblp:conf/cvpr/SudhakaranEL19
fatcat:numtqwnpdjgijhao3s4e7optii
Showing results 1 — 15 out of 386 results