
SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera

Denis Tome, Thiemo Alldieck, Patrick Peluse, Gerard Pons-Moll, Lourdes Agapito, Hernan Badino, Fernando De la Torre
2020 IEEE Transactions on Pattern Analysis and Machine Intelligence  
We propose an encoder-decoder architecture with a novel multi-branch decoder designed to account for the varying uncertainty in 2D predictions.  ...  Moreover, an evaluation on the Human3.6M benchmark shows that the performance of our method is on par with top performing approaches on the more classic problem of 3D human pose from a third person viewpoint  ...  First Person 3D Human Pose Estimation.  ... 
doi:10.1109/tpami.2020.3029700 pmid:33031034 fatcat:3bk7iu2sbndsxb46immohkktta

Fine-grained Human Analysis under Occlusions and Perspective Constraints in Multimedia Surveillance

Rita Cucchiara, Matteo Fabbri
2022 ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  
Dealing with occlusion can be done at the joint level or pixel level: We discuss two different solutions, the former based on a supervised neural network architecture for detecting occluded joints and  ...  More specifically, we discuss some issues and some possible solutions to effectively detect people using pose estimation methods and to detect humans under occlusions both in the two-dimensional (2D) image  ...  The Figure is taken from "Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation" [15]. Fig. 12. GUI of our system.  ... 
doi:10.1145/3476839 fatcat:wvwomcxyrfcxragamhbt3dk6ty

Visual Methods for Sign Language Recognition: A Modality-Based Review [article]

Bassem Seddik, Najoua Essoukri Ben Amara
2020 arXiv   pre-print
Recent advances in human actions recognition are exploiting the ascension of GPU-based learning from massive data, and are getting closer to human-like performances.  ...  Sign language visual recognition from continuous multi-modal streams is still one of the most challenging fields.  ...  For nearly 466 million persons world-wide, SL is their first native language.  ... 
arXiv:2009.10370v1 fatcat:jkqtzid6qndhnijs5axhfom4ia

A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video [article]

Junfa Liu, Juan Rojas, Zhijun Liang, Yihui Li, Yisheng Guan
2020 arXiv   pre-print
...and achieves competitive performance on 2D-to-3D video pose estimation.  ...  To adapt to single- and multi-frame estimation, the dilated temporal model is employed to process varying skeleton sequences.  ...  Experiments show that our top-down video pose estimation achieves 11 fps for a single person with the same test environment.  ... 
arXiv:2003.14179v4 fatcat:5yg3uimk5jadlficc7wuvyzy6a

Real-Time Online Skeleton Extraction and Gesture Recognition on Pepper [article]

Axel Lefrant, Jean-Marc Montanier
2022 arXiv   pre-print
We present a multi-stage pipeline for simple gesture recognition.  ...  For this task, Pepper has been augmented with an embedded GPU for running deep CNNs and a fish-eye camera to capture whole scene interaction.  ...  We also thank Alexandre Mazel, the director of software innovation of Softbank Robotics Europe, for his technical knowledge and support on Pepper and the Jetson TX2.  ... 
arXiv:2206.11376v1 fatcat:cv6av45t7rhsbhbtgiippip4du

Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things [article]

Jing Zhang, Dacheng Tao
2020 arXiv   pre-print
Specifically, we briefly present the AIoT architecture in the context of cloud computing, fog computing, and edge computing.  ...  Then, we present progress in AI research for IoT from four perspectives: perceiving, learning, reasoning, and behaving.  ...  For example, the person re-identification model can be used for initial proposal ranking and filtering, then human experts are involved to make final decisions. 8) Human Pose Estimation and Gesture/Action  ... 
arXiv:2011.08612v1 fatcat:dflut2wdrjb4xojll34c7daol4

Estimating the Lecturer's Head Pose in Seminar Scenarios – A Multi-view Approach [chapter]

Michael Voit, Kai Nickel, Rainer Stiefelhagen
2006 Lecture Notes in Computer Science  
In 92% of the time, the correct pose class or a neighbouring pose class (i.e. a 45 degree error) were estimated.  ...  Using the proposed fully automatic system we are able to correctly determine the lecturer's head pose in 59% of the time and for 8 orientation classes.  ...  Neural networks were implemented for estimating the head pose seen by each camera. A maximum-likelihood search results in the final pose hypothesis.  ... 
doi:10.1007/11677482_20 fatcat:tm36uayqpzhq7e7mnvtrwie5vy

A Bayesian Approach for Multi-view Head Pose Estimation

Michael Voit, Kai Nickel, Rainer Stiefelhagen
2006 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems  
In this paper, we present a system for estimating human head pose with the use of multiple camera views.  ...  We apply a neural network to each of the views, and fuse the output using a Bayesian filter framework. Thus, we achieve a more robust estimation compared to pure monocular approaches.  ...  There, also neural networks were implemented for estimating the head pose seen by each single camera. A maximum-likelihood search then results in the final pose hypothesis.  ... 
doi:10.1109/mfi.2006.265627 dblp:conf/mfi/VoitNS06 fatcat:5hkimftwgrefhal4ryfux4rz6u

On the role of depth predictions for 3D human pose estimation [article]

Alec Diaz-Arias, Mitchell Messmore, Dmitriy Shin, Stephen Baek
2021 arXiv   pre-print
Following the successful application of deep convolutional neural networks to 2d human pose estimation, the next logical problem to solve is 3d human pose estimation from monocular images.  ...  Furthermore, our system can be combined with an off-the-shelf 2d pose detector and a depth map predictor to perform 3d pose estimation in the wild.  ...  Figure 1: The proposed 3d pose estimation network architecture.  ... 
arXiv:2103.02521v1 fatcat:cuxfvctbavfjvh7oojjhmogt54

HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation [article]

Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu
2021 arXiv   pre-print
Estimating 3D human pose from a single image is a challenging task.  ...  Leveraging the strength of the HEMlets pose estimation, we further design and append a shallow yet effective network module to regress the SMPL parameters of the body pose and shape.  ...  Then, we elaborate a simple network architecture that utilizes the part-centric heatmap triplets for 3D human pose estimation.  ... 
arXiv:2003.04894v3 fatcat:7xpvvvzh2zfizkqb7c7yx7ohya

Orientation Keypoints for 6D Human Pose Estimation [article]

Martin Fisch, Ronald Clark
2021 arXiv   pre-print
Most realtime human pose estimation approaches are based on detecting joint positions. Using the detected joint positions, the yaw and pitch of the limbs can be computed.  ...  In this paper we therefore introduce orientation keypoints, a novel approach for estimating the full position and rotation of skeletal joints, using only single-frame RGB images.  ...  Stacked hourglass networks for human pose estimation. 3d human pose learning via multi-view images in the wild.  ... 
arXiv:2009.04930v2 fatcat:bjahcsglunasxdahyitwouwanq

Human Pose Estimation from Monocular Images: A Comprehensive Survey

Wenjuan Gong, Xuena Zhang, Jordi Gonzàlez, Andrews Sobral, Thierry Bouwmans, Changhe Tu, El-hadi Zahzah
2016 Sensors  
In Section 5, we collect publicly-available datasets for the validation of human pose estimation algorithms, several error measurement methods, and a toolkit for non-expert users to use human pose estimation  ...  for low-level estimation or if human poses are recognized from pixel-level image evidence.  ...  The convolutional network architecture used in [156].  ... 
doi:10.3390/s16121966 pmid:27898003 pmcid:PMC5190962 fatcat:jigvz4ovpbh63eovto3etoefx4

Adding Pluggable and Personalized Natural Control Capabilities to Existing Applications

Fabrizio Lamberti, Andrea Sanna, Gilles Carlevaris, Claudio Demartini
2015 Sensors  
Advancements in input device and sensor technologies led to the evolution of the traditional human-machine interaction paradigm based on the mouse and keyboard.  ...  In this paper, a framework designed to transparently add multi-modal interaction capabilities to applications to which users are accustomed is presented.  ...  Thus, in this paper, a pluggable solution for improving the multi-modality of existing applications by adding personalized gesture- and voice-based control capabilities is presented.  ... 
doi:10.3390/s150202832 pmid:25635410 pmcid:PMC4367336 fatcat:r6xpblvlubende6mtspvoyozuu

Survey on Emotional Body Gesture Recognition

Fatemeh Noroozi, Dorota Kaminska, Ciprian Corneanu, Tomasz Sapinski, Sergio Escalera, Gholamreza Anbarjafari
2019 IEEE Transactions on Affective Computing  
While pre-processing methodologies (e.g. human detection and pose estimation) are nowadays mature technologies fully developed for robust large scale analysis, we show that for emotion recognition the  ...  We then define a complete framework for automatic emotional body gesture recognition. We introduce person detection and comment static and dynamic body pose estimation methods both in RGB and 3D.  ...  Examples of single-person pose estimation (a) [39], (b) [40], (c) [41], (d) [42], multi-person pose estimation (e) [43], multi-person pose estimation and tracking (f) [44] and 3D shape reconstruction  ... 
doi:10.1109/taffc.2018.2874986 fatcat:zjnr2w4orje7vj2bhmia4f5qki

A Natural and Immersive Virtual Interface for the Surgical Safety Checklist Training

Andrea Ferracani, Daniele Pezzatini, Alberto Del Bimbo
2014 Proceedings of the 2014 ACM International Workshop on Serious Games - SeriousGames '14  
By leveraging big data from billions of search queries, billions of images on the web and from the social networks, and billions of user clicks, we have designed massive machine learning systems to continuously  ...  With the focus on natural language and entity understanding, for instance, we have improved Bing's ability to understand the user intent beyond queries and keywords.  ...  Retrieval Based on Local Similarity with Multiple Images Convolutional Network Features for Scene Recognition Perceived Audio Quality for Streaming Stereo Music Discriminating Native from Non-native Speech  ... 
doi:10.1145/2656719.2656725 dblp:conf/mm/FerracaniPB14a fatcat:obsb2i4iybhu3dq77hujvjtbze