
Polar Relative Positional Encoding for Video-Language Segmentation

Ke Ning, Lingxi Xie, Fei Wu, Qi Tian
2020 Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence  
In this paper, we tackle a challenging task named video-language segmentation.  ...  In this paper, we propose a novel Polar Relative Positional Encoding (PRPE) mechanism that represents spatial relations in a "linguistic" way, i.e., in terms of direction and range.  ...  Conclusions: We proposed a novel Polar Relative Positional Encoding mechanism along with a Polar Attention Module for video-language segmentation.  ... 
doi:10.24963/ijcai.2020/132 dblp:conf/ijcai/NingXW020 fatcat:mktrb7kgbzcqbgmywwgtrm23my
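The PRPE snippet above describes representing a spatial relation as direction and range rather than Cartesian offsets. Below is a minimal sketch of that idea, assuming a sinusoidal embedding of the polar coordinates of each pixel relative to a reference point; the function name, frequency scheme, and normalization are illustrative assumptions, not the paper's exact PRPE formulation.

```python
import numpy as np

def polar_relative_encoding(h, w, ref_y, ref_x, num_freqs=4):
    """Illustrative sketch: encode each pixel's offset from a reference
    point as (direction, range) instead of Cartesian (dy, dx)."""
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - ref_y, xs - ref_x
    theta = np.arctan2(dy, dx)            # direction in [-pi, pi]
    rho = np.sqrt(dy ** 2 + dx ** 2)
    rho = rho / (rho.max() + 1e-8)        # normalize range to [0, 1]
    feats = []
    for k in range(1, num_freqs + 1):     # sinusoidal features per coordinate
        feats += [np.sin(k * theta), np.cos(k * theta),
                  np.sin(k * np.pi * rho), np.cos(k * np.pi * rho)]
    return np.stack(feats, axis=-1)       # (h, w, 4 * num_freqs)

enc = polar_relative_encoding(32, 32, ref_y=16, ref_x=16)
print(enc.shape)  # (32, 32, 16)
```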

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation [article]

Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang
2021 arXiv   pre-print
Language-queried video actor segmentation aims to predict the pixel-level mask of the actor which performs the actions described by a natural language query in the target frames.  ...  Existing methods adopt 3D CNNs over the video clip as a general encoder to extract a mixed spatio-temporal feature for the target frame.  ...  PRPE [29] proposes a polar positional encoding method to better localize the actor queried in the video.  ... 
arXiv:2105.06818v1 fatcat:ecnavhtacbaallxotf62ze6ila

A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews [article]

Edison Marrese-Taylor, Cristian Rodriguez-Opazo, Jorge A. Balazs, Stephen Gould, Yutaka Matsuo
2020 arXiv   pre-print
Our approach works at the sentence level without the need for time annotations and uses features derived from the audio, video and language transcriptions of its contents.  ...  In light of this issue, we propose a multi-modal approach for mining fine-grained opinions from video reviews that is able to determine the aspects of the item under review that are being discussed and  ...  Acknowledgments We are grateful for the support provided by the NVIDIA Corporation, donating two of the GPUs used for this research.  ... 
arXiv:2005.13362v2 fatcat:jisgega2uzacnd3qfxanr5ilfe

Recognition of unfamiliar faces: three kinds of effects

Chang Hong Liu, Avi Chaudhuri
2000 Trends in Cognitive Sciences  
where contrast polarity is switched from positive to negative or vice versa between learned and test faces (incongruent stimulus condition).  ...  In this example, the main reason for recognition failure can be explained by the principle of encoding specificity, whereas an object-related factor (contrast polarity) played a less important role.  ... 
doi:10.1016/s1364-6613(00)01558-8 fatcat:wwll5ahcbrf6xizurpuote3bzu

Translating Images into Maps [article]

Avishkar Saha, Oscar Mendez Maldonado, Chris Russell, Richard Bowden
2021 arXiv   pre-print
The structure allows us to make efficient use of data when training, and obtains state-of-the-art results for instantaneous mapping of three large-scale datasets, including a 15% and 30% relative gain  ...  We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird's-eye-view (BEV) of the world, in a single end-to-end network.  ...  Formally, let h ∈ R H×C represent the encoded "memory" of an image column of height H, and let y ∈ R r×C represent a positional query which encodes relative position along a polar ray of length r.  ... 
arXiv:2110.00966v1 fatcat:kpcv4ps6onhspkoeoylje43qc4
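The snippet above gives the shapes directly: a column memory h ∈ R^{H×C} is queried by positional embeddings y ∈ R^{r×C} along a polar ray of length r. A minimal sketch of that column-to-ray attention follows, assuming plain scaled dot-product attention; a full transformer would add learned query/key/value projections and multiple heads.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def column_to_ray_attention(h_mem, y_query):
    """h_mem: (H, C) encoded memory of one image column.
    y_query: (r, C) positional queries along the polar ray.
    Returns (r, C): a feature per ray position, i.e. one BEV column."""
    C = h_mem.shape[-1]
    scores = y_query @ h_mem.T / np.sqrt(C)  # (r, H) attention logits
    return softmax(scores, axis=-1) @ h_mem  # weighted sum of column features

rng = np.random.default_rng(0)
bev_col = column_to_ray_attention(rng.normal(size=(64, 128)),   # H=64, C=128
                                  rng.normal(size=(50, 128)))   # r=50
print(bev_col.shape)  # (50, 128)
```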

Integrating Multimodal Information in Large Pretrained Transformers

Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, AmirAli Bagher Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics  
While fine-tuning these pre-trained models is straightforward for lexical applications (applications with only language modality), it is not trivial for multimodal language (a growing area in NLP focused  ...  In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis.  ...  XLNet utilizes two key ideas from Transformer-XL: relative positioning and a segment recurrence mechanism. Like BERT, it also has an Input Embedder followed by multiple Encoders.  ... 
doi:10.18653/v1/2020.acl-main.214 pmid:33782629 pmcid:PMC8005298 fatcat:xh5n4xcxkjhwlnjvbjvu7zxmpy
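The snippet above credits Transformer-XL's relative positioning, where an attention score depends on the offset j − i between tokens rather than their absolute positions. A simplified sketch, assuming one shared table of relative-offset embeddings and omitting Transformer-XL's separate content/position bias vectors:

```python
import numpy as np

def relative_attention_scores(q, k, rel_emb):
    """q, k: (T, d) queries and keys; rel_emb: (2T-1, d) embeddings for
    offsets -(T-1)..(T-1). Returns (T, T) scaled attention logits."""
    T, d = q.shape
    content = q @ k.T                    # content-content term
    position = q @ rel_emb.T             # (T, 2T-1) content-position term
    # pick position[i, j - i + T - 1] so entry (i, j) uses offset j - i
    idx = np.arange(T)[None, :] - np.arange(T)[:, None] + T - 1
    return (content + np.take_along_axis(position, idx, axis=1)) / np.sqrt(d)

T, d = 6, 16
rng = np.random.default_rng(1)
scores = relative_attention_scores(rng.normal(size=(T, d)),
                                   rng.normal(size=(T, d)),
                                   rng.normal(size=(2 * T - 1, d)))
print(scores.shape)  # (6, 6)
```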

Integrating Multimodal Information in Large Pretrained Transformers [article]

Wasifur Rahman, Md. Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque
2020 arXiv   pre-print
While fine-tuning these pre-trained models is straightforward for lexical applications (applications with only language modality), it is not trivial for multimodal language (a growing area in NLP focused  ...  In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis.  ...  XLNet utilizes two key ideas from Transformer-XL: relative positioning and a segment recurrence mechanism. Like BERT, it also has an Input Embedder followed by multiple Encoders.  ... 
arXiv:1908.05787v3 fatcat:rmwtz4xllveafc3wxdupuncpeq

Hybrid Deep Network and Polar Transformation Features for Static Hand Gesture Recognition in Depth Data

Vo Hoai, Tran Thai, Ly Quoc
2016 International Journal of Advanced Computer Science and Applications  
, such as game interaction and sign language recognition.  ...  In this paper, we propose an effective hand segmentation method from the full depth image, an important step before extracting the features that represent each hand gesture.  ...  the relative angles and distances between the salient points and the reference point for each gesture.  ... 
doi:10.14569/ijacsa.2016.070536 fatcat:mvdcrw3stndl5ozqv525v73jae
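The snippet above mentions relative angles and distances between salient points and a reference point. A minimal sketch of such polar-transformation features, assuming 2D contour points and a palm-center reference; both the point set and the normalization are hypothetical choices, not the paper's exact pipeline.

```python
import numpy as np

def polar_hand_features(salient_pts, ref_pt):
    """salient_pts: (N, 2) contour points; ref_pt: (2,), e.g. palm center.
    Returns (N, 2) (angle, distance) pairs describing the hand shape."""
    d = salient_pts - ref_pt
    angles = np.arctan2(d[:, 1], d[:, 0])
    dists = np.linalg.norm(d, axis=1)
    dists = dists / (dists.max() + 1e-8)  # scale invariance (assumption)
    return np.stack([angles, dists], axis=1)

pts = np.array([[10.0, 2.0], [12.0, 8.0], [6.0, 9.0], [3.0, 4.0]])
print(polar_hand_features(pts, ref_pt=np.array([7.0, 5.0])))
```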

Cross-domain Sentiment Classification with Bidirectional Contextualized Transformer Language Models

Batsergelen Myagmar, Jie Li, Shigetomo Kimura
2019 IEEE Access  
Cross-domain sentiment classification is an important Natural Language Processing (NLP) task that aims at leveraging knowledge obtained from a source domain to train a high-performance learner for sentiment  ...  Our results show that such bidirectional contextualized language models outperform the previous state-of-the-art methods for cross-domain sentiment classification while using up to 120 times less data.  ...  Relative Positional Encoding encodes the position of a context as a relative distance from the current token at each attention module, as opposed to encoding position statically only at the beginning like in  ... 
doi:10.1109/access.2019.2952360 fatcat:3qyugxpeqjbcxmiscuuko5rnfm

Given claims about new topics. How Romance and Germanic speakers link changed and maintained information in narrative discourse

Christine Dimroth, Cecilia Andorno, Sandra Benazzo, Josje Verhagen
2010 Journal of Pragmatics  
Acknowledgements: We wish to thank Wolfgang Klein, Leah Roberts, Sarah Schimke, Giusy Turco as well as three anonymous reviewers for their helpful comments on an earlier version of this paper.  ...  The video consists of 31 segments.  ...  With respect to positive polarity markers, he distinguishes predicational positive polarity (as in 4a) from propositional positive polarity (4b). (4a) A: Is John rich?  ... 
doi:10.1016/j.pragma.2010.05.009 fatcat:xzzvnku3cfdoxhylwprdqw4j7y

Image and Video for Hearing Impaired People

Alice Caplier, Sébastien Stillittano, Oya Aran, Lale Akarun, Gérard Bailly, Denis Beautemps, Nouredine Aboutabit, Thomas Burger
2007 EURASIP Journal on Image and Video Processing  
Thirdly, we present the existing tools for reverse communication, from hearing people to deaf people, that involve SL and CS video synthesis.  ...  Secondly, we present existing tools which employ SL and CS video processing and recognition for the automatic communication between deaf people and hearing people.  ...  Figure 10 presents some segmentation results with Eveno's model for the external lip contours. Relatively few studies deal with the problem of inner lip segmentation.  ... 
doi:10.1186/1687-5281-2007-045641 fatcat:5ed7gqgd5jf4db3xhh4p6wjkf4

Image and Video for Hearing Impaired People

Alice Caplier, Sébastien Stillittano, Oya Aran, Lale Akarun, Gérard Bailly, Denis Beautemps, Nouredine Aboutabit, Thomas Burger
2007 EURASIP Journal on Image and Video Processing  
Thirdly, we present the existing tools for reverse communication, from hearing people to deaf people, that involve SL and CS video synthesis.  ...  Secondly, we present existing tools which employ SL and CS video processing and recognition for the automatic communication between deaf people and hearing people.  ...  Figure 10 presents some segmentation results with Eveno's model for the external lip contours. Relatively few studies deal with the problem of inner lip segmentation.  ... 
doi:10.1155/2007/45641 fatcat:m4c5ursu75ajnpycne5n5hhm2a

Analysis of Political Sentiment From Twitter Data

Sikha Bagui, Carson Wilber, Kaixin Ren
2020 Natural Language Processing Research  
The novelty of this work is in determining how short corpora (taken from Twitter data) are polarized along multiple axes with respect to a subject, as opposed to using a single positive-negative sentiment  ...  Various axes will have to be combined for better results. Results were measured in terms of classification accuracy, classification bias, and an axis score.  ...  For WE, an input corpus of tokens is encoded by mapping each token to a real-valued vector that captures the meaning of the word relative to a selection of words in the English language.  ... 
doi:10.2991/nlpr.d.201013.001 fatcat:x3xxzrue65by7dtpuwtwne5s3u
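The snippet above describes mapping tokens to real-valued vectors and scoring polarity along multiple axes rather than a single positive-negative scale. A toy sketch of that idea, assuming a hypothetical in-memory vocabulary and axes defined by anchor-word pairs; the paper's actual axis construction and scoring may differ.

```python
import numpy as np

# Toy 2-d embedding table; a real system would load pretrained vectors.
VOCAB = {"great": np.array([0.9, 0.1]),
         "terrible": np.array([-0.8, 0.2]),
         "policy": np.array([0.0, 0.7])}

def axis_score(tokens, pos_anchor, neg_anchor):
    """Score a short corpus along one axis spanned by two anchor words."""
    axis = VOCAB[pos_anchor] - VOCAB[neg_anchor]
    axis = axis / np.linalg.norm(axis)
    vecs = [VOCAB[t] for t in tokens if t in VOCAB]
    if not vecs:
        return 0.0
    return float(np.mean(vecs, axis=0) @ axis)  # mean projection on the axis

print(axis_score(["great", "policy"], "great", "terrible"))
```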

Reasoning about Body-Parts Relations for Sign Language Recognition [article]

Marc Martínez-Camarena, Jose Oramas, Mario Montagud-Climent and Tinne Tuytelaars
2016 arXiv   pre-print
However, in most sign languages, hand gestures are defined in a particular context (body region).  ...  We propose a pipeline to perform sign language recognition which models hand movements in the context of other parts of the body captured in the 3D space using the MS Kinect sensor.  ...  for sign language recognition.  ... 
arXiv:1607.06356v1 fatcat:hpklejycmnh6ddwayerbhzb67u
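The snippet above models hand movements in the context of other body parts captured in 3D. A minimal sketch, assuming a Kinect-style skeleton given as joint positions; the joint names and the simple offset features are illustrative, not the paper's exact body-parts-relation descriptor.

```python
import numpy as np

def hand_context_features(skeleton):
    """skeleton: dict of joint name -> (3,) position from a depth sensor.
    Encodes the right hand relative to other body parts, so the same
    hand shape is distinguished by the region where it is performed."""
    hand = skeleton["hand_right"]
    parts = ["head", "torso", "shoulder_left", "shoulder_right"]
    return np.concatenate([hand - skeleton[p] for p in parts])  # (12,)

skel = {"hand_right": np.array([0.3, 1.2, 2.0]),
        "head": np.array([0.0, 1.6, 2.1]),
        "torso": np.array([0.0, 1.1, 2.1]),
        "shoulder_left": np.array([-0.2, 1.4, 2.1]),
        "shoulder_right": np.array([0.2, 1.4, 2.1])}
print(hand_context_features(skel))
```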

Telescopic Vector Composition and Polar Accumulated Motion Residuals for Feature Extraction in Arabic Sign Language Recognition

T Shanableh, K Assaleh
2007 EURASIP Journal on Image and Video Processing  
This work introduces two novel approaches for feature extraction applied to video-based Arabic sign language recognition, namely, motion representation through motion estimation and motion representation  ...  Since in both approaches the temporal dimension of the video-based gesture needs to be preserved, hidden Markov models are used for classification tasks.  ...  Salah Odeh of the Sharjah City for Humanitarian Services (SCHS) and Mr. W. Zouabi and F.  ... 
doi:10.1186/1687-5281-2007-087929 fatcat:aoquxcfmvjbn3mjrso6ybns7gq
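The title above names polar accumulated motion residuals. A rough sketch of one reading of that idea, assuming grayscale frames: accumulate successive frame differences, then summarize the motion image in angular bins around its centroid. The thresholding and binning scheme here are assumptions, not the paper's exact feature.

```python
import numpy as np

def polar_accumulated_motion(frames, bins=16):
    """frames: (T, H, W) grayscale video. Returns a (bins,) angular
    histogram of accumulated motion around the motion centroid."""
    acc = np.abs(np.diff(frames.astype(float), axis=0)).sum(axis=0)  # (H, W)
    ys, xs = np.nonzero(acc > acc.mean())     # keep strong-motion pixels
    if len(ys) == 0:
        return np.zeros(bins)
    cy, cx = ys.mean(), xs.mean()             # motion centroid
    theta = np.arctan2(ys - cy, xs - cx)
    hist, _ = np.histogram(theta, bins=bins, range=(-np.pi, np.pi),
                           weights=acc[ys, xs])
    return hist / (hist.sum() + 1e-8)

rng = np.random.default_rng(2)
print(polar_accumulated_motion(rng.random((8, 32, 32))).shape)  # (16,)
```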
Showing results 1 — 15 out of 4,499 results