875 Hits in 7.7 sec

Fine-grained Video Attractiveness Prediction Using Multimodal Deep Learning on a Large Real-world Dataset

Xinpeng Chen, Jingyuan Chen, Lin Ma, Jian Yao, Wei Liu, Jiebo Luo, Tong Zhang
2018 Companion Proceedings of The Web Conference 2018 - WWW '18
To this end, we construct the first fine-grained video attractiveness dataset, which is collected from one of the most popular video websites in the world.  ...  Second, FVAD provides us an opportunity to study the fine-grained video attractiveness prediction problem.  ...  VIDEO ATTRACTIVENESS PREDICTION USING DEEP LEARNING ON LARGE DATASETS Video attractiveness prediction is a very challenging task, which may involve many external factors.  ... 
doi:10.1145/3184558.3186584 dblp:conf/www/ChenCMYLLZ18 fatcat:7kqxabzlnff3lfzrkjkce3ko5m

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Khaled Bayoudh, Raja Knani, Fayçal Hamdaoui, Abdellatif Mtibaa
2021 The Visual Computer  
In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based  ...  We also survey current multimodal applications and present a collection of benchmark datasets for solving problems in various vision domains.  ...  The task has attracted a lot of interest because of its enormous relevance in many real-world applications, including video surveillance [82], autonomous driving [83] , etc.  ... 
doi:10.1007/s00371-021-02166-7 pmid:34131356 pmcid:PMC8192112 fatcat:jojwyc6slnevzk7eaiutlmlgfe
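The survey above lists multimodal fusion among its six perspectives; fusion strategies are conventionally split into feature-level (early) and decision-level (late) fusion. A minimal NumPy sketch of the two, with toy inputs — the modality names, dimensions, and probability values are invented for illustration, not taken from the survey:

```python
import numpy as np

def early_fusion(feats):
    """Early (feature-level) fusion: concatenate per-modality feature
    vectors into one joint representation before any classifier runs."""
    return np.concatenate(feats)

def late_fusion(scores, weights=None):
    """Late (decision-level) fusion: combine per-modality class-probability
    vectors, here by a (weighted) average."""
    scores = np.stack(scores)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    return weights @ scores

# Toy per-modality class probabilities for a 2-class problem.
visual = np.array([0.2, 0.8])
audio  = np.array([0.6, 0.4])
text   = np.array([0.1, 0.9])

# Early fusion of a 128-dim visual and a 64-dim audio feature vector.
joint = early_fusion([np.random.rand(128), np.random.rand(64)])
fused = late_fusion([visual, audio, text])
print(joint.shape, fused)  # fused is approximately [0.3, 0.7]
```

Early fusion lets a downstream model learn cross-modal interactions but requires aligned inputs; late fusion keeps modality models independent and degrades gracefully when one modality is missing.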

A Review on Methods and Applications in Multimodal Deep Learning [article]

Jabeen Summaira, Xi Li, Amin Muhammad Shoib, Jabbar Abdul
2022 arXiv   pre-print
A fine-grained taxonomy of various multimodal deep learning methods is proposed, elaborating on different applications in more depth.  ...  The goal of multimodal deep learning (MMDL) is to create models that can process and link information using various modalities.  ...  MMED seeks to determine a collection of real-world events in a large set of social media data.  ... 
arXiv:2202.09195v1 fatcat:wwxrmrwmerfabbenleylwmmj7y

Social Multimedia Sentiment Analysis

Jiebo Luo, Damian Borth, Quanzeng You
2017 Proceedings of the 2017 ACM Multimedia Conference - MM '17
The final part focuses on multimodal models for sentiment analysis. We will introduce some recent research projects on multimodal design and learning.  ...  Researchers from both industry and academia have been working on a broad range of projects related to analyzing and understanding online multimedia content, including real-world activity  ...  In this section, we further discuss the research on fine-grained visual sentiment analysis [3, 7, 18], which is also known as emotion analysis.  ... 
doi:10.1145/3123266.3130143 dblp:conf/mm/LuoBY17 fatcat:3m53xn4g4vcwlkrzqsjb4jhv2e

Guest Editorial Introduction to the Special Section on Intelligent Visual Content Analysis and Understanding

Hongliang Li, Lu Fang, Tianzhu Zhang
2020 IEEE Transactions on Circuits and Systems for Video Technology
Toward accurate and efficient sport video analysis, "Learning to score figure skating sport videos," by Xu et al., proposes a large-scale figure skating dataset and a semantic representation learning framework  ...  In addition, a new video-based face recognition dataset is introduced for large-scale set-based face recognition.  ... 
doi:10.1109/tcsvt.2020.3031416 fatcat:gpwbmydqbza5lddatxcfcidwcq

Deep Learning for Face Anti-Spoofing: A Survey [article]

Zitong Yu, Yunxiao Qin, Xiaobai Li, Chenxu Zhao, Zhen Lei, Guoying Zhao
2022 arXiv   pre-print
With the emergence of large-scale academic datasets in the recent decade, deep learning based FAS achieves remarkable performance and dominates this area.  ...  In this paper, to stimulate future research, we present the first comprehensive review of recent advances in deep learning based FAS.  ...  Intra-dataset intra-type, cross-dataset intra-type, intra-dataset cross-type, cross-dataset cross-type  ...  fine-grained context-aware supervision signals, which are beneficial for deep models learning intrinsic  ... 
arXiv:2106.14948v2 fatcat:wsheo7hbwvewhjoe6ykwjuqfii

Sentiment and Emotion Analysis for Social Multimedia

Quanzeng You
2016 Proceedings of the 2016 ACM Multimedia Conference - MM '16
Online social networks have attracted attention from both academia and the real world.  ...  As an old saying has it, an image is worth a thousand words. The image tweet is a great example of multimodal sentiment.  ...  We have been focusing on binary classification. The next step would be analyzing fine-grained sentiments.  ... 
doi:10.1145/2964284.2971475 dblp:conf/mm/You16 fatcat:iabsyivuezgwbbi6x3ygvvispu

A Natural and Immersive Virtual Interface for the Surgical Safety Checklist Training

Andrea Ferracani, Daniele Pezzatini, Alberto Del Bimbo
2014 Proceedings of the 2014 ACM International Workshop on Serious Games - SeriousGames '14  
Specifically, I will talk about how we have significantly improved image search quality and built a differentiated image search user experience using NLP, entities, big data, machine learning and computer  ...  By leveraging big data from billions of search queries, billions of images on the web and from the social networks, and billions of user clicks, we have designed massive machine learning systems to continuously  ...  Automatic Fine-grained Hyperlinking of Videos Within a Closed Collection Using Scene Segmentation; Automatic Maya Hieroglyph Retrieval Using Shape and Context Information; A Dataset and Taxonomy for  ... 
doi:10.1145/2656719.2656725 dblp:conf/mm/FerracaniPB14a fatcat:obsb2i4iybhu3dq77hujvjtbze

A Comprehensive Study of Deep Video Action Recognition [article]

Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
2020 arXiv   pre-print
In this paper, we provide a comprehensive survey of over 200 existing papers on deep learning for video action recognition.  ...  Over the last decade, we have witnessed great advancements in video action recognition thanks to the emergence of deep learning.  ...  Thanks to both the availability of large-scale datasets and the rapid progress in deep learning, there is also a rapid growth in deep learning based models to recognize video actions.  ... 
arXiv:2012.06567v1 fatcat:plqytbfck5bcndiceshix5unpa

Human Action Recognition and Prediction: A Survey [article]

Yu Kong, Yun Fu
2022 arXiv   pre-print
Derived from rapid advances in computer vision and machine learning, video analysis tasks have been moving from inferring the present state to predicting the future state.  ...  Much effort has been devoted over the last few decades to building a robust and effective framework for action recognition and prediction.  ...  to generalize them to real-world applications due to their inability to train on large-scale datasets.  ... 
arXiv:1806.11230v3 fatcat:2a2d7fuezbdqzfgrjwkcuqvmbu

Deep Audio-visual Learning: A Survey

Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He
2021 International Journal of Automation and Computing  
Finally, we summarize the commonly used datasets and challenges.  ...  Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully.  ...  Audio-visual event datasets: another audio-visual dataset category consists of music or real-world event videos.  ... 
doi:10.1007/s11633-021-1293-0 fatcat:an5lfyf4m5fh7mlngmdcbx7joy

Category-Based Deep CCA for Fine-Grained Venue Discovery from Multimodal Data [article]

Yi Yu, Suhua Tang, Kiyoharu Aizawa, Akiko Aizawa
2018 arXiv   pre-print
Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, Category-based Deep Canonical Correlation Analysis (C-DCCA).  ...  Experimental results on this dataset confirm the feasibility of the proposed method.  ...  RELATED WORK Generally speaking, fine-grained venue discovery by leveraging heterogeneous social multimodal datasets is a very challenging research topic.  ... 
arXiv:1805.02997v1 fatcat:hoevc7wn4nbbhagxpetgjfny6m
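C-DCCA builds on classical canonical correlation analysis, which finds one linear projection per view such that the projected variates are maximally correlated. A minimal NumPy sketch of the linear (non-deep, non-category-based) CCA core on synthetic two-view data standing in for the paper's heterogeneous modalities — `linear_cca`, the shapes, and the data generator are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def linear_cca(X, Y, reg=1e-6):
    """Top canonical pair: projections wx, wy maximizing corr(X@wx, Y@wy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # regularized covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # Canonical correlations = singular values of the whitened cross-covariance.
    K = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, S, Vt = np.linalg.svd(K)
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, S[0]

rng = np.random.default_rng(0)
z = rng.normal(size=500)                  # shared latent "venue" signal
X = np.outer(z, rng.normal(size=8)) + 0.5 * rng.normal(size=(500, 8))  # e.g. text view
Y = np.outer(z, rng.normal(size=6)) + 0.5 * rng.normal(size=(500, 6))  # e.g. image view
wx, wy, rho = linear_cca(X, Y)
print(round(rho, 2))  # top canonical correlation, close to 1 for shared-signal data
```

The "deep" variants replace the linear projections with neural networks trained on the same correlation objective; the category-based variant in the paper additionally conditions on venue category.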

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications [article]

Chao Zhang, Zichao Yang, Xiaodong He, Li Deng
2020 arXiv   pre-print
This review provides a comprehensive analysis of recent works on multimodal deep learning from three perspectives: learning multimodal representations, fusing multimodal signals at various levels, and  ...  Regarding multimodal fusion, this review focuses on special architectures for the integration of representations of unimodal signals for a particular task.  ...  This method transfers parameters from a language model (LM) pre-trained on a large out-of-domain dataset using unsupervised training or self-training, which is followed by fine-tuning on small in-domain  ... 
arXiv:1911.03977v3 fatcat:ojazuw3qzvfqrdweul6qdpxuo4
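The last snippet describes the standard transfer recipe: pre-train on a large out-of-domain corpus, then fine-tune on a small in-domain set. A toy sketch of that recipe using a NumPy logistic regression — the model, data generator, and hyperparameters are invented for illustration; the surveyed methods apply this to large neural language models, not linear classifiers:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_logreg(X, y, w=None, lr=0.1, steps=300):
    """Logistic regression by gradient descent; passing pretrained
    weights via `w` implements parameter transfer."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return np.mean((X @ w > 0) == y)

d = 20
w_true = rng.normal(size=d)

def make(n, shift=0.0):
    """Synthetic binary task; `shift` mimics a mild domain shift."""
    X = rng.normal(size=(n, d)) + shift
    y = (X @ w_true > 0).astype(float)
    return X, y

X_out, y_out = make(5000)            # large out-of-domain corpus
X_in, y_in = make(30, shift=0.3)     # tiny in-domain set
X_test, y_test = make(2000, shift=0.3)

w_pre = train_logreg(X_out, y_out)                # "pre-training"
w_ft = train_logreg(X_in, y_in, w=w_pre.copy())   # fine-tune on in-domain data
w_scratch = train_logreg(X_in, y_in)              # baseline: scratch on 30 examples

# Compare fine-tuned vs. from-scratch test accuracy.
print(accuracy(w_ft, X_test, y_test), accuracy(w_scratch, X_test, y_test))
```

The point of the sketch: the fine-tuned model starts from parameters that already encode the shared structure of the task, so the 30 in-domain examples only need to adapt it, while the from-scratch baseline must learn everything from those 30 examples alone.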

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [article]

Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar, Soujanya Poria, Roger Zimmermann, Amir Zadeh
2020 arXiv   pre-print
Deep Learning and its applications have cascaded impactful research and development with a diverse range of modalities present in the real-world data.  ...  In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities.  ...  wide and deep set of related real-world questions.  ... 
arXiv:2010.09522v2 fatcat:l4npstkoqndhzn6hznr7eeys4u

Recent Advances in Video Question Answering: A Review of Datasets and Methods [article]

Devshree Patel, Ratnam Parikh, Yesha Shastri
2021 arXiv   pre-print
VQA helps to retrieve temporal and spatial information from video scenes and interpret it. In this survey, we review a number of methods and datasets for the task of VQA.  ...  Video Question Answering (VQA) is a recently emerging, challenging task in the field of Computer Vision.  ...  DEMN (Kim et al., 2017): a novel method based on video-story learning is developed by [8], which uses a Deep Embedded Memory Network (DEMN) to reconstruct stories from a joint scene-dialogue video stream  ... 
arXiv:2101.05954v1 fatcat:afio7akl7zf6rm2yn2a2xp2anq