Weakly Supervised Construction of ASR Systems with Massive Video Data [article]

Mengli Cheng, Chengyu Wang, Xu Hu, Jun Huang, Xiaobo Wang
2020 arXiv   pre-print
In this paper, we present a weakly supervised framework for constructing ASR systems with massive video data.  ...  Building Automatic Speech Recognition (ASR) systems from scratch is significantly challenging, mostly due to the time-consuming and financially-expensive process of annotating a large amount of audio data  ...  Here, we present a weakly supervised framework to construct ASR systems from massive video data, shown in Figure 1.  ... 
arXiv:2008.01300v2 fatcat:erqimwin4ff55nnco3na4npry4

Weakly Supervised Construction of ASR Systems from Massive Video Data

Mengli Cheng, Chengyu Wang, Jun Huang, Xiaobo Wang
2021 Conference of the International Speech Communication Association  
In this paper, we present VideoASR, a weakly supervised framework for constructing ASR systems from massive video data.  ...  due to the time-consuming and financially-expensive process of annotating a large amount of audio data with transcripts.  ...  In this work, we present a weakly supervised framework to construct ASR systems from massive video data, named VideoASR. The framework is shown in Figure 1, which consists of  ... 
doi:10.21437/interspeech.2021-7 dblp:conf/interspeech/Cheng00W21 fatcat:xez4ixvqazewnchzj7mialezvm

Learning To Recognize Procedural Activities with Distant Supervision [article]

Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani
2022 arXiv   pre-print
To address this issue, we propose to automatically identify steps in instructional videos by leveraging the distant supervision of a textual knowledge base (wikiHow) that includes detailed descriptions  ...  to the prohibitive cost of manually annotating temporal boundaries in long videos.  ...  The downside of this massive amount of data is that its scale effectively prevents manual annotation. In fact, all videos in HowTo100M are unverified by human annotators.  ... 
arXiv:2201.10990v3 fatcat:ghjybqtitjf5thlsgoknmpgf2a

An analytical study of information extraction from unstructured and multidimensional big data

Kiran Adnan, Rehan Akbar
2019 Journal of Big Data  
Traditional IE systems are inefficient to deal with this huge deluge of unstructured big data. The volume and variety of big data demand to improve the computational capabilities of these IE systems.  ...  "Audio IE" section presents the detailed discussion on IE from audio, its subtasks such as AED and ASR with state-of-the-art techniques and challenges.  ...  In this context, CNN based weakly supervised technique was compared to the technique trained with fully-supervised data.  ... 
doi:10.1186/s40537-019-0254-8 fatcat:qy5l55um7feeblec4hxohr3pqa

Evaluating Multimedia Features and Fusion for Example-Based Event Detection [chapter]

Gregory K. Myers, Cees G. M. Snoek, Ramakant Nevatia, Ramesh Nallapati, Julien van Hout, Stephanie Pancoast, Chen Sun, Amirhossein Habibian, Dennis C. Koelma, Koen E. A. van de Sande, Arnold W. M. Smeulders
2014 Fusion in Computer Vision  
To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia  ...  Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA  ... 
doi:10.1007/978-3-319-05696-8_5 dblp:series/acvpr/MyersSNNHPSHKSS14 fatcat:c72ejcp2wzgd7bvis5bucvxmai

Evaluating multimedia features and fusion for example-based event detection

Gregory K. Myers, Ramesh Nallapati, Julien van Hout, Stephanie Pancoast, Ramakant Nevatia, Chen Sun, Amirhossein Habibian, Dennis C. Koelma, Koen E. A. van de Sande, Arnold W. M. Smeulders, Cees G. M. Snoek
2013 Machine Vision and Applications  
To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia  ...  Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos.  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA  ... 
doi:10.1007/s00138-013-0527-8 fatcat:mx2lazcfhndkfme5p4nqhlrinu

A Survey of Code-switched Speech and Language Processing [article]

Sunayana Sitaram, Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Alan W Black
2020 arXiv   pre-print
As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for.  ...  We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities.  ...  [119] decode untranscribed data with this ASR system and add the decoded speech to ASR training data after rescoring using Language Models.  ... 
arXiv:1904.00784v3 fatcat:r5tsg4kdnfbtnndae523c32pta

Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things [article]

Jing Zhang, Dacheng Tao
2020 arXiv   pre-print
However, transmitting massive amounts of heterogeneous data, perceiving complex environments from these data, and then making smart decisions in a timely manner are difficult.  ...  In the Internet of Things (IoT) era, billions of sensors and devices collect and process data from the environment, transmit them to cloud centers, and receive feedback via the internet for connectivity  ...  massive amounts of data.  ... 
arXiv:2011.08612v1 fatcat:dflut2wdrjb4xojll34c7daol4

Transcript to Video: Efficient Clip Sequencing from Texts [article]

Yu Xiong, Fabian Caba Heilbron, Dahua Lin
2021 arXiv   pre-print
To meet the demands for non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots  ...  Quantitative results and user studies demonstrate empirically that the proposed learning framework can retrieve content-relevant shots while creating plausible video sequences in terms of style.  ...  To tackle these challenges, we propose a weakly-supervised framework for learning vision-language embeddings and sequencing styles on a newly constructed unlabeled video dataset.  ... 
arXiv:2107.11851v1 fatcat:vfcx7w75kzgg7ppurgswceoi5i

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability  ...  Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields.  ...  Semi/Weakly-supervised Pretraining Semi/Weakly supervised pre-training is aimed at training models with fewer or weaker human annotations.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

A Review on MAS-Based Sentiment and Stress Analysis User-Guiding and Risk-Prevention Systems in Social Network Analysis

Guillem Aguado, Vicente Julián, Ana García-Fornes, Agustín Espinosa
2020 Applied Sciences  
For this reason, in this survey we explore works in the line of prevention of risks that can arise from social interaction in online environments, focusing on works using Multi-Agent System (MAS) technologies  ...  We review with special attention works using MAS technologies for user recommendation and guiding.  ...  For multi-aspect rating prediction with indirect supervision: LDA, MG-LDA, STM, and local LDA weakly supervised with seed words are used to label sentences with aspects, and a Support Vector Regression  ... 
doi:10.3390/app10196746 fatcat:m2gqf3utabgtrcvhtbh53hksfq

Deep Learning in Mobile and Wireless Networking: A Survey [article]

Chaoyun Zhang, Paul Patras, Hamed Haddadi
2019 arXiv   pre-print
We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking.  ...  We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems.  ...  Serving Deep Learning with Massive High-Quality Data Deep neural networks rely on massive and high-quality data to achieve good performance.  ... 
arXiv:1803.04311v3 fatcat:awuvyviarvbr5kd5ilqndpfsde

Deep Learning in Mobile and Wireless Networking: A Survey

Chaoyun Zhang, Paul Patras, Hamed Haddadi
2019 IEEE Communications Surveys and Tutorials  
We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking.  ...  We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems.  ...  Serving Deep Learning with Massive High-Quality Data Deep neural networks rely on massive and high-quality data to achieve good performance.  ... 
doi:10.1109/comst.2019.2904897 fatcat:xmmrndjbsfdetpa5ef5e3v4xda

From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video [article]

Junwei Liang
2021 arXiv   pre-print
With the advancement in computer vision deep learning, systems now are able to analyze an unprecedented amount of rich visual information from videos to enable applications such as autonomous driving,  ...  Many systems do not provide high-level semantic attributes to reason about pedestrian future. This design hinders prediction performance in video data from diverse domains and unseen scenarios.  ...  Therefore we first study weakly-supervised learning with massive video data with weak labels from Internet platforms like YouTube (chapter 3).  ... 
arXiv:2011.10670v3 fatcat:mlom5zqk6jdvjndcsfwimpj7xu

Video Skimming

Vivekraj V. K., Debashis Sen, Balasubramanian Raman
2019 ACM Computing Surveys  
Video skimming, also known as dynamic video summarization, generates a temporally abridged version of a given video.  ...  Being dynamic in nature, video skimming, through temporal connectivity, allows better understanding of the video from its summary.  ...  The authors would like to thank all the reviewers' for their insightful comments through which the quality of this work has been enhanced.  ... 
doi:10.1145/3347712 fatcat:h4zbzmdfx5c2rm3dm4cmmzrsoa