Weakly Supervised Construction of ASR Systems with Massive Video Data
[article]
2020
arXiv
pre-print
In this paper, we present a weakly supervised framework for constructing ASR systems with massive video data. ...
Building Automatic Speech Recognition (ASR) systems from scratch is significantly challenging, mostly due to the time-consuming and financially-expensive process of annotating a large amount of audio data ...
Here, we present a weakly supervised framework to construct ASR systems from massive video data, shown in Figure 1. ...
arXiv:2008.01300v2
fatcat:erqimwin4ff55nnco3na4npry4
Weakly Supervised Construction of ASR Systems from Massive Video Data
2021
Conference of the International Speech Communication Association
In this paper, we present VideoASR, a weakly supervised framework for constructing ASR systems from massive video data. ...
due to the time-consuming and financially-expensive process of annotating a large amount of audio data with transcripts. ...
In this work, we present a weakly supervised framework to construct ASR systems from massive video data, named VideoASR. The framework is shown in Figure 1, which consists of ...
doi:10.21437/interspeech.2021-7
dblp:conf/interspeech/Cheng00W21
fatcat:xez4ixvqazewnchzj7mialezvm
Learning To Recognize Procedural Activities with Distant Supervision
[article]
2022
arXiv
pre-print
To address this issue, we propose to automatically identify steps in instructional videos by leveraging the distant supervision of a textual knowledge base (wikiHow) that includes detailed descriptions ...
to the prohibitive cost of manually annotating temporal boundaries in long videos. ...
The downside of this massive amount of data is that its scale effectively prevents manual annotation. In fact, all videos in HowTo100M are unverified by human annotators. ...
arXiv:2201.10990v3
fatcat:ghjybqtitjf5thlsgoknmpgf2a
An analytical study of information extraction from unstructured and multidimensional big data
2019
Journal of Big Data
Traditional IE systems are inefficient to deal with this huge deluge of unstructured big data. The volume and variety of big data demand to improve the computational capabilities of these IE systems. ...
"Audio IE" section presents the detailed discussion on IE from audio, its subtasks such as AED and ASR with state-of-the-art techniques and challenges. ...
In this context, CNN based weakly supervised technique was compared to the technique trained with fully-supervised data. ...
doi:10.1186/s40537-019-0254-8
fatcat:qy5l55um7feeblec4hxohr3pqa
Evaluating Multimedia Features and Fusion for Example-Based Event Detection
[chapter]
2014
Fusion in Computer Vision
To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia ...
Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. ...
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA ...
doi:10.1007/978-3-319-05696-8_5
dblp:series/acvpr/MyersSNNHPSHKSS14
fatcat:c72ejcp2wzgd7bvis5bucvxmai
Evaluating multimedia features and fusion for example-based event detection
2013
Machine Vision and Applications
To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia ...
Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. ...
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA ...
doi:10.1007/s00138-013-0527-8
fatcat:mx2lazcfhndkfme5p4nqhlrinu
A Survey of Code-switched Speech and Language Processing
[article]
2020
arXiv
pre-print
As code-switching data and resources are scarce, we list what is available in various code-switched language pairs with the language processing tasks they can be used for. ...
We motivate why processing code-switched text and speech is essential for building intelligent agents and systems that interact with users in multilingual communities. ...
[119] decode untranscribed data with this ASR system and add the decoded speech to ASR training data after rescoring using Language Models. ...
arXiv:1904.00784v3
fatcat:r5tsg4kdnfbtnndae523c32pta
Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things
[article]
2020
arXiv
pre-print
However, transmitting massive amounts of heterogeneous data, perceiving complex environments from these data, and then making smart decisions in a timely manner are difficult. ...
In the Internet of Things (IoT) era, billions of sensors and devices collect and process data from the environment, transmit them to cloud centers, and receive feedback via the internet for connectivity ...
massive amounts of data. ...
arXiv:2011.08612v1
fatcat:dflut2wdrjb4xojll34c7daol4
Transcript to Video: Efficient Clip Sequencing from Texts
[article]
2021
arXiv
pre-print
To meet the demands for non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots ...
Quantitative results and user studies demonstrate empirically that the proposed learning framework can retrieve content-relevant shots while creating plausible video sequences in terms of style. ...
To tackle these challenges, we propose a weakly-supervised framework for learning vision-language embeddings and sequencing styles on a newly constructed unlabeled video dataset. ...
arXiv:2107.11851v1
fatcat:vfcx7w75kzgg7ppurgswceoi5i
A Roadmap for Big Model
[article]
2022
arXiv
pre-print
We introduce 16 specific BM-related topics in those four parts, they are Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory&Interpretability ...
Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. ...
Semi/Weakly-supervised Pretraining Semi/Weakly supervised pre-training is aimed at training models with fewer or weaker human annotations. ...
arXiv:2203.14101v4
fatcat:rdikzudoezak5b36cf6hhne5u4
A Review on MAS-Based Sentiment and Stress Analysis User-Guiding and Risk-Prevention Systems in Social Network Analysis
2020
Applied Sciences
For this reason, in this survey we explore works in the line of prevention of risks that can arise from social interaction in online environments, focusing on works using Multi-Agent System (MAS) technologies ...
We review with special attention works using MAS technologies for user recommendation and guiding. ...
For multi-aspect rating prediction with indirect supervision: LDA, MG-LDA, STM, and local LDA weakly supervised with seed words are used to label sentences with aspects, and a Support Vector Regression ...
doi:10.3390/app10196746
fatcat:m2gqf3utabgtrcvhtbh53hksfq
Deep Learning in Mobile and Wireless Networking: A Survey
[article]
2019
arXiv
pre-print
We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. ...
We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. ...
Serving Deep Learning with Massive High-Quality Data Deep neural networks rely on massive and high-quality data to achieve good performance. ...
arXiv:1803.04311v3
fatcat:awuvyviarvbr5kd5ilqndpfsde
Deep Learning in Mobile and Wireless Networking: A Survey
2019
IEEE Communications Surveys and Tutorials
We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. ...
We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. ...
Serving Deep Learning with Massive High-Quality Data Deep neural networks rely on massive and high-quality data to achieve good performance. ...
doi:10.1109/comst.2019.2904897
fatcat:xmmrndjbsfdetpa5ef5e3v4xda
From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video
[article]
2021
arXiv
pre-print
With the advancement in computer vision deep learning, systems now are able to analyze an unprecedented amount of rich visual information from videos to enable applications such as autonomous driving, ...
Many systems do not provide high-level semantic attributes to reason about pedestrian future. This design hinders prediction performance in video data from diverse domains and unseen scenarios. ...
Therefore we first study weakly-supervised learning with massive video data with weak labels from Internet platforms like YouTube (chapter 3). ...
arXiv:2011.10670v3
fatcat:mlom5zqk6jdvjndcsfwimpj7xu
Video Skimming
2019
ACM Computing Surveys
Video skimming, also known as dynamic video summarization, generates a temporally abridged version of a given video. ...
Being dynamic in nature, video skimming, through temporal connectivity, allows better understanding of the video from its summary. ...
The authors would like to thank all the reviewers for their insightful comments through which the quality of this work has been enhanced. ...
doi:10.1145/3347712
fatcat:h4zbzmdfx5c2rm3dm4cmmzrsoa