1,892 Hits in 8.4 sec

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression

Yitian Yuan, Tao Mei, Wenwu Zhu
2019 Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and the Thirty-First Innovative Applications of Artificial Intelligence Conference
We evaluate the proposed ABLR approach on two public datasets, ActivityNet Captions and TACoS.  ...  We have witnessed the tremendous growth of videos over the Internet, where most of these videos are typically paired with abundant sentence descriptions, such as video titles, captions and comments.  ...  Jun Xu, Linjun Zhou and Xumin Chen for their great support and valuable suggestions on this work.  ... 
doi:10.1609/aaai.v33i01.33019159 fatcat:t5ckqvm4kre5njg4m6m2uh5z7m

A Comprehensive Review of the Video-to-Text Problem [article]

Jesus Perez-Martin and Benjamin Bustos and Silvio Jamil F. Guimarães and Ivan Sipiran and Jorge Pérez and Grethel Coello Said
2021 arXiv   pre-print
This association can mainly be made by retrieving the most relevant descriptions from a corpus or by generating a new one given a context video.  ...  These two ways represent essential tasks for the Computer Vision and Natural Language Processing communities, called the text retrieval from video task and the video captioning/description task.  ...  (VTW) Zeng et al. (2016) proposed this dataset for video title generation.  ... 
arXiv:2103.14785v3 fatcat:xwzziozwjbghfobtowu5bny6bu

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression [article]

Yitian Yuan, Tao Mei, Wenwu Zhu
2018 arXiv   pre-print
Comprehensive experiments on the ActivityNet Captions and TACoS datasets demonstrate both the effectiveness and the efficiency of the proposed ABLR approach.  ...  Then, a multi-modal co-attention mechanism is introduced to generate not only video attention, which reflects the global video structure, but also sentence attention, which highlights the crucial details  ...  Jun Xu, Linjun Zhou and Xumin Chen for their great support and valuable suggestions on this work.  ... 
arXiv:1804.07014v4 fatcat:7ngjxiv3kfgzhfemkf2pxe3pzq
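
The attention-based location regression idea described in the two ABLR records above can be illustrated compactly: attention weights computed over clip-level video features pool a query-conditioned video representation, from which normalized start and end coordinates are regressed directly, with no sliding-window proposals. The sketch below is a minimal PyTorch illustration of that general scheme, not the authors' implementation; every dimension, layer, and name is an assumption.

```python
import torch
import torch.nn as nn

class AttentionLocationRegressor(nn.Module):
    """Minimal sketch of attention-based location regression (after the
    general ABLR scheme); all dimensions and layers are assumptions."""

    def __init__(self, video_dim=500, query_dim=300, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(video_dim + query_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )
        self.regress = nn.Linear(video_dim, 2)  # normalized (start, end)

    def forward(self, clips, query):
        # clips: (B, T, video_dim) clip features; query: (B, query_dim) sentence embedding
        q = query.unsqueeze(1).expand(-1, clips.size(1), -1)
        attn = torch.softmax(self.score(torch.cat([clips, q], dim=-1)).squeeze(-1), dim=1)
        pooled = (attn.unsqueeze(-1) * clips).sum(dim=1)  # attention-weighted video feature
        # Direct regression of normalized temporal coordinates; no proposal windows
        return torch.sigmoid(self.regress(pooled)), attn
```

Regressing coordinates from an attention-pooled feature is what makes such an approach proposal-free, which is where the efficiency claim in the abstract comes from.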

The effect of explanatory captions on the reception of foreign audiovisual products

Binghan Zheng, Mingqing Xie
2018 Translation Cognition & Behavior  
The results show that the provision of ECs for a subtitled video in a foreign language greatly increased positive cognitive effects on viewers.  ...  The present research triangulates questionnaire, retrospective interview, and eye-tracking data, aiming to investigate how Explanatory Captions (ECs) are received by different viewers with varied educational  ...  In the current research, subtitles are simply texts spoken in the video, which are direct and explicit and thus, in relevance-theoretic terms, more relevant; while ECs are indirect and implicit, and thus  ... 
doi:10.1075/tcb.00006.zhe fatcat:o4c4ctrmqngo5jc445k73pgbhe

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow
2021 The Journal of Artificial Intelligence Research  
Our efforts go beyond earlier surveys which are either task-specific or concentrate only on one type of visual content, i.e., image or video.  ...  In this survey, we focus on ten prominent tasks that integrate language and vision by discussing their problem formulation, methods, existing datasets, evaluation measures, and compare the results obtained  ...  We also acknowledge the insightful comments of Marius Mosbach on the first version of the manuscript.  ... 
doi:10.1613/jair.1.11688 fatcat:kvfdrg3bwrh35fns4z67adqp6i

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods [article]

Aditya Mogadala and Marimuthu Kalimuthu and Dietrich Klakow
2020 arXiv   pre-print
Our efforts go beyond earlier surveys which are either task-specific or concentrate only on one type of visual content, i.e., image or video.  ...  In this survey, we focus on ten prominent tasks that integrate language and vision by discussing their problem formulations, methods, existing datasets, evaluation measures, and compare the results obtained  ...  We also acknowledge the insightful comments of Marius Mosbach on the first version of the draft.  ... 
arXiv:1907.09358v2 fatcat:4fyf6kscy5dfbewll3zs7yzsuq

A Survey on the Automatic Indexing of Video Data

R. Brunelli, O. Mich, C.M. Modena
1999 Journal of Visual Communication and Image Representation  
Today a considerable amount of video data in multimedia databases requires sophisticated indices for its effective use.  ...  This paper surveys several approaches and algorithms that have been recently proposed to automatically structure audio-visual data, both for annotation and access. © 1999 Academic Press  ...  they provide explicit indices, such as words and sentences that can be directly used to categorize and access them), digital video documents do not provide such an explicit content description.  ... 
doi:10.1006/jvci.1997.0404 fatcat:hgvh6xcpgvgl5nwvoddsblky5i
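
A concrete example of the kind of automatic structuring such indexing surveys discuss is shot-boundary detection, where hard cuts are found by thresholding the difference between consecutive frame histograms. The following toy sketch assumes gray-level frames as NumPy arrays; the bin count and threshold are illustrative values, not taken from the paper.

```python
import numpy as np

def shot_boundaries(frames, bins=64, threshold=0.4):
    """Detect hard cuts by comparing normalized gray-level histograms
    of consecutive frames; returns indices where a new shot starts."""
    hists = [np.histogram(f, bins=bins, range=(0, 255))[0] / f.size
             for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        # L1 distance between consecutive histograms; large jumps mark cuts
        if np.abs(hists[i] - hists[i - 1]).sum() > threshold:
            cuts.append(i)
    return cuts

# Toy usage with random "frames" (H x W gray images)
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (120, 160)) for _ in range(10)]
print(shot_boundaries(frames))
```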

Broadcast News Gisting Using Lexical Cohesion Analysis [chapter]

Nicola Stokes, Eamonn Newman, Joe Carthy, Alan F. Smeaton
2004 Lecture Notes in Computer Science  
The recent interest in robust gisting and title generation techniques originates from a need to improve the indexing and browsing capabilities of interactive digital multimedia systems.  ...  We automatically evaluate the performance of our lexical chaining-based gister with respect to four baseline extractive gisting methods on a collection of closed-caption material taken from a series of  ...  In contrast, the readability of a generated title is dependent on a 'title word ordering phrase' [3], which is based on statistical probabilities rather than any explicit consideration of grammatical  ... 
doi:10.1007/978-3-540-24752-4_16 fatcat:owih6mdorvg7tcjkc2kgr3jv4q
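
The lexical chaining-based gisting described above scores candidate sentences by their cohesion with the rest of the transcript and extracts the best scorer as the gist. The toy sketch below approximates lexical chains with simple corpus-wide term frequencies instead of WordNet-based chaining, so it illustrates only the extractive scoring idea; all names and parameters are assumptions.

```python
from collections import Counter

def gist(sentences, top_terms=10):
    """Toy extractive gister: approximate the strongest lexical chains with
    corpus-wide term frequencies and return the sentence that covers them best."""
    words = [w.lower().strip(".,;:!?") for s in sentences for w in s.split()]
    freq = Counter(w for w in words if len(w) > 3)             # crude content-word filter
    chain_terms = {w for w, _ in freq.most_common(top_terms)}  # stand-in for lexical chains

    def cohesion(sentence):
        return sum(1 for w in sentence.lower().split()
                   if w.strip(".,;:!?") in chain_terms)

    return max(sentences, key=cohesion)

print(gist([
    "The storm hit the northern coast overnight.",
    "Coastal towns reported storm damage and widespread flooding.",
    "Officials will hold a press briefing at noon.",
]))
```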

Extracting relational facts for indexing and retrieval of crime-scene photographs

Katerina Pastra, Horacio Saggion, Yorick Wilks
2003 Knowledge-Based Systems  
This paper presents work on text-based photograph indexing and retrieval for crime investigation, an application domain where efficient querying of large crime-scene photograph databases is of crucial  ...  The extraction of these semantic triples is based on advanced knowledge-based Natural Language Processing technologies and resources.  ...  with information on the crime scene documentation practices and in collecting all our caption corpus.  ... 
doi:10.1016/s0950-7051(03)00033-9 fatcat:se5gpdiiu5c5zitzhwxfovzsue
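
The semantic triples mentioned in the snippet are subject-relation-object facts extracted from crime-scene captions and matched at query time. A minimal sketch of such a triple index follows; the data, names, and exact-match retrieval are illustrative assumptions, since the paper's extraction pipeline relies on knowledge-based NLP resources not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes instances hashable, so they can key a dict
class Triple:
    subject: str
    relation: str
    obj: str

# Toy index: caption-derived triples mapped to photograph IDs (illustrative data)
index = {
    Triple("knife", "on", "table"): ["photo_014"],
    Triple("glove", "near", "door"): ["photo_027", "photo_031"],
}

def retrieve(query: Triple):
    """Exact triple match; a real system would add synonym and ontology expansion."""
    return index.get(query, [])

print(retrieve(Triple("knife", "on", "table")))  # ['photo_014']
```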

Extracting Relational Facts for Indexing and Retrieval of Crime-Scene Photographs [chapter]

Katerina Pastra, Horacio Saggion, Yorick Wilks
2003 Applications and Innovations in Intelligent Systems X  
This paper presents work on text-based photograph indexing and retrieval for crime investigation, an application domain where efficient querying of large crime-scene photograph databases is of crucial  ...  The extraction of these semantic triples is based on advanced knowledge-based Natural Language Processing technologies and resources.  ...  with information on the crime scene documentation practices and in collecting all our caption corpus.  ... 
doi:10.1007/978-1-4471-0649-4_9 fatcat:dfl2kduptvbgxjdpaqjjur6lfu

From image to language and back again

A. BELZ, T.L. BERG, L. YU
2018 Natural Language Engineering  
generation (Madhyastha et al., Tanti et al.), visual scene understanding (Silberer et al.), and multimodal learning of high-level attributes (Sorodoc et al.).  ...  In this article, we touch upon all of these topics as we review work involving images and text under the three main headings of image description (Section 2), visually grounded referring expression generation  ...  Allen Family Foundation, ICTAS Junior Faculty awards to DB and DP, Google Faculty Research Awards to DP and DB, AWS in Education Research grant to DB, and NVIDIA GPU donations to DB.  ... 
doi:10.1017/s1351324918000086 fatcat:fvxkgjlolra4vns2r5qx4xvg3i

D2.2 Implementations of methods adapted to enhanced human inputs

Doukhan, Francis, Harrando, Huet, Kaseva, Kurimo, Laaksonen, Lindh-Knuutila, Lisena, Pehlivan Tort, Reboud, Rouhe (+2 others)
2020 Zenodo  
Based on the methods' primary input domain, they have been grouped as visual (facial person recognition, facial gender classification and video description), auditory (speech and gender segmentation, speech  ...  recognition and speaker identification and diarisation) and multimodal (audio-enhanced captioning, visual–auditory gender classification, person re-identification and multimodal speech recognition) approaches  ...  We follow the standard protocol provided by the authors [23] by using 80 images per scene category for training and another 20 images for testing.  ... 
doi:10.5281/zenodo.4964298 fatcat:6bbqa7q3xrctnm6nrf5fxh7f3q

A Scientometric Visualization Analysis of Image Captioning Research from 2010 to 2020

Wenxuan Liu, Huayi Wu, Kai Hu, Qing Luo, Xiaoqiang Cheng
2021 IEEE Access  
Meanwhile, we find that evaluation methods, datasets, novel image captioning models based on generative adversarial networks, reinforcement learning, and Transformer, as well as remote sensing image captioning  ...  It is thus very important to keep up with the latest research and results in the field of image captioning, as publications on this topic are numerous.  ...  that they can promote the further development and use of image captioning technology.  ... 
doi:10.1109/access.2021.3129782 fatcat:7uihhdaoabb5facidfohc7uslu

Learning in student-generated T.V. commercials enhanced by computer technology

Joyce Cunningham
2011 The JALT CALL Journal  
This paper will focus on a collaborative project which incorporates filming and video-editing, and will examine student perceptions about the use of video production from feedback gained in interviews  ...  Techniques include brainstorming, storyboarding, making dialogues, filming, video editing (iMovie2), and portfolio assessment, etc.  ...  Among others, NBC Network's collection of commercials (on chewing gum, pizza, bagels, Hershey's chocolate, and so on) offers more practice in analyzing techniques of persuasion, implicit and explicit messages  ... 
doi:10.29140/jaltcall.v7n2.114 fatcat:ndcstmur5fglbeleaxf6bc42qi

The Essay as Mode of Expression and the Essayistic Practices in Radu Jude's Cinema

Pop Doru, Babeş-Bolyai University
2021 Ekphrasis: Images, Cinema, Theory, Media  
those involved, is based on a real story.  ...  Here the Berlin-based director plays with the German word Schnitt, which can mean both to section or separate and to connect.  ... 
doi:10.24193/ekphrasis.26.12 fatcat:fd2xygjit5d4jdxynay6my2gca
Showing results 1 — 15 of 1,892