2,348 Hits in 7.3 sec

Identification of Narrative Peaks in Video Clips: Text Features Perform Best [chapter]

Joep J. M. Kierkels, Mohammad Soleymani, Thierry Pun
2010 Lecture Notes in Computer Science  
A methodology is proposed to identify narrative peaks in video clips. Three basic clip properties are evaluated, reflecting video-, audio- and text-related features in the clip.  ...  On the training set, our best detector had an accuracy of 47% in finding narrative peaks. On the test set, this accuracy dropped to 24%.  ...  The fact that this best-performing scheme is based only on a text feature corresponds well to our initial observation that there is no clear audiovisual characteristic of a narrative peak when observing  ...
doi:10.1007/978-3-642-15751-6_51 fatcat:qqpdspcy5jg7nm3ahopzm4hs5y

Digesting Commercial Clips from TV Streams

Ling-Yu Duan, Jinqiao Wang, Yan-Tao Zheng, Hanqing Lu, Jesse S. Jin
2008 IEEE Multimedia  
Acknowledgment The National Natural Science Foundation of China (Grant No. 60475010) partially supported this work.  ...  We collected these commercial clips from the Text Retrieval Conference (TREC) video-retrieval evaluation 2005 corpus.  ...  The classifier module performs text categorization of proxy articles and determines the categories of respective TV commercials.  ... 
doi:10.1109/mmul.2008.4 fatcat:iguft24u4vhjfdthvzwgyocuoa

Community annotation and remix

Ryan Shaw, Patrick Schmitz
2006 Proceedings of the 1st ACM international workshop on Human-centered multimedia - HCM '06  
Reuse of spoken and written language in source media, and the use of written language in user-defined overlay text segments proved to be essential for most users.  ...  Completed remixes exhibited a range of genres, with over a third showing thematic unity and a quarter showing some attempt at narrative.  ...  Peter Shafton and our research interns for all their work on the platform codebase, to Jeannie Yang for essential organizational support, and Marc Davis for contributing to our initial conceptualization of  ... 
doi:10.1145/1178745.1178761 fatcat:xsjogfcbnfhslarzfqkgsmtw2m

The vision digital video library

Susan Gauch, Wei Li, John Gauch
1997 Information Processing & Management  
Finally, all information is stored in a full-text information retrieval system for content-based exploration of the library over networks of varying bandwidths.  ...  The salient feature of our approach is the integrated application of mature image or video processing, information retrieval, speech feature extraction and word-spotting technologies for efficient creation  ...  Video sequences from CNN Headline News are courtesy of Turner Broadcasting.  ... 
doi:10.1016/s0306-4573(97)00010-1 fatcat:exjxl4hpzbfrfloern5xugehha

Book2Movie: Aligning video scenes with book chapters

Makarand Tapaswi, Martin Bauml, Rainer Stiefelhagen
2015 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)  
Film adaptations of novels often visually display in a few shots what is described in many pages of the source novel. In this paper we present a new problem: to align book chapters with video scenes.  ...  Using the alignment, we present a qualitative analysis of describing the video through rich narratives obtained from the novel.  ...  This figure is best viewed on screen. We highlight and focus on a few key ones: (i) narrative text describing the scene or location; (ii) detailed character descriptions including face-related features  ...
doi:10.1109/cvpr.2015.7298792 dblp:conf/cvpr/TapaswiBS15 fatcat:4u3s7gk4znf4vn5bvl362kgvdi

Scaling New Peaks: A Viewership-centric Approach to Automated Content Curation [article]

Subhabrata Majumdar, Deirdre Paul, Eric Zavesky
2021 arXiv   pre-print
Summarizing video content is important for video streaming services to engage the user in a limited time span.  ...  We propose a viewership-driven, automated method that accommodates a range of segment identification goals.  ...  The intersection of a V1 clip with a text summary partition corresponded to all tokenized and lemmatized words in a clip annotation being present in the tokenized and lemmatized text under that partition  ... 
arXiv:2108.04187v1 fatcat:vs6ent5l3jdyfjjxkdlsynkz2y
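The snippet above describes an intersection test: a clip counts as matching a text summary partition when every tokenized, lemmatized word of the clip annotation appears in the tokenized, lemmatized partition text. A minimal sketch of that idea follows; this is not the authors' code, and the naive lowercase-split tokenizer and suffix-stripping lemmatizer are stand-ins for whatever NLP pipeline the paper actually used.

```python
# Illustrative sketch (not the authors' implementation): test whether a clip
# annotation "intersects" a summary partition, i.e. all of its tokenized,
# lemmatized words occur in the tokenized, lemmatized partition text.

def lemmatize(token: str) -> str:
    """Naive lemmatizer stand-in: strips a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization followed by lemmatization."""
    return {lemmatize(t) for t in text.lower().split()}

def clip_intersects_partition(annotation: str, partition_text: str) -> bool:
    """True if every annotation token is present in the partition text."""
    return tokens(annotation) <= tokens(partition_text)

print(clip_intersects_partition(
    "goals scored",
    "the striker scores two goals in the final minutes"))  # → True
```

A real pipeline would swap in a proper tokenizer and lemmatizer (e.g. NLTK's), but the subset test over lemma sets captures the matching criterion the snippet describes.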

Multimedia content processing through cross-modal association

Dongge Li, Nevenka Dimitrova, Mingkun Li, Ishwar K. Sethi
2003 Proceedings of the eleventh ACM international conference on Multimedia - MULTIMEDIA '03  
Among them CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads.  ...  Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage.  ...  We used different types of video material, including nine home video clips, six movie clips, and three video conferencing clips, all of which are dialog clips with multiple people speaking.  ... 
doi:10.1145/957013.957143 dblp:conf/mm/LiDLS03 fatcat:rs5i65fdmfhkbnxx2ibm5wsbu4

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Georgios Evangelopoulos, Athanasia Zlatintsi, Alexandros Potamianos, Petros Maragos, Konstantinos Rapantzikos, Georgios Skoumas, Yannis Avrithis
2013 IEEE transactions on multimedia  
Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream.  ...  The individual saliency streams, obtained from modality-dependent cues, are integrated in a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying  ...  ACKNOWLEDGMENT The authors would like to thank the students and staff of CVSP Lab, National Technical University of Athens, Athens, Greece, for participating in the subjective evaluation studies, N.  ...
doi:10.1109/tmm.2013.2267205 fatcat:jjt7xmjh5narlm5wr2strvrqza

Movie Summarization via Sparse Graph Construction [article]

Pinelopi Papalampidi, Frank Keller, Mirella Lapata
2020 arXiv   pre-print
...key events in a movie that describe its storyline.  ...  We summarize full-length movies by creating shorter videos containing their most informative scenes.  ...  We gratefully acknowledge the support of the European Research Council (Lapata; award 681760, "Translating Multiple Modalities into Text") and of the Leverhulme Trust (Keller; award IAF-2017-019).  ...
arXiv:2012.07536v1 fatcat:jg2zrhogqbha5ckfiqrudbgwrm

Commonsense Knowledge for the Collection of Ground Truth Data on Semantic Descriptors

Vincenzo Lombardo, Rossana Damiano
2012 2012 IEEE International Symposium on Multimedia  
The coverage of the semantic gap in video indexing and retrieval has gone through a continuous increase of the vocabulary of high-level features or semantic descriptors, sometimes organized in light-scale  ...  We test the viability of the approach by carrying out some user studies on the annotation of narrative videos.  ...  ACKNOWLEDGMENT The work presented here is part of project CADMOS, funded by Regione Piemonte, Innovation Hub for Multimedia and Digital Creativity, 2010-2012, POR-FESR 07-13.  ...
doi:10.1109/ism.2012.23 dblp:conf/ism/LombardoD12 fatcat:c6c4utwbxffjdbqoazh6nhxfvi

Not so fast: Limited validity of deep convolutional neural networks as in silico models for human naturalistic face processing [article]

Guo Jiahui, Ma Feilong, Matteo Visconti di Oleggio Castello, Samuel A Nastase, James V Haxby, Maria Ida Gobbini
2021 bioRxiv   pre-print
We developed the largest naturalistic dynamic face stimulus set in human neuroimaging research (700+ naturalistic video clips of unfamiliar faces) and used representational similarity analysis to investigate  ...  Deep convolutional neural networks (DCNNs) trained for face identification can rival and even exceed human-level performance.  ...  Acknowledgements We thank the authors of the InsightFace package for making their models and training data freely available for non-commercial research use.  ... 
doi:10.1101/2021.11.17.469009 fatcat:ewtleq44fjcd5o3ry3zvw6nmbe

Movie Description [article]

Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele
2016 arXiv   pre-print
In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies.  ...  Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description  ...  Marcus Rohrbach was supported by a fellowship within the FITweltweit-Program of the German Academic Exchange Service (DAAD).  ... 
arXiv:1605.03705v1 fatcat:d47amye5lfag7pykmsxyuziolu

Affective Video Summarization and Story Board Generation Using Pupillary Dilation and Eye Gaze

Harish Katti, Karthik Yadati, Mohan Kankanhalli, Chua Tat-Seng
2011 2011 IEEE International Symposium on Multimedia  
The method also includes novel eye-gaze analysis and fusion with content-based features to discover affective segments of videos and regions of interest (ROIs) contained therein.  ...  Effectiveness of the framework is evaluated using experiments over a diverse set of clips, a significant pool of subjects and comparison with a fully automated state-of-the-art affective video summarization algorithm  ...  Details of the clips have been summarised in Table I. The original video clips are of 5 mins duration.  ...
doi:10.1109/ism.2011.57 dblp:conf/ism/KattiYKC11 fatcat:zofyxyonbvc77fwhhvvlm266kq

Modeling, Recognizing, and Explaining Apparent Personality from Videos

Hugo Jair Escalante, Meysam Madadi, Stephane Ayache, Evelyne Viegas, Furkan Gurpinar, Achmadnoer Sukma Wicaksana, Cynthia Liem, Marcel A. J. Van Gerven, Rob Van Lier, Heysem Kaya, Albert Ali Salah, Sergio Escalera (+5 others)
2020 IEEE Transactions on Affective Computing  
To the best of our knowledge, this is the first effort in this direction. We describe a challenge we organized on explainability in first impressions analysis from video.  ...  We analyze in detail the newly introduced data set, evaluation protocol, proposed solutions and summarize the results of the challenge. We investigate the issue of bias in detail.  ...  Similarly, 84% of the clips in the test set have at least one clip in the train set which was generated from the same video.  ... 
doi:10.1109/taffc.2020.2973984 fatcat:v3ze54f3pzbsnk44pvridapqdm