Multiple-view constrained clustering for unsupervised face identification in TV-broadcast

Meriem Bendris, Benoit Favre, Delphine Charlet, Geraldine Damnati, Remi Auguste
2014 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Our goal is to automatically identify faces in TV broadcast without a pre-defined dictionary of identities.  ...  In TV content, people appear with many variations making the clustering difficult. In this case, speaker clustering can be a reliable link for face clustering.  ...  Overlaid Person Names In TV broadcast, most of Overlaid Person Names (OPN) occur while the corresponding face appears talking.  ... 
doi:10.1109/icassp.2014.6853645 dblp:conf/icassp/BendrisFCDA14 fatcat:ju5yza3vjzb6be6vxny367bdky

Fusion of Speech, Faces and Text for Person Identification in TV Broadcast [chapter]

Hervé Bredin, Johann Poignant, Makarand Tapaswi, Guillaume Fortier, Viet Bac Le, Thibault Napoleon, Hua Gao, Claude Barras, Sophie Rosset, Laurent Besacier, Jakob Verbeek, Georges Quénot (+2 others)
2012 Lecture Notes in Computer Science  
The Repere challenge is a project aiming at the evaluation of systems for supervised and unsupervised multimodal recognition of people in TV broadcast.  ...  In this paper, we describe, evaluate and discuss QCompere consortium submissions to the 2012 Repere evaluation campaign dry-run.  ...  same goal: multimodal person recognition in TV broadcast.  ... 
doi:10.1007/978-3-642-33885-4_39 fatcat:k4tsmhlqgbhb3n276byfpjaewm

Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast

Hervé Bredin, Anindya Roy, Viet-Bac Le, Claude Barras
2014 International Journal of Multimedia Information Retrieval  
This work introduces a unified framework for mono-, cross- and multi-modal person recognition in multimedia data.  ...  Practically, we describe how the approach can be applied to speaker identification in TV broadcast. Then, a solution to the above-mentioned mapping problem is proposed.  ...  Thanks to Johann Poignant for providing the output of video OCR.  ... 
doi:10.1007/s13735-014-0055-y fatcat:mlvyk5h5v5c4pmo4nvvljqs5ga

Scene understanding for identifying persons in TV shows: Beyond face authentication

Mickael Rouvier, Benoit Favre, Meriem Bendris, Delphine Charlet, Geraldine Damnati
2014 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI)  
Our goal is to automatically identify people in TV news and debates without any predefined dictionary of people.  ...  Then, people are identified by propagation strategies of overlaid names (OCR results) and speakers to scene classes and specific camera shots.  ...  Face detection and clustering In TV-broadcast, clustering faces is difficult because of the variability of face appearance.  ... 
doi:10.1109/cbmi.2014.6849829 dblp:conf/cbmi/RouvierFBCD14 fatcat:m7zmsup6f5d4hemhpxvhj3t4ii

Unsupervised Speaker Identification in TV Broadcast Based on Written Names

Johann Poignant, Laurent Besacier, Georges Quenot
2014 IEEE/ACM Transactions on Audio Speech and Language Processing  
We first compared these two sources of names on their abilities to provide the name of the speakers in TV broadcast.  ...  Identifying speakers in TV broadcast in an unsupervised way (i.e. without biometric models) is a solution for avoiding costly annotations.  ...  We therefore propose, in section IV, different association methods using written names to identify speakers in TV broadcast. III.  ... 
doi:10.1109/taslp.2014.2367822 fatcat:m2kjw33qwfetdfeny3zwh67lva

A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video

Jehanzeb Abbas, Charlie K. Dagli, Thomas S. Huang
2007 2007 IEEE Conference on Computer Vision and Pattern Recognition  
We propose a complete solution to full modality person-profiling for speakers and submodality person-profiling for non-speakers in real-world videos.  ...  In addition we are also interested in only name and face correspondence database for non-speakers who appear during voice-overs.  ...  Acknowledgements This work is supported in part by the Fulbright Fellowship.  ... 
doi:10.1109/cvpr.2007.383493 dblp:conf/cvpr/AbbasDH07 fatcat:otqkoj3xxbcc7ddrwablqmlzqi

Guest Editorial: Content-based Multimedia Indexing

Harald Kosch, Georges Quénot
2016 Multimedia tools and applications  
We are also deeply thankful to the many reviewers who did a great job in performing thorough reviews in several review rounds in a timely manner. We hope the readers will enjoy this special issue.  ...  Acknowledgments We would like to thank all authors who submitted their work to this special issue and worked very hard to provide interesting contributions to the field of content-based multimedia indexing  ...  The second paper "Naming multi-modal clusters to identify persons in TV Broadcast" (DOI 10.1007/s11042-015-2723-1), co-authored by Johann Poignant, Guillaume Fortier, Laurent Besacier and Georges Quénot  ... 
doi:10.1007/s11042-016-3683-9 fatcat:yvz3hnw3ibgkbpy6bdtjmoqp44

Multi-modal information fusion for news story segmentation in broadcast video

Bailan Feng, Peng Ding, Jiansong Chen, Jinfeng Bai, Su Xu, Bo Xu
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we propose a novel news story segmentation scheme which can segment broadcast video into story units with multi-modal information fusion (MMIF) strategy.  ...  Parallel to this, we make use of a multi-modal information fusion strategy for news story boundary characterization by joining these visual, audio and textual cues.  ...  Motivated by these analyses, in this paper, we propose a novel news story segmentation scheme under model-based method, named multi-modal information fusion (MMIF) for news story segmentation in broadcast  ... 
doi:10.1109/icassp.2012.6288156 dblp:conf/icassp/FengDCBXX12 fatcat:g5g7fohlhre7fetvclum5ybsje

A conditional random field approach for face identification in broadcast news using overlaid text

Gay Paul, Khoury Elie, Meignier Sylvain, Odobez Jean-Marc, Deleglise Paul
2014 2014 IEEE International Conference on Image Processing (ICIP)  
We investigate the problem of face identification in broadcast programs where people names are obtained from text overlays automatically processed with Optical Character Recognition (OCR) and further linked  ...  clusters that improves identification performance thanks to the use of further diarization statistics.  ...  To favour the face tracks identified as recurrent LF B to join a person cluster which could be named.  ... 
doi:10.1109/icip.2014.7025063 dblp:conf/icip/GayKMOD14 fatcat:7vkqdfq27reapnaca63h3noqo4

Towards large scale multimedia indexing

Nam Le, Izabela Lyon Freire, Zenilton Patrocínio, Silvio Jamil F. Guimarães, Gerard Martí, Josep Ramon Morros, Javier Hernando, Laura Docio-Fernandez, Carmen Garcia-Mateo, Sylvain Meignier, Jean-Marc Odobez, Hervé Bredin (+7 others)
2017 Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing - CBMI '17  
Thus in this paper, we aim to investigate these approaches with variations in their components using the medium scale multimedia dataset associated to the "Multimodal Person Discovery in Broadcast TV"  ...  The only way to identify a person is by finding their name n ∈ N in the audio (e.g. using ASR) or visual (e.g. using OCR) streams and associating them to the correct person (Fig. 1).  ... 
doi:10.1145/3095713.3095732 dblp:conf/cbmi/LeBSILBGGFFPGMM17 fatcat:rbesmnsd25acvewg5ltg7r3gj4

Automated Video Labelling: Identifying Faces by Corroborative Evidence [article]

Andrew Brown, Ernesto Coto, Andrew Zisserman
2021 arXiv   pre-print
We present a method for automatically labelling all faces in video archives, such as TV broadcasts, by combining multiple evidence sources and multiple modalities (visual and audio).  ...  and test settings, such as TV shows and news broadcasts.  ...  We are grateful to Arhsa Nagrani, Shaya Ghadimi, and Maya Gulieva for proof reading, and the reviewers for their helpful feedback.  ... 
arXiv:2102.05645v1 fatcat:ky4gbobjvzdkrjedfnzvwu7zwm

A conditional random field approach for audio-visual people diarization

Gay Paul, Khoury Elie, Meignier Sylvain, Odobez Jean-Marc, Deleglise Paul
2014 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Experiments on 6 hours of broadcast data show that our framework is able to improve the AV-person diarization especially for speaker segments erroneously labeled in the mono-modal case.  ...  We investigate the problem of audio-visual (AV) person diarization in broadcast data.  ...  Authors in [9] use Markov random field to combine audio and video classifiers for identifying people in TV-Series while [10] uses Conditional Random Fields (CRF) to integrate various cues in a face  ... 
doi:10.1109/icassp.2014.6853569 dblp:conf/icassp/GayKMOD14 fatcat:r3dwb3xhprbo3iem7bpez6faum

Tag Propagation Approaches within Speaking Face Graphs for Multimodal Person Discovery

Gabriel Barbosa da Fonseca, Izabela Lyon Freire, Zenilton Patrocínio, Silvio Jamil F. Guimarães, Gabriel Sargent, Ronan Sicre, Guillaume Gravier
2017 Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing - CBMI '17  
The indexing of broadcast TV archives is a current problem in multimedia research.  ...  In this context, this paper focuses on two approaches for unsupervised person discovery.  ...  Therefore, the only way to identify persons is by extracting their names from audio and visual streams (e.g., using speech transcription or OCR) and associating them to the correct person, thus making  ... 
doi:10.1145/3095713.3095729 dblp:conf/cbmi/FonsecaFPGSSG17 fatcat:uadnqucndjgbpcxmhzn5eexb2a

Eigennews: Generating and delivering personalized news video

M. Daneshi, P. Vajda, D. M. Chen, S. S. Tsai, M. C. Yu, A. F. Araujo, H. Chen, B. Girod
2013 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)  
The EigenNews system provides each viewer with a personalized newscast filled with stories that matter most to them.  ...  Over time, the website collects click and view histories and uses this information to continuously optimize the personalization.  ...  After clustering these facial CHs over the entire newscast, dominant clusters are identified corresponding to the anchorperson(s) in the program.  ... 
doi:10.1109/icmew.2013.6618439 dblp:conf/icmcs/DaneshiVCTYACG13 fatcat:akdjizsq5nbc5nopnjm2swnwh4

A generalised cross-modal clustering method applied to multimedia news semantic indexing and retrieval

Alberto Messina, Maurizio Montagnuolo
2009 Proceedings of the 18th international conference on World wide web - WWW '09  
The prototype adopts online newspaper articles and TV newscasts as information sources, to deliver a service made up of items including both contributions.  ...  In addition, the availability of the same content in the form of digital multimedia data has dramatically increased.  ...  Therefore, tools to integrate multi-media data from mono-modal information sources were investigated. A method for querying persons in Yahoo!  ... 
doi:10.1145/1526709.1526753 dblp:conf/www/MessinaM09 fatcat:4tdbihs6ofhoznkngjkkavbusq