
CMU Spoken Document Retrieval in Trec-8: Analysis of the role of Term Frequency TF

Matthew Siegler, Rong Jin, Alexander G. Hauptmann
1999 Text Retrieval Conference  
The participation of Carnegie Mellon University in the TREC-8 Spoken Document Retrieval Track used basically the same Sphinx speech recognition system as in TREC-7.  ...  A thorough examination of the speech recognition condition is given in [3].  ...  As many people pointed out in previous TRECs, directly multiplying term frequency tf with inverse document frequency idf generally causes poor performance.  ... 
dblp:conf/trec/SieglerJH99 fatcat:pqimcgx3dzdfxh7e6gzfoef2le

Video retrieval using speech and image information

Alexander G. Hauptmann, Rong Jin, Tobun D. Ng, Minerva M. Yeung, Rainer W. Lienhart, Chung-Sheng Li
2003 Storage and Retrieval for Media Databases 2003  
For the queries used in this evaluation, image matching and video OCR proved to be the deciding aspects of video information retrieval.  ...  Retrieval Track evaluation performed by the National Institute of Standards and Technology.  ...  IRI-9817496, and by the Advanced Research and Development Activity (ARDA) under contract number MDA908-00-C-0037.  ... 
doi:10.1117/12.479747 dblp:conf/spieSR/HauptmannJN03 fatcat:tusllilhvvdifmoevwextb3bke

Searching spontaneous conversational speech

Franciska de Jong, Douglas W. Oard, Roeland Ordelman, Stephan Raaijmakers
2007 SIGIR Forum  
Preface Nearly a decade ago, we learned from the TREC Spoken Document Retrieval (SDR) track that searching speech was a "solved problem."  ...  Three factors were key to this success: • Broadcast news has a "story" structure that resembles written documents. • The redundancy present in human language meant that search effectiveness held up well  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of DARPA.  ... 
doi:10.1145/1328964.1328982 fatcat:wwpzqq7ndrfedh4imhoznvccl4

Automatic Summarization

Martha Larson
2012 Foundations and Trends in Information Retrieval  
Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR).  ...  SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and  ...  The large corpus of broadcast news used for the TREC Spoken Document Retrieval track in 1999 and 2000 (i.e., SDR at TREC-8 and TREC-9) was recognized with WERs that ranged from 10% to 20%.  ... 
doi:10.1561/1500000020 fatcat:o424mjxnp5abbexhjsobtom2ry

BASIL: Effective Near-Duplicate Image Detection Using Gene Sequence Alignment [chapter]

Hung-sik Kim, Hau-Wen Chang, Jeongkyu Lee, Dongwon Lee
2010 Lecture Notes in Computer Science  
In the era dominated by social networks, vast information is created and shared across the world each day.  ...  We also propose a heuristics-based method to extract n-gram keyphrases from noisy textual content by taking the importance of sub-term keywords into consideration.  ...  Let d be a document in a corpus D and t a term in d, whose frequency is tf_t.  ... 
doi:10.1007/978-3-642-12275-0_22 fatcat:ou4wo4a6efdabkipzbkaxd5cyi

Statistical source expansion for question answering

Nico Schlaefer, Jennifer Chu-Carroll, Eric Nyberg, James Fan, Wlodek Zadrozny, David Ferrucci
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
of the retrieved text.  ...  In this thesis, we propose a novel algorithm that expands a collection of seed documents by (1) retrieving related content from the Web or other large external sources, (2) extracting self-contained text  ...  The term frequencies (tf) are estimated from the seed, the inverse document frequencies (idf) from the same sample of Wikipedia articles used for the TopicRatioSeed feature.  ... 
doi:10.1145/2063576.2063632 dblp:conf/cikm/SchlaeferCNFZF11 fatcat:whoy62klazctbdo4p57wbevkdu

Variations on language modeling for information retrieval

Wessel Kraaij
2005 SIGIR Forum  
Variations on Language Modeling for Information Retrieval W. Kraaij - Enschede: Neslia Paniculata. Thesis Enschede - With ref. With summary ISBN 90-75296-09-6  ...  A first step to improve retrieval effectiveness was to include the frequency of occurrence of an index term in a document in the ranking formula, usually referred to as term frequency tf.  ...  d number of unique terms in a document tf term frequency the (relative) relevance of a document by applying a ranking function, which produces a partial ordering of the documents.  ... 
doi:10.1145/1067268.1067291 fatcat:h23lp5aqfvfu5iecwnihfme244

The THISL spoken document retrieval project

S. Renals
Proceedings IEEE International Conference on Multimedia Computing and Systems  
In this paper we outline our spoken document retrieval system based on the ABBOT speech recognizer and a text retrieval system based on Okapi term-weighting.  ...  The system has been evaluated as part of the TREC-6 and TREC-7 spoken document retrieval evaluations and we report on the results of the TREC-7 evaluation based on a document collection of 100 hours of  ...  This work has benefited from collaboration with the partners of the THISL and SPRACH projects, in particular Tony Robinson (Cambridge University and SoftSound) and Gary Cook (Cambridge University).  ... 
doi:10.1109/mmcs.1999.778655 dblp:conf/icmcs/Renals99 fatcat:w2kj4jma2vdkfgx5hr3xlj6htm

Investigating different models for cross-language information retrieval from automatic speech transcripts

Muath Alzghool, Université D'Ottawa / University Of Ottawa
Early research considered spoken document retrieval for broadcast news as a "solved problem" [1].  ...  However, since ASR is an imperfect process, often there are spoken words that are not recognized correctly. This will lead to word mismatches in the retrieval.  ...  , and TREC-8 [1].  ... 
doi:10.20381/ruor-13199 fatcat:ksd2cgasnnf4db5k2pvv22yyyy