
Unsupervised clustering of spontaneous speech documents

Edgar Gonzàlez, Jordi Turmo
2005 Interspeech 2005  
This paper presents an unsupervised method for clustering spontaneous speech documents.  ...  We have evaluated this method on the Switchboard corpus and compared it to a set of supervised and other unsupervised methods.  ...  Acknowledgements This work has been partially funded by the European CHIL Project (IP-506909), the Departament d'Universitats, Recerca i Societat de la Informació and the Spanish Ministry of Science and  ... 
doi:10.21437/interspeech.2005-63 fatcat:qbha4nvrhzae3hbapxpuprbvti

A latent semantic retrieval and clustering system for personal photos with sparse speech annotation

Yi-Sheng Fu, Winston H. Hsu, Lin-Shan Lee
2009 Proceedings of the third workshop on Searching spontaneous conversational speech - SSCS '09  
In this demo we present a user-friendly latent semantic retrieval and clustering system for personal photos with sparse spontaneous speech tags annotated when the photos were taken.  ...  Only 10% of the photos need to be annotated by spontaneous speech of a few words regarding one or two semantic categories (e.g. what or where), while all photos can be effectively retrieved using high-level  ...  In this paper we will exploit the use of visual words with spontaneous speech and investigate feasibility for search result clustering [5] .  ... 
doi:10.1145/1631127.1631134 fatcat:faza5sajiba4bkqcr5vm5a6avy
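
The latent-semantic matching idea in this entry — projecting sparse speech annotations into a low-rank space so a query can match photos by meaning rather than exact word overlap — can be sketched with a plain truncated SVD. Everything below (the vocabulary, the toy documents, the `retrieve` helper) is a hypothetical illustration, not the system described in the paper.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = photo annotations.
terms = ["beach", "sea", "sand", "city", "street", "car"]
docs = np.array([
    [2, 0, 0],   # beach
    [1, 0, 0],   # sea
    [1, 0, 0],   # sand
    [0, 2, 1],   # city
    [0, 1, 2],   # street
    [0, 0, 1],   # car
], dtype=float)

# Rank-2 latent semantic space via truncated SVD.
U, s, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T          # documents in latent space

def query_vec(words):
    """Fold a bag-of-words query into the same latent space."""
    q = np.array([1.0 if t in words else 0.0 for t in terms])
    return q @ U[:, :k]                          # project onto latent axes

def retrieve(words):
    """Return the index of the document closest to the query (cosine)."""
    q = query_vec(words)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

# Queries are matched in the shared latent space, not by raw term overlap.
best = retrieve({"sea"})
```

The latent axes here separate the beach-themed document from the city-themed ones, so even a one-word query lands on the semantically right document.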

Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

Akinori Ito, Yasutomo Kajiura, Motoyuki Suzuki, Shozo Makino
2009 EURASIP Journal on Audio, Speech, and Music Processing  
We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents.  ...  The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance."  ...  We implemented the speech recognition system with the unsupervised LM adaptation.  ... 
doi:10.1155/2009/140575 fatcat:2phjwq22bnelfbkojbsxzoa74i
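
The second idea in this entry — downloading more Web documents for queries that are more relevant to the spoken document — can be made concrete with a simple proportional allocation. The relevance measure below is a plain bag-of-words cosine similarity used as a stand-in; the paper's own "query relevance" definition is not reproduced here.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def allocate_downloads(queries, recognized_text, budget=100):
    """Split a download budget across queries in proportion to how
    relevant each query is to the recognized spoken document."""
    doc = Counter(recognized_text.split())
    rel = [cosine(Counter(q.split()), doc) for q in queries]
    total = sum(rel) or 1.0
    return [round(budget * r / total) for r in rel]

counts = allocate_downloads(
    ["speech recognition", "cooking recipes"],
    "unsupervised adaptation of speech recognition language models",
    budget=10,
)
```

A query unrelated to the recognized text gets essentially no downloads, which is the point of relevance-weighted allocation.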

Text Categorization with Latent Dirichlet Allocation

Daniel Zlacký, Ján Staš, Jozef Juhár, Anton Čižmár
2014 Journal of Electrical and Electronics Engineering  
This paper focuses on the text categorization of Slovak text corpora using latent Dirichlet allocation. Our goal is to build text subcorpora that contain similar text documents.  ...  We want to use these better organized text subcorpora to build more robust language models that can be used in the area of speech recognition systems.  ...  Clustering Text document clustering is an unsupervised learning algorithm, where documents are assigned into groups, called clusters.  ... 
doaj:19fdad600d194f1cbf1542e9c3563675 fatcat:4pigmhj6x5ch5p5d4hxm7w6lim
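
The snippet's definition of document clustering — assigning documents to groups without supervision — can be sketched with a minimal k-means. The paper clusters documents by their LDA topic mixtures; the toy below uses raw word-count vectors instead, purely for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means on document vectors. The paper's features are LDA
    topic mixtures, but any vector representation works the same way."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)  # squared distances
        labels = d.argmin(1)                             # nearest-center assignment
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)      # recompute centers
    return labels

# Toy word-count vectors for six "documents" over a 4-word vocabulary,
# drawn from two clearly distinct topics.
docs = np.array([
    [3, 2, 0, 0], [4, 1, 0, 0], [2, 3, 0, 0],   # topic A
    [0, 0, 3, 2], [0, 0, 4, 1], [0, 0, 2, 3],   # topic B
], dtype=float)
labels = kmeans(docs, k=2)
```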

Unsupervised Signal Segmentation Based On Temporal Spectral Clustering

Régine André-Obrecht, Jose Arias Aguilar, Jérôme Farinas
2008 Zenodo  
Publication in the conference proceedings of EUSIPCO, Lausanne, Switzerland, 2008  ...  The corpus consists of spontaneous telephonic speech presented in sequences of around 45 seconds length sampled at 8 kHz. It is phonetically labeled by experts following the CSLU rules [12] .  ...  The algorithm is unsupervised, with only two parameters to define. In the case of speech processing, phonetic class labeling is also performed.  ... 
doi:10.5281/zenodo.40960 fatcat:it4x4ndixvewpp7w4dwn5kwfua
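
The snippet does not spell out the temporal spectral clustering algorithm, but a generic two-way spectral split — Gaussian affinity, normalized Laplacian, sign of the second eigenvector — might look like the following sketch (toy 1-D "signal", not the paper's telephone speech features).

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Two-way spectral clustering: Gaussian affinity, symmetric
    normalized Laplacian, then split on the sign of the eigenvector
    of the second-smallest eigenvalue (the Fiedler vector)."""
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))                  # affinity matrix
    D = W.sum(1)
    L = np.eye(len(X)) - W / np.sqrt(D)[:, None] / np.sqrt(D)[None, :]
    vals, vecs = np.linalg.eigh(L)                      # ascending eigenvalues
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

# A toy "signal": 10 frames near 0.0 followed by 10 frames near 5.0,
# mimicking two acoustic regimes in time; the split recovers the boundary.
t = np.concatenate([np.zeros(10), np.full(10, 5.0)]) + 0.01 * np.arange(20)
labels = spectral_bipartition(t[:, None])
```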

Unsupervised broadcast conversation speaker role labeling

Brian Hutchinson, Bin Zhang, Mari Ostendorf
2010 2010 IEEE International Conference on Acoustics, Speech and Signal Processing  
We present an approach to unsupervised speaker role labeling in talk show data that makes use of two complementary sets of features: structural features that encode the participation patterns of speakers  ...  Techniques for using multiple clusterings are explored, leading to more robust results.  ...  In English shows, for example, the spontaneous speech in the soundbites reduces the discriminative power of the conversational dimension of the lexical features.  ... 
doi:10.1109/icassp.2010.5494958 dblp:conf/icassp/HutchinsonZO10 fatcat:rrnynxufyjdsbih2poyhww6yi4

Language Model and Speaking Rate Adaptation for Spontaneous Presentation Speech Recognition

H. Nanjo, T. Kawahara
2004 IEEE Transactions on Speech and Audio Processing  
In spontaneous speech, SR is generally fast and may vary a lot. We also observe different error tendencies for portions of presentations where speech is fast or slow.  ...  The paper addresses adaptation methods to language model and speaking rate (SR) of individual speakers which are two major problems in automatic transcription of spontaneous presentation speech.  ...  Lee (Nara Institute of Science and Technology) for improving the LVCSR engine for spontaneous speech recognition.  ... 
doi:10.1109/tsa.2004.828641 fatcat:kmo3kv66fze6tfyr2iuxyi2274

Using hidden Markov models for topic segmentation of meeting transcripts

Melissa Sherman, Yang Liu
2008 2008 IEEE Spoken Language Technology Workshop  
To learn the model, we use unsupervised learning to cluster the text segments obtained from topic boundary information.  ...  We evaluate the effect of language model order, the number of hidden states, and the use of stop words.  ...  For unsupervised clustering of the text segments, we used CLUTO [10] to create the pre-defined number of clusters, based on the objective function to minimize the inter-cluster similarity and maximize  ... 
doi:10.1109/slt.2008.4777871 dblp:conf/slt/ShermanL08 fatcat:gn2glymqsncufkypjl4u5aulke
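
The clustering objective quoted at the end of the snippet — minimizing inter-cluster similarity while maximizing intra-cluster similarity — can be written out directly. This is a generic cosine-based version; CLUTO's actual criterion functions differ in detail.

```python
import numpy as np

def intra_inter_similarity(X, labels):
    """Average cosine similarity within clusters vs. across clusters.
    A good clustering makes the first high and the second low."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                                  # pairwise cosine similarities
    same = labels[:, None] == labels[None, :]      # same-cluster mask
    off = ~np.eye(len(X), dtype=bool)              # exclude self-similarity
    intra = S[same & off].mean()
    inter = S[~same].mean()
    return intra, inter

# Toy segment vectors: two tight groups of text segments.
segs = np.array([[3.0, 1, 0], [4, 1, 0], [0, 1, 3], [0, 1, 4]])
labels = np.array([0, 0, 1, 1])
intra, inter = intra_inter_similarity(segs, labels)
```

For this well-separated toy clustering, intra-cluster similarity is near 1 while inter-cluster similarity is near 0, which is exactly what the objective rewards.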

Investigating the Effect of Audio Duration on Dementia Detection Using Acoustic Features

Jochen Weiner, Miguel Angrick, Srinivasan Umesh, Tanja Schultz
2018 Interspeech 2018  
We describe our unsupervised process chain consisting of voice activity detection and speaker diarization followed by extraction of features and detection of early signs of dementia.  ...  This paper presents recent progress toward our goal to enable area-wide pre-screening methods for the early detection of dementia based on automatically processing conversational speech of a representative  ...  speech and language use of people with dementia.  ... 
doi:10.21437/interspeech.2018-57 dblp:conf/interspeech/WeinerAUS18 fatcat:zctxilg2xjbs3o5pntslesf7em
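
The first stage of the process chain described here, voice activity detection, can be illustrated with a bare-bones energy threshold. Real pipelines (presumably including the authors') use far more robust detectors; this is only a sketch of the idea.

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold_ratio=0.1):
    """Minimal energy-based voice activity detection: mark a frame as
    speech when its mean energy exceeds a fraction of the loudest
    frame's energy."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(1)
    return energy > threshold_ratio * energy.max()

# Synthetic audio at 8 kHz: silence, a loud "speech" burst, silence.
rng = np.random.default_rng(0)
sig = np.concatenate([
    0.01 * rng.standard_normal(800),      # silence
    1.0 * rng.standard_normal(800),       # speech-like energy
    0.01 * rng.standard_normal(800),      # silence
])
mask = energy_vad(sig)                    # one boolean per 20 ms frame
```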

Semi-Supervised and Unsupervised Data Extraction Targeting Speakers: From Speaker Roles to Fame?

Carole Lailler, Grégor Dupuy, Mickael Rouvier, Sylvain Meignier
2013 Conference of the International Speech Communication Association  
Two approaches are considered in order to extract and to annotate the data: the first is semi-supervised and requires a human annotator to control the process, the second is totally unsupervised.  ...  The identification results have been analyzed in terms of speaker roles and fame, which is a subjective concept introduced to estimate the ease to model speakers.  ...  The corpus is balanced between prepared speech, with 7 broadcast news from French radio stations, and spontaneous speech, with 21 political discussions or street interviews.  ... 
dblp:conf/interspeech/Lailler13 fatcat:q3gxqbcjnfgxrchjrvjm2iaqie

A Rudimentary Lexicon and Semantics Help Bootstrap Phoneme Acquisition

Abdellah Fourtassi, Emmanuel Dupoux
2014 Proceedings of the Eighteenth Conference on Computational Natural Language Learning  
We start with corpora of spontaneous speech that have been encoded in a varying number of detailed context-dependent allophones.  ...  Infants spontaneously discover the relevant phonemes of their language without any direct supervision.  ...  Corpus We use two speech corpora: the Buckeye Speech corpus (Pitt et al., 2007) , which contains 40 hours of spontaneous conversations in American English, and the 40 hours core of the Corpus of Spontaneous  ... 
doi:10.3115/v1/w14-1620 dblp:conf/conll/FourtassiD14 fatcat:t7yp6vw5nbffzewr3m3befhxxa

On Time Document Retrieval using Speech Conversation and Diverse Keyword Clustering During Presentations

2020 International journal of recent technology and engineering  
In this paper we present the idea of extracting keywords from discussions, with the aim of using these words to retrieve, for each short fragment of conversation, documents to recommend to the participants.  ...  We first propose an algorithm to extract significant words from the output of an ASR system, which makes use of topic modeling techniques and of a submodular reward function that favors diversity  ...  By clustering we would naturally expect at least one of the clusters to correspond or be similar to the query term.  ... 
doi:10.35940/ijrte.c4544.099320 fatcat:u2ippfsmhzg7fcn4py7xaw3ysa
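
The snippet appears to describe keyword selection with a reward that favors diversity. A generic greedy sketch of that idea follows; the topic assignments and the coverage criterion are hypothetical, not the paper's exact reward function.

```python
def diverse_keywords(word_topics, k=3):
    """Greedy keyword selection with a diversity reward: each pick
    maximizes the number of *new* topics covered, a generic
    submodular-style coverage criterion."""
    chosen, covered = [], set()
    words = dict(word_topics)
    for _ in range(k):
        # Marginal gain = topics this word covers that are not yet covered.
        best = max(words, key=lambda w: len(words[w] - covered))
        chosen.append(best)
        covered |= words[best]
        del words[best]
    return chosen

# Hypothetical word -> topic assignments from a topic model.
wt = {
    "neural": {"ml"}, "network": {"ml"},
    "election": {"politics"}, "vote": {"politics"},
    "genome": {"bio", "ml"},
}
picks = diverse_keywords(list(wt.items()), k=2)
```

Greedy selection first takes the word spanning the most topics, then a word from a still-uncovered topic, rather than two near-synonyms from the same topic.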

Rhetorical-State Hidden Markov Models for extractive speech summarization

Pascale Fung, Ricky Ho Yin Chan, Justin Jian Zhang
2008 Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing  
One of the most underutilized features in extractive summarization is rhetorical information: semantically cohesive units that are hidden in spoken documents.  ...  Index Terms: spoken document summarization, hidden Markov models, speech features, rhetorical information  ...  In extractive summarization for spontaneous speech, a transcribed document D from the ASR output is a sequence of N recognized sentences, D = {S_1, S_2, ..., S_j, ..., S_N}, j = 1, 2, ..., N.  ... 
doi:10.1109/icassp.2008.4518770 dblp:conf/icassp/FungCZ08 fatcat:fl7cvklbbzbwjfmarufjzd6g5q
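
The notation D = {S_1, ..., S_N} sets up the extractive task: choose a subset of the N recognized sentences as the summary. A common centroid-similarity baseline for that selection step (not the paper's Rhetorical-State HMM) looks like this, with toy sentence vectors standing in for real features.

```python
import numpy as np

def extractive_summary(S, k=2):
    """Baseline extractive selection: score each recognized sentence
    vector by cosine similarity to the document centroid and keep the
    top k, returned in original document order."""
    Sn = S / np.linalg.norm(S, axis=1, keepdims=True)
    centroid = Sn.mean(0)
    scores = Sn @ centroid                       # cosine to the centroid direction
    return np.sort(np.argsort(scores)[-k:])      # top-k indices, in order

# Toy sentence vectors: sentences 0 and 3 sit near the document's
# dominant theme; sentence 1 is an off-topic outlier.
S = np.array([[1.0, 0.1], [0.0, 1.0], [1.0, 0.0], [0.9, 0.2]])
idx = extractive_summary(S, k=2)
```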

Emotion Recognition from Speech: An Unsupervised Learning Approach

Stefano Rovetta, Zied Mnasri, Francesco Masulli, Alberto Cabri
2020 International Journal of Computational Intelligence Systems  
To avoid the cost of labeling, and at the same time to reduce the risk of overfitting due to lack of data, unsupervised learning seems a suitable alternative to recognize emotions from speech.  ...  This paper presents a novel approach for emotion recognition from speech signal, based on some variants of fuzzy clustering, such as probabilistic, possibilistic and graded-possibilistic fuzzy c-means.  ...  of unsupervised learning.  ... 
doi:10.2991/ijcis.d.201019.002 fatcat:fssoxkoiyjf7tk3kye5qydais4
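
Of the fuzzy clustering variants this entry mentions, the basic probabilistic fuzzy c-means is simple enough to sketch directly in NumPy. The 1-D features below are toy stand-ins for acoustic features, and the possibilistic and graded-possibilistic variants are not reproduced here.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50):
    """Basic probabilistic fuzzy c-means: every sample gets a degree of
    membership in every cluster (each row of U sums to 1), unlike the
    hard assignments of k-means."""
    # Spread the initial centers across the data for a simple, repeatable start.
    centers = X[np.linspace(0, len(X) - 1, c).astype(int)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1) + 1e-9
        inv = d ** (-2.0 / (m - 1))
        U = inv / inv.sum(1, keepdims=True)        # membership update
        W = U ** m
        centers = (W.T @ X) / W.sum(0)[:, None]    # fuzzy-weighted center update
    return U, centers

# Toy 1-D "acoustic features" forming two well-separated groups.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
U, centers = fuzzy_cmeans(X)
```

The fuzzifier m controls how soft the memberships are: as m approaches 1 the algorithm behaves like hard k-means, while larger m spreads membership mass across clusters.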

Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-resource ASR

Bin Wu, Sakriani Sakti, Jinsong Zhang, Satoshi Nakamura
2022 IEEE/ACM Transactions on Audio Speech and Language Processing  
Such successful features as MFCC and PLP use filterbank techniques to model log-scaled speech perception but fail to model the adaptation of human speech perception by hearing experiences.  ...  This realization motivates us to propose to model such an unsupervised adaptation process, where adaptation denotes perception that is affected or changed by the history of experiences, with the Dirichlet  ...  The DPGMM model was found to suffer from the fragmental problem due to the noisy spontaneous Javanese speech.  ... 
doi:10.1109/taslp.2022.3150220 fatcat:svo6n3zua5gpraapwxxol5b3ji