7,854 Hits in 6.1 sec

Supervised and semi-supervised infant-directed speech classification for parent-infant interaction analysis

Ammar Mahdhaoui, Mohamed Chetouani
2011 Speech Communication  
To overcome these problems, we propose a new semi-supervised approach based on the standard co-training algorithm exploiting labelled and unlabelled data.  ...  Only the n best classified utterances are added to the labelled set. The classifier is then retrained on the new set of labelled examples, and the process continues for several iterations.  ...
doi:10.1016/j.specom.2011.05.005 fatcat:zj4xbhcunrd2fbgpjqcvi3ex5a
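
The snippet above outlines an n-best self-labelling loop: classify the unlabelled pool, move the n most confidently classified utterances into the labelled set, retrain, and repeat. A minimal sketch of that loop, assuming a generic scikit-learn classifier and placeholder feature matrices (this is not the authors' motherese classifier):

    # n-best self-labelling loop (sketch; classifier and data are placeholders)
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def n_best_self_training(X_lab, y_lab, X_unlab, n=10, iterations=5):
        clf = LogisticRegression(max_iter=1000)
        pool = X_unlab.copy()
        for _ in range(iterations):
            if len(pool) == 0:
                break
            clf.fit(X_lab, y_lab)
            proba = clf.predict_proba(pool)
            best = np.argsort(proba.max(axis=1))[-n:]   # n most confident utterances
            pseudo = clf.classes_[proba[best].argmax(axis=1)]
            X_lab = np.vstack([X_lab, pool[best]])      # grow the labelled set
            y_lab = np.concatenate([y_lab, pseudo])
            pool = np.delete(pool, best, axis=0)        # shrink the unlabelled pool
        return clf.fit(X_lab, y_lab)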

Active learning for automatic speech recognition

Dilek Hakkani-Tur, Giuseppe Riccardi, Allen Gorin
2002 IEEE International Conference on Acoustics Speech and Signal Processing  
In this paper, we describe a new method for reducing the transcription effort for training in automatic speech recognition (ASR).  ...  function for a human to label.  ...  Acknowledgments We would like to thank Lidia Mangu for providing the sausage computation software, and Anne Kirkland, Murat Saraçlar, Gökhan Tür, and Roberto Gretter for their help with various software  ... 
doi:10.1109/icassp.2002.5745510 dblp:conf/icassp/Hakkani-TurRG02 fatcat:5osax7dzs5fxtkiqivbk55x2em

Detection of Specific Language Impairment in Children Using Glottal Source Features

Mittapalle Kiran Reddy, Paavo Alku, Krothapalli Sreenivasa Rao
2020 IEEE Access  
In addition, Mel-frequency cepstral coefficient (MFCC) and openSMILE-based acoustic features are also extracted from speech utterances.  ...  A leave-fourteen-speakers-out cross-validation strategy is used for evaluating the classifiers. The experiments are conducted using the SLI speech corpus launched by the LANNA research group.  ...  would like to thank the Department of Signal Processing and Acoustics at Aalto University, Finland, and Sponsored Research and Industrial Consultancy at the Indian Institute of Technology Kharagpur, India, for  ...
doi:10.1109/access.2020.2967224 fatcat:k47cczln3naz5mk2kinjv6ozqa
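
For context, a leave-fourteen-speakers-out protocol maps directly onto scikit-learn's LeavePGroupsOut. The sketch below uses synthetic features, labels, and speaker IDs in place of the paper's glottal, MFCC, and openSMILE features:

    # leave-fourteen-speakers-out evaluation (sketch; data is synthetic)
    import numpy as np
    from sklearn.model_selection import LeavePGroupsOut
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(280, 13))            # 280 utterances, 13-dim features
    y = rng.integers(0, 2, size=280)          # toy labels: 0 = control, 1 = SLI
    speakers = np.repeat(np.arange(28), 10)   # 28 speakers, 10 utterances each

    lpgo = LeavePGroupsOut(n_groups=14)       # hold out 14 speakers per split
    train_idx, test_idx = next(lpgo.split(X, y, groups=speakers))
    clf = SVC().fit(X[train_idx], y[train_idx])
    print(clf.score(X[test_idx], y[test_idx]))  # one fold; all C(28,14) splits is huge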

Model-based sequential organization in cochannel speech

Yang Shao, DeLiang Wang
2006 IEEE Transactions on Audio, Speech, and Language Processing  
We extract minimally corrupted segments, or usable speech, in cochannel speech using a robust multipitch tracking algorithm.  ...  To utilize speaker characteristics for sequential organization, we extend the traditional SID framework to cochannel speech and derive a joint objective for sequential grouping and SID, leading to a problem  ...  Wu for his assistance in using the multipitch tracking algorithm, J. Barker for a suggestion regarding exhaustive search complexity, and three anonymous referees for extensive and helpful comments.  ... 
doi:10.1109/tsa.2005.854106 fatcat:d22usgavvvc4rlnskoofpfhw2a

Database development and automatic speech recognition of isolated Pashto spoken digits using MFCC and K-NN

Zakir Ali, Arbab Waseem Abbas, T. M. Thasleema, Burhan Uddin, Tanzeela Raaz, Sahibzada Abdur Rehman Abid
2015 International Journal of Speech Technology  
Mel Frequency Cepstral Coefficients (MFCC) are used to extract speech features.  ...  Fifty native Pashto speakers (25 male and 25 female), ranging in age from 18 to 60 years, each uttered the digits from sefer (0) to naha (9) separately.  ...  First, a database was developed by collecting speech data from 50 native Pashto speakers; the MFCC algorithm was then used to extract features from the speech data, producing 10 text files containing the features  ...
doi:10.1007/s10772-014-9267-z fatcat:2ytmrxvpwna5xbkf7nwtvififi
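
A minimal MFCC-plus-k-NN pipeline in the spirit of this paper, with librosa standing in for the authors' MFCC front end and synthetic tones standing in for recorded digit utterances:

    # MFCC + k-NN digit classification (sketch; tones stand in for recordings)
    import numpy as np
    import librosa
    from sklearn.neighbors import KNeighborsClassifier

    def mfcc_vector(signal, sr=16000, n_mfcc=13):
        # frame-level MFCCs averaged into one fixed-size utterance vector
        return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

    sr, rng = 16000, np.random.default_rng(0)
    utterances = [np.sin(2 * np.pi * (100 + 50 * d) * np.arange(sr) / sr).astype(np.float32)
                  for d in range(10)]          # one toy "utterance" per digit 0-9
    X = np.array([mfcc_vector(u, sr) for u in utterances])
    knn = KNeighborsClassifier(n_neighbors=1).fit(X, list(range(10)))
    test = (utterances[3] + 0.01 * rng.normal(size=sr)).astype(np.float32)
    print(knn.predict([mfcc_vector(test, sr)]))   # -> [3]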

Active learning: theory and applications to automatic speech recognition

G. Riccardi, D. Hakkani-Tur
2005 IEEE Transactions on Speech and Audio Processing  
function for a human to label.  ...  In this paper we describe how to estimate the confidence score for each utterance through an on-line algorithm using the lattice output of a speech recognizer.  ...  Cox for their continued support on this research topic. They would also like to thank G. Tur and M. Saraclar for their technical help and useful discussions.  ... 
doi:10.1109/tsa.2005.848882 fatcat:qa5d6rsrtjafvc6j4ub6uvje74
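
The selection rule itself is simple once per-utterance confidence scores exist; a sketch with placeholder scores (the paper derives its scores from recognizer lattice output):

    # confidence-based selective sampling (sketch; scores are placeholders)
    import numpy as np

    def select_for_transcription(confidences, budget):
        """Indices of the `budget` least-confident utterances, to be human-labelled."""
        return np.argsort(confidences)[:budget]   # ascending: least confident first

    conf = np.array([0.93, 0.41, 0.77, 0.12, 0.88])
    print(select_for_transcription(conf, budget=2))   # -> [3 1]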

Phonetic Feature Discovery in Speech Using Snap-Drift Learning [chapter]

Sin Wee Lee, Dominic Palmer-Brown
2006 Lecture Notes in Computer Science  
This paper presents a new application of the snap-drift algorithm [1]: feature discovery and clustering of speech waveforms from non-stammering and stammering speakers.  ...  The speech waveforms are drawn from a phonetically annotated corpus, which facilitates phonetic interpretation of the classes of patterns discovered by the SDNN.  ...  takes to be certain of the speaker group for a particular speech utterance.  ...
doi:10.1007/11840930_99 fatcat:ld2qxwzbcnfn3od3krqa5bylse

MuLER: Multiplet-Loss for Emotion Recognition

Anwer Slimi, Mounir Zrigui, Henri Nicolas
2022 Proceedings of the 2022 International Conference on Multimedia Retrieval  
The encoding is done in such a way that utterances with the same labels have similar encodings.  ...  In our work, we propose a new loss function that aims to encode speech utterances instead of classifying them directly, as the majority of existing models do.  ...  To obtain a single result for the whole speech, the results were merged by majority voting. In the work of Aouani and Ayed [1], a vector of 42 features was extracted from each signal.  ...
doi:10.1145/3512527.3531406 fatcat:5rntsyduxba63jhq74wlg4tuea
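
The merging step mentioned at the end of the snippet is a plain majority vote, for example (labels illustrative, assuming one prediction per speech chunk):

    # majority-vote merging of per-chunk predictions (illustrative labels)
    from collections import Counter

    chunk_predictions = ["angry", "angry", "neutral", "angry", "sad"]
    print(Counter(chunk_predictions).most_common(1)[0][0])   # -> angry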

Co-Adaptation of audio-visual speech and gesture classifiers

C. Mario Christoudias, Kate Saenko, Louis-Philippe Morency, Trevor Darrell
2006 Proceedings of the 8th international conference on Multimodal interfaces - ICMI '06  
We also propose a co-adaptation algorithm, which adapts existing audio-visual classifiers to a particular user or noise condition by leveraging the redundancy in the unlabeled data.  ...  Multimodal tasks are good candidates for multi-view learning, since each modality provides a potentially redundant view to the learning algorithm.  ...  Algorithm 1 (co-training): given a small labeled set L, a large unlabeled set U, k views, and parameters N and T, set t = 1 and repeat: for i = 1 to k, train classifier f_i on view i of L; use f_i to  ...
doi:10.1145/1180995.1181013 dblp:conf/icmi/ChristoudiasSMD06 fatcat:axyvbc3cwfgebcjaklaikyy5n4
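
A runnable two-view instance of Algorithm 1, assuming scikit-learn classifiers; the views and selection size are illustrative, and overlapping picks between the two classifiers are de-duplicated (last writer wins):

    # two-view co-training (sketch; classifiers and views are placeholders)
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def co_train(X1_l, X2_l, y_l, X1_u, X2_u, n_best=5, rounds=10):
        f1 = LogisticRegression(max_iter=1000)
        f2 = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            if len(X1_u) == 0:
                break
            f1.fit(X1_l, y_l)
            f2.fit(X2_l, y_l)
            chosen = {}                               # pool index -> pseudo-label
            for clf, X_u in ((f1, X1_u), (f2, X2_u)):
                proba = clf.predict_proba(X_u)
                top = np.argsort(proba.max(axis=1))[-n_best:]  # most confident
                for j in top:
                    chosen[j] = clf.classes_[proba[j].argmax()]
            idx = np.fromiter(chosen, dtype=int)
            lab = np.array(list(chosen.values()))
            X1_l = np.vstack([X1_l, X1_u[idx]])       # both views grow together
            X2_l = np.vstack([X2_l, X2_u[idx]])
            y_l = np.concatenate([y_l, lab])
            X1_u = np.delete(X1_u, idx, axis=0)
            X2_u = np.delete(X2_u, idx, axis=0)
        return f1, f2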

Speaker Recognition Using Neural Tree Networks

Kevin R. Farrell, Richard J. Mammone
1993 Neural Information Processing Systems  
A new classifier is presented for text-independent speaker recognition. The new classifier is called the modified neural tree network (MNTN).  ...  The MNTN also uses leaf probability measures in addition to the class labels.  ...  The decision tree simulations utilized the IND package developed by W. Buntine of NASA.  ... 
dblp:conf/nips/FarrellM93 fatcat:vfcfnlfyrrhztawbv4phlf6aiy
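
The MNTN itself is not in standard libraries, but the "leaf probability" idea can be illustrated with an ordinary decision tree, whose predict_proba returns the class frequencies of the reached leaf:

    # leaf probabilities in a plain decision tree (stand-in for the MNTN)
    from sklearn.tree import DecisionTreeClassifier

    X = [[0.1], [0.2], [0.8], [0.9], [0.85]]
    y = [0, 0, 1, 1, 0]                       # the right-hand leaf is impure
    tree = DecisionTreeClassifier(max_depth=1).fit(X, y)
    print(tree.predict([[0.95]]))             # hard class label -> [1]
    print(tree.predict_proba([[0.95]]))       # leaf probabilities -> [[0.33 0.67]]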

Automatic Assessment of Student Translations for Foreign Language Tutoring

Chao Wang, Stephanie Seneff
2007 North American Chapter of the Association for Computational Linguistics  
This paper introduces the use of speech translation technology for a new type of voice-interactive Computer Aided Language Learning (CALL) application.  ...  Evaluation results are presented on the system's ability to match human judgment of the correctness of a student's translation, for a set of 1115 utterances collected from 9 learners of Mandarin Chinese  ...  Acknowledgements This research is supported in part by ITRI and the Cambridge MIT Initiative. The authors would like to acknowledge Yushi Xu for annotating the data.  ... 
dblp:conf/naacl/WangS07 fatcat:altx7lm4yfchzn7frlqun5mjwe

A new i-vector approach and its application to irrelevant variability normalization based acoustic model training

Yu Zhang, Zhi-Jie Yan, Qiang Huo
2011 2011 IEEE International Workshop on Machine Learning for Signal Processing  
This paper presents a new approach to extracting a low-dimensional i-vector from a speech segment to represent acoustic information irrelevant to phonetic classification.  ...  New procedures for hyperparameter estimation and i-vector extraction are derived and presented.  ...
doi:10.1109/mlsp.2011.6064637 dblp:conf/mlsp/0007YH11 fatcat:2d6ceoss6vfjbldy65xbdeud3e
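
For context, the classical i-vector point estimate that such approaches build on (this is the standard formula, not the paper's new procedure; all matrices below are random placeholders):

    # classical i-vector point estimate: w = (I + T' S^-1 N T)^-1 T' S^-1 F
    import numpy as np

    rng = np.random.default_rng(0)
    C, D, R = 8, 13, 10                      # UBM components, feature dim, i-vector dim
    T = rng.normal(size=(C * D, R))          # total variability matrix (placeholder)
    S_inv = np.eye(C * D)                    # inverse UBM covariance (toy: identity)
    N = np.repeat(rng.uniform(1, 20, C), D)  # zeroth-order stats, repeated per dim
    F = rng.normal(size=C * D)               # centred first-order stats, stacked

    precision = np.eye(R) + T.T @ (N[:, None] * S_inv) @ T
    w = np.linalg.solve(precision, T.T @ (S_inv @ F))
    print(w.shape)                           # (10,) -- the i-vector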

Automatic Assessment of Speech Intelligibility for Individuals With Aphasia

Duc Le, Keli Licata, Carol Persad, Emily Mower Provost
2016 IEEE/ACM Transactions on Audio Speech and Language Processing  
We present our method for eliciting reliable ground-truth labels for speech intelligibility based on the perceptual judgment of nonexpert human evaluators.  ...  A prerequisite for producing meaningful feedback is speech intelligibility assessment.  ...  ACKNOWLEDGMENT We would like to thank Patrick Shin, Yoolim Jung, Kelly Karpus, Carly Swiftney, Benjamin Fine, Tasneem Tweel, Lucie Farrugia, Rebecca Rosen, Lily Chen, Meng Du, and the UMAP staff for their  ... 
doi:10.1109/taslp.2016.2598428 fatcat:mllmqirdyjbcxh2lnfhol73aia

A Bag-of-features Framework for Incremental Learning of Speech Invariants in Unsegmented Audio Streams

Olivier Mangin, Pierre-Yves Oudeyer, David Filliat
2010 International Conference on Epigenetic Robotics  
We introduce a computational framework that allows a machine to bootstrap flexible autonomous learning of speech recognition skills.  ...  We evaluate an implementation of this framework on a complex speech database.  ...  During a testing phase, we extract the bag of DAFs corresponding to the utterance. Then, for each tag, we compute its score on the utterance by summing the votes of each DAF.  ...
dblp:conf/epirob/ManginOF10 fatcat:2yk7hznamrdy3frfu4l2mum3qu
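
The tag-scoring step in the last fragment is additive voting; a sketch with a made-up vote table (real votes would be learned from the corpus):

    # tag scoring by summing per-feature votes (vote table is made up)
    from collections import defaultdict

    votes = {17: {"hello": 0.9, "goodbye": 0.1},   # DAF id -> tag vote strengths
             42: {"hello": 0.3, "goodbye": 0.6},
             58: {"goodbye": 0.8}}

    def score_tags(bag_of_dafs):
        scores = defaultdict(float)
        for daf in bag_of_dafs:
            for tag, v in votes.get(daf, {}).items():
                scores[tag] += v
        return dict(scores)

    print(score_tags([17, 42, 58]))   # {'hello': 1.2, 'goodbye': 1.5}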
Showing results 1 — 15 out of 7,854 results