
Audio to score matching by combining phonetic and duration information [article]

Rong Gong, Jordi Pons, Xavier Serra
2017 arXiv   pre-print
We approach the singing phrase audio to score matching problem by using phonetic and duration information - with a focus on studying the jingju a cappella singing case.  ...  This leads us to propose a matching approach based on the use of phonetic and duration information.  ...  This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502) and by the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project  ... 
arXiv:1707.03547v1 fatcat:cnihiu72rbcm3g2sfou67u3r5u

Audio To Score Matching By Combining Phonetic And Duration Information

Rong Gong, Jordi Pons, Xavier Serra
2017 Zenodo  
This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502) and by the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project  ...  ACKNOWLEDGEMENTS We are grateful for the GPUs donated by NVidia.  ...  APPROACH The proposed approach aims to match the query audio to its score by using phonetic and duration information.  ... 
doi:10.5281/zenodo.1415765 fatcat:lwfqqagpsvfcbc3dzu2llcyp24

Query-by-example spoken term detection using phonetic posteriorgram templates

Timothy J. Hazen, Wade Shen, Christopher White
2009 2009 IEEE Workshop on Automatic Speech Recognition & Understanding  
This paper examines a query-by-example approach to spoken term detection in audio files.  ...  Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances.  ...  The authors would like to thank Fred Richardson for his help training and running the BUT phonetic recognizer.  ... 
doi:10.1109/asru.2009.5372889 dblp:conf/asru/HazenSW09 fatcat:wrj3fob2lzfpzh56brfaxmht7q
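The modified dynamic time warping search mentioned in this entry can be illustrated with a minimal sketch. A common local cost for posteriorgram matching is the negative log of the inner product of two posterior vectors; the function name, the cost floor, and the unconstrained warping path here are illustrative assumptions, not the paper's exact formulation:

```python
import math

def dtw_posteriorgram(query, test, floor=1e-10):
    """Minimal DTW between two posteriorgrams (sequences of per-frame
    phonetic posterior vectors). Local cost: -log of the inner product
    of the aligned posterior vectors, floored to avoid log(0)."""
    n, m = len(query), len(test)
    INF = float("inf")
    # D[i][j]: best accumulated cost aligning query[:i] with test[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dot = sum(a * b for a, b in zip(query[i - 1], test[j - 1]))
            cost = -math.log(max(dot, floor))
            D[i][j] = cost + min(D[i - 1][j],      # advance query only
                                 D[i][j - 1],      # advance test only
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]
```

Identical posteriorgrams align at zero cost, while mismatched ones accumulate a large penalty at every frame; a real detector would additionally slide the query over long test utterances and apply band constraints.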

Matching phonetic information in lips and voice is robust in 4.5-month-old infants

Michelle L Patterson, Janet F Werker
1999 Infant Behavior and Development  
The present studies were conducted to replicate and extend past research by examining how robust the ability to match phonetic information in lips and voice is at 4.5 months of age.  ...  The finding that bimodal phonetic matching is replicated with full, naturalistic heads and with male stimuli supports the hypothesis that infants are able to link phonetic information presented in the  ...  structure do not lead to differences in the capacity to match phonetic information in the lips and voice.  ... 
doi:10.1016/s0163-6383(99)00003-x fatcat:iapizfscnbcupbiicxlcaka5za

Unsupervised query-by-example spoken term detection using segment-based Bag of Acoustic Words

Basil George, B. Yegnanarayana
2014 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
and restore the time sequence information.  ...  Since this model ignores the sequence information in speech samples for efficient indexing of the database, a Dynamic Time Warping (DTW) based temporal matching technique is used to re-rank the results  ...  A new method of ranking audio documents which combines positional weights and similarity scores was also proposed.  ... 
doi:10.1109/icassp.2014.6854984 dblp:conf/icassp/GeorgeY14 fatcat:z5riwvpd7vbxhna6fnl3gglvei
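The Bag of Acoustic Words idea in this entry can be sketched as follows: quantize each frame against a codebook, then compare utterances by their codeword histograms, ignoring temporal order. The codebook, distance, and similarity measure below are illustrative assumptions (the paper additionally re-ranks candidates with DTW to restore sequence information):

```python
import math
from collections import Counter

def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword
    (squared Euclidean distance)."""
    def nearest(frame):
        return min(range(len(codebook)),
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(frame, codebook[k])))
    return [nearest(f) for f in frames]

def bow_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-acoustic-words histograms."""
    ca, cb = Counter(words_a), Counter(words_b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Because the histogram discards ordering, this step is cheap enough for indexing a whole database; the DTW re-ranking stage then recovers the temporal match quality for the top candidates.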

Detecting topical events in digital video

Tanveer Syeda-Mahmood, S. Srinivasan
2000 Proceedings of the eighth ACM international conference on Multimedia - MULTIMEDIA '00  
Finally, we use a probabilistic model of event likelihood to combine the results of visual and audio event detection that exploits their time co-occurrence.  ...  It is also a very challenging problem requiring the detection and integration of evidence for an event available in multiple information modalities, such as audio, video and language.  ...  We merge the two ordered lists in time, and compute a combined score for each of the matches.  ... 
doi:10.1145/354384.354433 dblp:conf/mm/Syeda-MahmoodS00 fatcat:d4rhnxhmcrhtlozjgthtjo7ieq

Search the Audio, Browse the Video—A Generic Paradigm for Video Collections

Arnon Amir, Savitha Srinivasan, Alon Efrat
2003 EURASIP Journal on Advances in Signal Processing  
A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries.  ...  Keywords and phrases: automatic video indexing, video browsing, video and speech retrieval, phonetic speech retrieval.  ...  ACKNOWLEDGMENT We would like to thank Heather Poyhonen for her great help with the manuscript.  ... 
doi:10.1155/s111086570321012x fatcat:wpdlika5wjdu7dnk635s64gcwq

Query by example search with segmented dynamic time warping for non-exact spoken queries

Jorge Proença, Arlindo Veiga, Fernando Perdigão
2015 2015 23rd European Signal Processing Conference (EUSIPCO)  
The built system is low-resource as it tries to solve the problem where the language of queries and searched audio is unspecified.  ...  This paper presents an approach to the Query-by-Example task of finding spoken queries on speech databases when the intended match may be non-exact or slightly complex.  ...  Therefore, the search is based on a phonetic-level match, and no word-level information is acquired.  ... 
doi:10.1109/eusipco.2015.7362666 dblp:conf/eusipco/ProencaVP15 fatcat:gdoadbjxm5cjjil64rid5glecy

Two-month-old infants match phonetic information in lips and voice

Michelle L. Patterson, Janet F. Werker
2003 Developmental Science  
Infants aged 4.5 months are able to match phonetic information in the face and voice (Kuhl & Meltzoff, 1982; Patterson & Werker, 1999); however, the ontogeny of this remarkable ability is not understood  ...  learning of phonetic information.  ...  Acknowledgements The authors would like to thank all of the parents who volunteered to participate in our study with their infants.  ... 
doi:10.1111/1467-7687.00271 fatcat:hvamfqme7zfzljh7rr7kdbtcpa

Discrimination tests of visually influenced syllables

Lawrence D. Rosenblum, Helena M. Saldaña
1992 Perception & Psychophysics  
Results show, however, that subjects are more likely to match the audio /va/ to the audiovisually consistent /va/, suggesting differences in phonetic convincingness.  ...  Subjects were asked to match an audio syllable /va/ either to an audiovisually consistent syllable (audio /va/-video /fa/) or an audiovisually discrepant syllable (audio /ba/-video /fa/).  ...  Preparation of Audiovisual Stimuli To find the most compelling visually influenced syllable, a number of different audiovisual syllable combinations were recorded and dubbed and then used in informal identification  ... 
doi:10.3758/bf03206706 pmid:1437479 fatcat:bkpla2wucvg6hbzkwx72p2hwwu

Employing Smart Logic to Spot Audio in Real Time on Deeply Embedded Systems [chapter]

Mario Malcangi
2011 IFIP Advances in Information and Communication Technology  
Researchers are very interested in using audio as interaction medium for text retrieval in spoken documents (conference speeches, broadcast news, etc.) to provide smart access to spoken audio and audio  ...  Not only does it represent semantic information; the same audio frame contains behavioral, environmental, psychological, and expressive information.  ...  Less challenging is the task of pattern matching applied to phonetic units. The main problem that arises is related to the very short duration of phonetic units compared to the duration of a word.  ... 
doi:10.1007/978-3-642-23957-1_14 fatcat:24k63ebs3zh7pk3qe2hse4edda

Far-Field Speaker Recognition

Qin Jin, Tanja Schultz, Alex Waibel
2007 IEEE Transactions on Audio, Speech, and Language Processing  
In addition, we performed multiple channel combination experiments to make use of information from multiple distant microphones.  ...  Overall, we achieved up to 87.1% relative improvements on our Distant Microphone database and found that the gains hold across different data conditions and microphone settings.  ...  Silence labels of duration greater than 0.5 seconds in the obtained phonetic sequences were wrapped together as an end of utterance to capture information about how a speaker interacts with others by for  ... 
doi:10.1109/tasl.2007.902876 fatcat:wo7hh5spunhajlkcloej6kwt6q

Automatic assessment of English learner pronunciation using discriminative classifiers

Mauro Nicolao, Amy V. Beeston, Thomas Hain
2015 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
The cross-correlation of the best system and average human annotator reference scores is 0.72, with miss and false alarm rates around 19%.  ...  DNNs trained on a large corpus of native and non-native learner speech are used to extract phoneme posterior probabilities.  ...  The F-score combines recall and precision rates [?].  ... 
doi:10.1109/icassp.2015.7178993 dblp:conf/icassp/NicolaoBH15 fatcat:yqjzyjd3abd5tkqml2xllh5r7a
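The F-score mentioned in this entry's snippet is the standard combination of precision and recall; a minimal sketch of the general F-measure (the `beta` parameter and function name are illustrative, and the paper's exact weighting is not specified in the snippet):

```python
def f_score(precision, recall, beta=1.0):
    """General F-measure: weighted harmonic mean of precision and
    recall. beta = 1 gives the balanced F1 score; beta > 1 weights
    recall more heavily."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, with precision 0.8 and recall 0.6, the balanced F1 score is 0.96 / 1.4, about 0.686.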

Vocabulary independent spoken term detection

Jonathan Mamou, Bhuvana Ramabhadran, Olivier Siohan
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
In addition to the output word transcript, advanced systems provide also phonetic transcripts, against which query terms can be matched phonetically.  ...  We present a vocabulary independent system that can handle arbitrary queries, exploiting the information provided by having both word transcripts and phonetic transcripts.  ...  ACKNOWLEDGEMENTS Jonathan Mamou is grateful to David Carmel and Ron Hoory for helpful and interesting discussions.  ... 
doi:10.1145/1277741.1277847 dblp:conf/sigir/MamouRS07 fatcat:xoc5pv2direjxflntjvfg7iv5y

Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input

Igor Malioutov, Alex Park, Regina Barzilay, James R. Glass
2007 Annual Meeting of the Association for Computational Linguistics  
The algorithm robustly handles noise inherent in acoustic matching by intelligently aggregating information about the similarity profile from multiple local comparisons.  ...  Our method predicts topic changes by analyzing the distribution of reoccurring acoustic patterns in the speech signal corresponding to a single speaker.  ...  We would like to thank T.J.  ... 
dblp:conf/acl/MalioutovPBG07 fatcat:jb3rzgbmmffe7exh5qvs72wl3e
Showing results 1 — 15 out of 5,194 results