23,691 Hits in 4.7 sec

One TTS Alignment To Rule Them All [article]

Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro
2021 arXiv   pre-print
Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line.  ...  In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS).  ...  Similar to GlowTTS [6] and RAD-TTS [12], we compute the soft alignment distribution based on the learned pairwise affinity between all text tokens and mel frames, which is normalized with softmax across  ... 
arXiv:2108.10447v1 fatcat:ua2hbehfareoxnfkfxbawo26ee
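
The snippet above describes computing a soft alignment distribution from pairwise affinities between text tokens and mel frames, normalized with a softmax across the text axis. A minimal NumPy sketch of that idea; the negative squared-distance affinity and all names here are assumptions for illustration, not taken from the paper:

    import numpy as np

    def soft_alignment(text_emb, mel_emb):
        """Soft alignment distribution over text tokens for each mel frame.

        text_emb: (T_text, D) token embeddings.
        mel_emb:  (T_mel, D) mel-frame embeddings.
        Returns a (T_mel, T_text) matrix whose rows sum to 1.
        """
        # Pairwise affinity; negative squared L2 distance is one common choice.
        logits = -((mel_emb[:, None, :] - text_emb[None, :, :]) ** 2).sum(-1)
        # Softmax across the text-token axis, as described in the snippet.
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(0)
    A = soft_alignment(rng.normal(size=(5, 16)), rng.normal(size=(40, 16)))
    print(A.shape, A.sum(axis=1)[:3])  # (40, 5) and rows summing to ~1.0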

Tree-Based Statistical Machine Translation: Experiments with the English and Brazilian Portuguese Pair

Daniel Beck, Helena Caseli
2013 Learning and Nonlinear Models  
However, this lack of explicit linguistic knowledge makes them unable to model some linguistic phenomena.  ...  Current state-of-the-art approaches rely only on statistical methods that gather all necessary knowledge from parallel corpora.  ...  This is because only the minimal rules are extracted from the alignment graphs, and therefore only one derivation is considered for each alignment graph.  ... 
doi:10.21528/lnlm-vol11-no1-art2 fatcat:3lrwss65bzhyfpjhhx5jqbva64

Transforming non textually aligned SPMD programs into textually aligned SPMD programs by using rewriting rules

Wadoud Bousdira
2019 2019 International Conference on High Performance Computing & Simulation (HPCS)  
We propose a set of transformation rules, based on rewriting techniques, which allows a non-textually aligned program to be turned into a textually aligned one.  ...  Then, the textual alignment of the synchronization barriers that is defined prevents deadlocks. However, the textual alignment property is not satisfied by all SPMD programs.  ...  Of course, the transformation that we propose does not guarantee that every non-textually aligned SPMD program can be turned into a textually aligned one.  ... 
doi:10.1109/hpcs48598.2019.9188223 dblp:conf/ieeehpcs/Bousdira19 fatcat:g6o77nn7dbf6bbdaub5yyerpqy

TTS-driven Embodied Conversation Avatar for UMB-SmartTV

Matej Rojc, Zdravko Kačič, Marko Presker, Izidor Mlakar
2021 International journal of computers and communications  
Features for selecting the shape and alignment of co-verbal movement are based on linguistic features (which can be extracted from arbitrary input text) and prosodic features (as predicted within several  ...  processing steps in the TTS engine).  ...  either based on semiotic or implicit rules.  ... 
doi:10.46300/91013.2021.15.1 fatcat:6wo2rrbp3ffqdib7gsfcdvjyri

Probing the phonetic and phonological knowledge of tones in Mandarin TTS models [article]

Jian Zhu
2019 arXiv   pre-print
On the other hand, it is also suggested that linguistically informed stimuli should be included in the training and the evaluation of TTS models.  ...  Results show that both baseline Tacotron 2 and Tacotron 2 with BERT embeddings capture the surface tonal coarticulation patterns well but fail to consistently apply the Tone-3 sandhi rule to novel sentences  ...  Error analysis shows that the TTS models are not simply applying a fixed Tone-3 sandhi pattern to all trisyllabic words.  ... 
arXiv:1912.10915v1 fatcat:te5yvoyykjfzdccbf22n26ttku
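
As background for the probe described above: the basic Tone-3 sandhi rule changes a third tone to a second tone when it precedes another third tone. A toy reference implementation over per-syllable tone numbers; the left-to-right application and flat prosodic structure are simplifying assumptions, since real trisyllabic cases depend on prosodic grouping:

    def apply_tone3_sandhi(tones):
        """Basic Mandarin Tone-3 sandhi: a 3 becomes 2 before another 3.

        tones: list of tone numbers, one per syllable, e.g. [3, 3, 3].
        """
        out = list(tones)
        for i in range(len(out) - 1):
            if out[i] == 3 and out[i + 1] == 3:
                out[i] = 2
        return out

    print(apply_tone3_sandhi([3, 3]))     # [2, 3]
    print(apply_tone3_sandhi([3, 3, 3]))  # [2, 2, 3] under this flat reading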

A data-driven grapheme-to-phoneme conversion method using dynamic contextual converting rules for Korean TTS systems

Jinsik Lee, Gary Geunbae Lee
2009 Computer Speech and Language  
similarity to overcome the rule application difficulties.  ...  In this paper, we describe a method for automatically extracting grapheme-to-phoneme conversion rules directly from the transcription of a speech synthesis database and introduce a weighted score and jamo  ...  We first aligned them and extracted all the possible rules that satisfy the context-length condition.  ... 
doi:10.1016/j.csl.2009.01.001 fatcat:nefpsmlzavctjamhns2yqftbiy
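
The snippet above mentions aligning graphemes with phonemes and then extracting every rule that satisfies a context-length condition. A toy sketch of that extraction step; the one-to-one alignment with an '_' empty phoneme and the rule representation are assumptions for illustration:

    from collections import Counter

    def extract_rules(aligned, max_context=1):
        """Count (left context, grapheme, right context) -> phoneme rules.

        aligned: list of (grapheme, phoneme) pairs for one word, assumed to be
                 aligned one-to-one, with '_' standing for an empty phoneme.
        """
        rules = Counter()
        graphemes = [g for g, _ in aligned]
        for i, (g, p) in enumerate(aligned):
            for width in range(max_context + 1):
                left = "".join(graphemes[max(0, i - width):i])
                right = "".join(graphemes[i + 1:i + 1 + width])
                rules[(left, g, right, p)] += 1
        return rules

    # Toy alignment for English "phone" -> /f oU n/.
    pairs = [("p", "f"), ("h", "_"), ("o", "oU"), ("n", "n"), ("e", "_")]
    for rule, count in extract_rules(pairs).items():
        print(rule, count)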

VRAIN-UPV MLLP's system for the Blizzard Challenge 2021 [article]

Alejandro Pérez-González-de-Martos, Albert Sanchis, Alfons Juan
2021 arXiv   pre-print
The SH1 task consisted of building a Spanish text-to-speech system trained on (but not limited to) the corpus released by the Blizzard Challenge 2021 organization.  ...  Only one of the other 11 participating systems achieved better naturalness than ours. Specifically, it achieved a naturalness MOS of 3.61, compared to 4.21 for real samples.  ...  Acknowledgements: The research leading to these results has received funding from the Government of Spain's research project Multisub (ref. RTI2018-094879-B-I00, MCIU/AEI/FEDER, EU).  ... 
arXiv:2110.15792v1 fatcat:slhzjdsfzvgbdcjgkymvfkhque

Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis

Rasmus Dall, Sandrine Brognaux, Korin Richmond, Cassia Valentini-Botinhao, Gustav Eje Henter, Julia Hirschberg, Junichi Yamagishi, Simon King
2016 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
We present evidence that, in the alignment of both standard read prompts and spontaneous speech, this phoneme sequence is often wrong, and that this is likely to have a negative impact on acoustic models  ...  A perceptual evaluation of HMM-based voices showed that spontaneous models trained on this improved alignment also improved standard synthesis, despite breaking the consistency assumption.  ...  Earlier work on spontaneous TTS admitted problems with speech alignment [11, Ch. 3].  ... 
doi:10.1109/icassp.2016.7472660 dblp:conf/icassp/DallBRVHHYK16 fatcat:my6bkr3r6fdb3idder2vdw6l5a

A Human Quality Text to Speech System for Sinhala

Lakshika Nanayakkara, Chamila Liyanage, Pubudu Tharaka Viswakula, Thilini Nagungodage, Randil Pushpananda, Ruvan Weerasinghe
2018 The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages  
This paper proposes an approach to implementing a Text to Speech system for the Sinhala language using the MaryTTS framework.  ...  In this project, a set of rules for mapping text to sound was identified, and synthesis proceeded with a unit selection mechanism.  ...  We also wish to acknowledge Miss Sumudu Nanayakkara for providing the voice for the TTS, and the students of the Faculty of Arts, University of Colombo, who supported the evaluation.  ... 
doi:10.21437/sltu.2018-33 dblp:conf/sltu/NanayakkaraLVNP18 fatcat:6sbqkvyfzvdxvav4azm66dlnqa

Acoustic-dependent Phonemic Transcription for Text-to-speech Synthesis

Kévin Vythelingum, Yannick Estève, Olivier Rosec
2018 Interspeech 2018  
The overall quality of TTS highly depends on the accuracy of phonemic transcriptions.  ...  On a French TTS dataset, we show that we can detect up to 90.5% of errors of a state-of-the-art grapheme-to-phoneme conversion system by annotating less than 15.8% of phonemes as erroneous.  ...  conversion model and use them during phoneme recognition.  ... 
doi:10.21437/interspeech.2018-1306 dblp:conf/interspeech/VythelingumER18 fatcat:fkmbwxggcfbyjei2mubrx5vame
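
This entry detects likely grapheme-to-phoneme errors by confronting the G2P output with what a phoneme recogniser hears in the audio. A rough sketch of the comparison step using a generic sequence alignment; the flagging criterion and the phoneme symbols are assumptions, not the paper's actual model:

    from difflib import SequenceMatcher

    def flag_disagreements(g2p_phones, recognized_phones):
        """Return (start, end) spans of the G2P output that disagree with
        the phoneme recogniser; such spans are candidates for checking."""
        sm = SequenceMatcher(a=g2p_phones, b=recognized_phones, autojunk=False)
        return [(i1, i2) for tag, i1, i2, _, _ in sm.get_opcodes() if tag != "equal"]

    g2p = ["b", "o~", "z", "u", "R"]   # hypothetical phoneme strings
    rec = ["b", "o~", "Z", "u", "R"]
    print(flag_disagreements(g2p, rec))  # [(2, 3)]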

Automatic pronunciation prediction for text-to-speech synthesis of dialectal arabic in a speech-to-speech translation system

Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Prem Natarajan, Aravind Namandi Vembu
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
TTS systems typically rely on a lexicon to look up pronunciations for each word in the input text.  ...  Text-to-speech synthesis (TTS) is the final stage in the speech-to-speech (S2S) translation pipeline, producing an audible rendition of translated text in the target language.  ...  [7], and combined them to generate a many-to-many (bidirectional) word alignment for phrase translation rule extraction.  ... 
doi:10.1109/icassp.2012.6289032 dblp:conf/icassp/AnanthakrishnanTPNV12 fatcat:n443k2xijzfxtezn6gkkox7hse
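
The last fragment above refers to combining two directional word alignments into a many-to-many (bidirectional) alignment before extracting translation rules. A minimal sketch of the simplest symmetrization choices, union and intersection; the paper may well use a more refined heuristic such as grow-diag-final:

    def symmetrize(src2tgt, tgt2src, method="union"):
        """Combine directional alignments into one set of (src, tgt) links.

        src2tgt: set of (src_idx, tgt_idx) links from the forward model.
        tgt2src: set of (tgt_idx, src_idx) links from the reverse model.
        """
        flipped = {(s, t) for t, s in tgt2src}
        if method == "union":
            return src2tgt | flipped
        if method == "intersection":
            return src2tgt & flipped
        raise ValueError(method)

    fwd = {(0, 0), (1, 2), (2, 1)}
    bwd = {(0, 0), (1, 2)}  # note the (tgt, src) order
    print(sorted(symmetrize(fwd, bwd, "intersection")))  # [(0, 0), (2, 1)]
    print(sorted(symmetrize(fwd, bwd, "union")))          # [(0, 0), (1, 2), (2, 1)]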

Information fusion approaches to the automatic pronunciation of print by analogy

R.I. Damper, Y. Marchand
2006 Information Fusion  
Rather than committing to one specific heuristic scoring method, it may be preferable to use multiple strategies (i.e., soft experts) and then employ information fusion techniques to combine them to give  ...  However, the process produces multiple candidate pronunciations and little or no theory exists to guide the choice among them.  ...  its pronunciation (in phonemes) have been aligned in one-to-one fashion [4].  ... 
doi:10.1016/j.inffus.2004.08.002 fatcat:rqlnwmprkzcsxjmsrzbhxsniry
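
The abstract above argues for fusing several heuristic scoring strategies ("soft experts") over candidate pronunciations instead of committing to one. A toy rank-based fusion in the spirit of a Borda count; the candidates and scores are invented, and the paper evaluates several fusion schemes rather than this one specifically:

    from collections import defaultdict

    def fuse_candidates(expert_scores):
        """Rank-based fusion: each expert ranks the candidates by its own
        score, and candidates collect Borda points across experts."""
        points = defaultdict(int)
        for scores in expert_scores:
            ranked = sorted(scores, key=scores.get, reverse=True)
            for rank, cand in enumerate(ranked):
                points[cand] += len(ranked) - rank  # best rank earns most points
        return sorted(points, key=points.get, reverse=True)

    experts = [
        {"l oU n": 5, "l Q n": 2},  # hypothetical candidates for "loan"
        {"l oU n": 3, "l Q n": 4},
        {"l oU n": 1, "l Q n": 0},
    ]
    print(fuse_candidates(experts))  # ['l oU n', 'l Q n']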

POS-tagging a bilingual parallel corpus: methods and challenges

Irene Doval
2017 Research in Corpus Linguistics  
On the one hand, tagging performance degrades significantly when applied to fictional data and, on the other, pre-existing annotation schemes are all language-specific.  ...  To further improve accuracy during post-editing, the author has developed a common tagset and identified major error patterns.  ...  POS taggers can be grouped into two main types: rule-based and stochastic. Rule-based taggers start by assigning all possible tags to words using a dictionary.  ... 
doi:10.32714/ricl.05.03 fatcat:o2au6abevnfardn7cg4wdoihea
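
The snippet above sketches how rule-based taggers work: first assign every tag the dictionary allows, then prune with contextual rules. A toy illustration with a hypothetical three-word lexicon and a single disambiguation rule:

    LEXICON = {  # hypothetical ambiguity classes
        "the": {"DET"},
        "can": {"AUX", "NOUN", "VERB"},
        "fish": {"NOUN", "VERB"},
    }

    def tag(tokens):
        """Dictionary lookup followed by simple contextual disambiguation."""
        # Step 1: assign all possible tags from the dictionary.
        candidates = [set(LEXICON.get(t, {"NOUN"})) for t in tokens]
        # Step 2: prune the candidate sets with rules, then pick one tag.
        tags = []
        for i, cands in enumerate(candidates):
            if len(cands) > 1 and i > 0 and tags[i - 1] == "DET":
                cands = (cands & {"NOUN"}) or cands  # rule: after DET, prefer NOUN
            tags.append(sorted(cands)[0])            # fallback: alphabetical pick
        return tags

    print(tag(["the", "can", "fish"]))  # ['DET', 'NOUN', 'NOUN']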

Adaptive resource configuration for Cloud infrastructure management

Michael Maurer, Ivona Brandic, Rizos Sakellariou
2013 Future generations computer systems  
As KM techniques, we investigate two methods, Case-Based Reasoning and a rule-based approach. We design and implement both of them and evaluate them with the help of a simulation engine.  ...  We first hierarchically structure all possible adaptation actions into so-called escalation levels.  ...  Emeakaroha (TU Vienna) for providing monitoring data on it.  ... 
doi:10.1016/j.future.2012.07.004 fatcat:mt2vtf45gfg7vovlw74knf7gym
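
The entry above hierarchically structures adaptation actions into "escalation levels" and, in the rule-based variant, triggers them from monitored metrics. A minimal rule-based sketch of that idea; the metric, thresholds, and action names are invented for illustration:

    # Ordered from least to most disruptive, loosely mirroring escalation levels.
    ESCALATION_LEVELS = [
        ("adjust_vm_resources", lambda m: m["cpu_util"] > 0.80),
        ("migrate_application", lambda m: m["cpu_util"] > 0.90),
        ("add_physical_host",   lambda m: m["cpu_util"] > 0.95),
    ]

    def choose_action(metrics):
        """Return the least disruptive action whose rule fires; escalation to
        higher levels would happen only if lower levels fail to help."""
        for action, rule in ESCALATION_LEVELS:
            if rule(metrics):
                return action
        return None  # no SLA threat detected; do nothing

    print(choose_action({"cpu_util": 0.85}))  # adjust_vm_resources
    print(choose_action({"cpu_util": 0.50}))  # None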

Error Analysis of Automatic Speech Recognition Using Principal Direction Divisive Partitioning [chapter]

David McKoskey, Daniel Boley
2000 Lecture Notes in Computer Science  
For each of six physicians, two hundred finished medical dictations aligned with their corresponding automatic speech recognition output were clustered and the results analyzed for linguistic regularities  ...  This paper describes an experiment performed using the Principal Direction Divisive Partitioning algorithm (Boley, 1998) in order to extract linguistic word error regularities from several sets of medical  ...  clustering them on word frequency.  ... 
doi:10.1007/3-540-45164-1_28 fatcat:dt2ka2cenng5jon4xgry4zmcgy
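
Principal Direction Divisive Partitioning, used above to cluster dictation/ASR alignments, repeatedly splits a cluster along its principal direction. A minimal one-split NumPy sketch; building the word-frequency vectors from the dictation data is outside the scope of this illustration:

    import numpy as np

    def pddp_split(X):
        """Split the rows of X into two child clusters along the principal
        direction of the mean-centred data.

        X: (n_docs, n_features) matrix, e.g. word-frequency vectors.
        Returns a boolean mask selecting one of the two children."""
        centered = X - X.mean(axis=0)
        # Leading right singular vector = principal direction of this cluster.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        projections = centered @ vt[0]
        return projections >= 0  # the sign of the projection decides the side

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (5, 10)), rng.normal(5, 1, (5, 10))])
    print(pddp_split(X))  # the two synthetic groups land on opposite sides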
Showing results 1 — 15 out of 23,691 results