One TTS Alignment To Rule Them All
[article]
2021
arXiv
pre-print
Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. ...
In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). ...
Similar to GlowTTS [6] and RAD-TTS [12], we compute the soft alignment distribution based on the learned pairwise affinity between all text tokens and mel frames, which is normalized with softmax across ...
arXiv:2108.10447v1
fatcat:ua2hbehfareoxnfkfxbawo26ee
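The last snippet in the entry above describes the core of the alignment framework: a pairwise affinity between text tokens and mel frames, normalized with a softmax. The sketch below only illustrates that normalization step; the function name, shapes, and dot-product affinity are assumptions for illustration (the paper builds on the RAD-TTS mechanism, which learns a distance-based affinity and adds an alignment prior and a hard monotonic alignment not shown here).

```python
import numpy as np

def soft_alignment(text_enc, mel_enc):
    """Soft alignment distribution between text tokens and mel frames.

    A minimal sketch of the mechanism quoted in the entry above: a
    pairwise affinity is computed for every (mel frame, text token)
    pair and normalized with a softmax across the text axis, so each
    mel frame carries a distribution over text tokens.

    text_enc: (T_text, d) encoded text tokens (shapes are illustrative)
    mel_enc:  (T_mel, d)  encoded mel frames
    returns:  (T_mel, T_text) soft alignment matrix, rows sum to 1
    """
    # Dot-product affinity is used here for brevity; Glow-TTS/RAD-TTS-style
    # models typically use a (negative) learned distance instead.
    affinity = mel_enc @ text_enc.T                   # (T_mel, T_text)
    affinity -= affinity.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(affinity)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy usage with random encodings (12 text tokens, 40 mel frames, dim 8).
rng = np.random.default_rng(0)
align = soft_alignment(rng.normal(size=(12, 8)), rng.normal(size=(40, 8)))
```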
Tree-Based Statistical Machine Translation: Experiments with the English and Brazilian Portuguese Pair
2013
Learning and Nonlinear Models
However, this lack of explicit linguistic knowledge makes them unable to model some linguistic phenomena. ...
Current state-of-the-art approaches rely only on statistical methods that gather all necessary knowledge from parallel corpora. ...
This is due to the fact that it extracts only the minimal rules from the alignment graphs, and therefore only one derivation is considered for all alignment graphs. ...
doi:10.21528/lnlm-vol11-no1-art2
fatcat:3lrwss65bzhyfpjhhx5jqbva64
Transforming non textually aligned SPMD programs into textually aligned SPMD programs by using rewriting rules
2019
2019 International Conference on High Performance Computing & Simulation (HPCS)
We propose a set of transformation rules, based on rewriting techniques, that allows a non-textually aligned program to be turned into a textually aligned one. ...
The textual alignment of the synchronization barriers, once established, prevents deadlocks. However, not all SPMD programs satisfy the textual alignment property. ...
Of course, the transformation that we propose does not guarantee that every non-textually aligned SPMD program can be turned into a textually aligned one. ...
doi:10.1109/hpcs48598.2019.9188223
dblp:conf/ieeehpcs/Bousdira19
fatcat:g6o77nn7dbf6bbdaub5yyerpqy
TTS-driven Embodied Conversation Avatar for UMB-SmartTV
2021
International journal of computers and communications
Features for selecting the shape and alignment of co-verbal movement are based on linguistic features (that can be extracted from arbitrary input text) and prosodic features (as predicted within several processing steps in the TTS engine). ...
either based on semiotic or implicit rules. ...
doi:10.46300/91013.2021.15.1
fatcat:6wo2rrbp3ffqdib7gsfcdvjyri
Probing the phonetic and phonological knowledge of tones in Mandarin TTS models
[article]
2019
arXiv
pre-print
On the other hand, it is also suggested that linguistically informed stimuli should be included in the training and the evaluation of TTS models. ...
Results show that both baseline Tacotron 2 and Tacotron 2 with BERT embeddings capture the surface tonal coarticulation patterns well but fail to consistently apply the Tone-3 sandhi rule to novel sentences ...
Error analysis shows that the TTS models are not simply applying a fixed Tone-3 sandhi pattern to all trisyllabic words. ...
arXiv:1912.10915v1
fatcat:te5yvoyykjfzdccbf22n26ttku
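The Tone-3 sandhi rule probed in the entry above can be stated compactly. The sketch below applies only the textbook pairwise form of the rule (a third tone before another third tone surfaces as a second tone); the right-to-left scan and the handling of longer tone-3 chains are simplifications, and, as the paper's error analysis notes, trisyllabic words do not follow a single fixed pattern.

```python
def apply_tone3_sandhi(tones):
    """Apply the basic Mandarin Tone-3 sandhi rule to a tone sequence.

    Textbook form of the rule probed in the entry above: when a third
    tone immediately precedes another third tone, the first surfaces as
    a second tone (3 3 -> 2 3). Longer tone-3 chains depend on prosodic
    grouping; scanning right-to-left here is just one simplification.
    """
    out = list(tones)
    for i in range(len(out) - 1, 0, -1):
        if out[i - 1] == 3 and out[i] == 3:
            out[i - 1] = 2
    return out

# e.g. ni3 hao3 -> ni2 hao3
assert apply_tone3_sandhi([3, 3]) == [2, 3]
```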
A data-driven grapheme-to-phoneme conversion method using dynamic contextual converting rules for Korean TTS systems
2009
Computer Speech and Language
In this paper, we describe a method for automatically extracting grapheme-to-phoneme conversion rules directly from the transcription of a speech synthesis database, and introduce a weighted score and jamo similarity to overcome the rule application difficulties. ...
We first aligned them and extracted all the possible rules that satisfy the context-length condition. ...
doi:10.1016/j.csl.2009.01.001
fatcat:nefpsmlzavctjamhns2yqftbiy
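The last snippet in the entry above mentions aligning graphemes with phonemes and then extracting every rule whose context fits within a length limit. The sketch below is only a guess at what such an enumeration looks like: the data layout (a one-to-one list of grapheme/phoneme pairs) and the rule shape (left context, grapheme, right context, phoneme) are illustrative assumptions, and the weighted score and jamo similarity mentioned in the abstract are not modeled.

```python
def extract_g2p_rules(aligned, max_context=2):
    """Enumerate contextual grapheme-to-phoneme rules from one aligned word.

    aligned:      one-to-one list of (grapheme, phoneme) pairs (an assumed layout)
    max_context:  maximum number of graphemes kept on each side of the target
    returns:      set of rules (left_context, grapheme, right_context, phoneme)
    """
    graphemes = [g for g, _ in aligned]
    rules = set()
    for i, (g, p) in enumerate(aligned):
        for left_len in range(max_context + 1):
            for right_len in range(max_context + 1):
                left = tuple(graphemes[max(0, i - left_len):i])
                right = tuple(graphemes[i + 1:i + 1 + right_len])
                rules.add((left, g, right, p))
    return rules

# Toy usage on a romanized example word (Korean jamo would be used in practice).
rules = extract_g2p_rules([("k", "k"), ("i", "i"), ("m", "m")], max_context=1)
```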
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
[article]
2021
arXiv
pre-print
The SH1 task consisted of building a Spanish text-to-speech system trained on (but not limited to) the corpus released by the Blizzard Challenge 2021 organization. ...
Only one system among the other 11 participants achieved better naturalness than ours. Concretely, it achieved a naturalness MOS of 3.61, compared to 4.21 for real samples. ...
Acknowledgements: The research leading to these results has received funding from the Government of Spain's research project Multisub (ref. RTI2018-094879-B-I00, MCIU/AEI/FEDER, EU).
arXiv:2110.15792v1
fatcat:slhzjdsfzvgbdcjgkymvfkhque
Testing the consistency assumption: Pronunciation variant forced alignment in read and spontaneous speech synthesis
2016
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We present evidence that in the alignment of both standard read prompts and spontaneous speech this phoneme sequence is often wrong, and that this is likely to have a negative impact on acoustic models ...
A perceptual evaluation of HMM-based voices showed that spontaneous models trained on this improved alignment also improved standard synthesis, despite breaking the consistency assumption. ...
Earlier work on spontaneous TTS admitted problems with speech alignment [11, Ch. 3]. ...
doi:10.1109/icassp.2016.7472660
dblp:conf/icassp/DallBRVHHYK16
fatcat:my6bkr3r6fdb3idder2vdw6l5a
A Human Quality Text to Speech System for Sinhala
2018
The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages
This paper proposes an approach to implementing a text-to-speech system for the Sinhala language using the MaryTTS framework. ...
In this project, a set of rules for mapping text to sound was identified, and the system then proceeded with a unit selection mechanism. ...
We also wish to acknowledge Miss Sumudu Nanayakkara for providing voice for the TTS and the students of the Faculty of Arts, University of Colombo who supported the evaluation. ...
doi:10.21437/sltu.2018-33
dblp:conf/sltu/NanayakkaraLVNP18
fatcat:6sbqkvyfzvdxvav4azm66dlnqa
Acoustic-dependent Phonemic Transcription for Text-to-speech Synthesis
2018
Interspeech 2018
The overall quality of TTS highly depends on the accuracy of phonemic transcriptions. ...
On a French TTS dataset, we show that we can detect up to 90.5% of errors of a state-of-the-art grapheme-to-phoneme conversion system by annotating less than 15.8% of phonemes as erroneous. ...
conversion model and use them during phoneme recognition. ...
doi:10.21437/interspeech.2018-1306
dblp:conf/interspeech/VythelingumER18
fatcat:fkmbwxggcfbyjei2mubrx5vame
Automatic pronunciation prediction for text-to-speech synthesis of dialectal Arabic in a speech-to-speech translation system
2012
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
TTS systems typically rely on a lexicon to look up pronunciations for each word in the input text. ...
Text-to-speech synthesis (TTS) is the final stage in the speech-tospeech (S2S) translation pipeline, producing an audible rendition of translated text in the target language. ...
[7], and combined them to generate a many-to-many (bidirectional) word alignment for phrase translation rule extraction. ...
doi:10.1109/icassp.2012.6289032
dblp:conf/icassp/AnanthakrishnanTPNV12
fatcat:n443k2xijzfxtezn6gkkox7hse
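The last snippet in the entry above refers to combining two directional word alignments into a single many-to-many alignment before extracting phrase translation rules. The sketch below shows the simplest symmetrization schemes (intersection and union); the paper may well use a different heuristic such as grow-diag-final, so this is an illustration of the idea rather than its method.

```python
def symmetrize(src2tgt, tgt2src):
    """Combine two directional word alignments into bidirectional link sets.

    src2tgt: iterable of (i, j) links from aligning source -> target
    tgt2src: iterable of (j, i) links from aligning target -> source
    returns: (intersection, union) of the two directions as sets of (i, j)
    """
    forward = set(src2tgt)
    backward = {(i, j) for (j, i) in tgt2src}   # flip so both use (src, tgt) order
    return forward & backward, forward | backward

# Toy usage: links from two hypothetical directional aligners.
strict, permissive = symmetrize([(0, 0), (1, 2), (2, 1)], [(0, 0), (2, 1), (3, 2)])
```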
Information fusion approaches to the automatic pronunciation of print by analogy
2006
Information Fusion
Rather than committing to one specific heuristic scoring method, it may be preferable to use multiple strategies (i.e., soft experts) and then employ information fusion techniques to combine them to give ...
However, the process produces multiple candidate pronunciations and little or no theory exists to guide the choice among them. ...
its pronunciation (in phonemes) have been aligned in one-to-one fashion [4]. ...
doi:10.1016/j.inffus.2004.08.002
fatcat:rqlnwmprkzcsxjmsrzbhxsniry
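The entry above argues for scoring candidate pronunciations with several heuristic strategies ("soft experts") and fusing the scores rather than committing to one heuristic. The sketch below uses a weighted sum of per-expert normalized scores as the fusion rule; the paper compares several information fusion schemes, so this particular rule, and the placeholder heuristics in the usage lines, are assumptions.

```python
def fuse_candidate_scores(candidates, experts, weights=None):
    """Pick a pronunciation candidate by fusing several heuristic scorers.

    candidates: list of candidate pronunciation strings
    experts:    list of callables mapping a candidate to a numeric score
    weights:    optional per-expert weights (defaults to equal weighting)
    """
    weights = weights if weights is not None else [1.0] * len(experts)
    fused = {c: 0.0 for c in candidates}
    for w, expert in zip(weights, experts):
        scores = {c: float(expert(c)) for c in candidates}
        total = sum(scores.values()) or 1.0        # normalize so experts are comparable
        for c in candidates:
            fused[c] += w * scores[c] / total
    return max(fused, key=fused.get)

# Toy usage: two placeholder heuristics scoring candidate phoneme strings.
best = fuse_candidate_scores(
    ["f @U n i m", "f oU n i m"],
    [lambda c: c.count(" ") + 1,          # placeholder: number of phonemes
     lambda c: 1.0 if "oU" in c else 0.5],
)
```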
POS-tagging a bilingual parallel corpus: methods and challenges
2017
Research in Corpus Linguistics
On the one hand, tagging performance degrades significantly on fictional data and, on the other, pre-existing annotation schemes are all language-specific. ...
To further improve accuracy during post-editing, the author has developed a common tagset and identified major error patterns. ...
The tagger: POS taggers can be grouped into two main types: rule-based and stochastic. Rule-based taggers start by assigning all possible tags to words using a dictionary. ...
doi:10.32714/ricl.05.03
fatcat:o2au6abevnfardn7cg4wdoihea
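The last snippet in the entry above sketches the first step of a rule-based tagger: assign every tag the dictionary allows for each word, then let disambiguation rules prune the sets. Below is a minimal version of that first step; the lexicon format and the open-class fallback for unknown words are illustrative assumptions, and the pruning rules are not shown.

```python
def assign_candidate_tags(tokens, lexicon, open_class=("NN", "VB", "JJ", "RB")):
    """First step of a rule-based POS tagger: attach all dictionary tags.

    tokens:     list of word tokens
    lexicon:    dict mapping lowercased word -> set of admissible tags
    open_class: fallback tag set for words missing from the dictionary
    returns:    list of (token, candidate_tag_set) pairs
    """
    return [(tok, set(lexicon.get(tok.lower(), open_class))) for tok in tokens]

# Toy usage with a tiny hand-written lexicon; "can" stays ambiguous until
# disambiguation rules (not shown) prune its tag set.
lexicon = {"the": {"DT"}, "can": {"MD", "NN", "VB"}, "rust": {"NN", "VB"}}
tagged = assign_candidate_tags("The can can rust".split(), lexicon)
```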
Adaptive resource configuration for Cloud infrastructure management
2013
Future Generation Computer Systems
As KM techniques, we investigate two methods, Case-Based Reasoning and a rule-based approach. We design and implement both of them and evaluate them with the help of a simulation engine. ...
We first hierarchically structure all possible adaptation actions into so-called escalation levels. ...
Emeakaroha (TU Vienna) for providing monitoring data on it. ...
doi:10.1016/j.future.2012.07.004
fatcat:mt2vtf45gfg7vovlw74knf7gym
Error Analysis of Automatic Speech Recognition Using Principal Direction Divisive Partitioning
[chapter]
2000
Lecture Notes in Computer Science
For each of six physicians, two hundred finished medical dictations aligned with their corresponding automatic speech recognition output were clustered and the results analyzed for linguistic regularities ...
This paper describes an experiment performed using the Principal Direction Divisive Partitioning algorithm (Boley, 1998) in order to extract linguistic word error regularities from several sets of medical ...
clustering them on word frequency. ...
doi:10.1007/3-540-45164-1_28
fatcat:dt2ka2cenng5jon4xgry4zmcgy
Showing results 1 — 15 out of 23,691 results