
TTS Skins: Speaker Conversion via ASR [article]

Adam Polyak, Lior Wolf, Yaniv Taigman
2020 arXiv   pre-print
We train the network on narrated audiobooks and demonstrate multi-voice TTS in those voices by converting the voice of a TTS robot.  ...  We present a fully convolutional wav-to-wav network for converting between speakers' voices, without relying on text.  ...  Similarly, voice conversion for speaker j is obtained via D(E(s), F0(s), v_j). Fitting New Speakers.  ...
arXiv:1904.08983v2 fatcat:3ekhspgtfbbuxkzksu5xsorop4

TTS Skins: Speaker Conversion via ASR

Adam Polyak, Lior Wolf, Yaniv Taigman
2020 Interspeech 2020  
We train the network on narrated audiobooks and demonstrate multi-voice TTS in those voices by converting the voice of a TTS robot.  ...  We present a fully convolutional wav-to-wav network for converting between speakers' voices, without relying on text.  ...  Similarly, voice conversion for speaker j is obtained via D(E(s), F0(s), v_j). Fitting New Speakers.  ...
doi:10.21437/interspeech.2020-1416 dblp:conf/interspeech/PolyakWT20 fatcat:yj6zqbykpbcepdtmxswyw7ewly
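
The conversion rule quoted in both snippets above, D(E(s), F0(s), v_j), feeds the decoder D three inputs: a content encoding E(s) of the source speech s, its pitch contour F0(s), and a learned embedding v_j for target speaker j. Below is a minimal Python sketch of that call structure with toy numpy stubs; every name here is an illustrative placeholder, not the authors' code (their actual model is a fully convolutional wav-to-wav network).

import numpy as np

rng = np.random.default_rng(0)

def E(s):
    # Stub content encoder: one speaker-independent feature per 160-sample frame.
    return s.reshape(-1, 160).mean(axis=1, keepdims=True)

def F0(s):
    # Stub pitch tracker: one F0-like value per frame.
    return np.abs(s.reshape(-1, 160)).max(axis=1, keepdims=True)

def D(content, f0, v):
    # Stub decoder: combines content, pitch, and speaker embedding into audio.
    frames = np.concatenate([content, f0], axis=1) @ rng.standard_normal((2, 160))
    return (frames + v[0]).ravel()

V = rng.standard_normal((10, 1))   # one learned embedding per training speaker

s = rng.standard_normal(16000)     # 1 s of source audio at 16 kHz
converted = D(E(s), F0(s), V[3])   # conversion to the voice of speaker j = 3
print(converted.shape)             # (16000,)

Because E(s) carries no speaker identity and F0(s) only the pitch trajectory, swapping the embedding v_j is what changes the perceived speaker.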

HiFi-VC: High Quality ASR-Based Voice Conversion [article]

A. Kashkin, I. Karpukhin, S. Shishkin
2022 arXiv   pre-print
Our approach uses automatic speech recognition (ASR) features, pitch tracking, and a state-of-the-art waveform prediction model.  ...  Despite recent progress, any-to-any conversion quality is still inferior to natural speech. In this work, we propose a new any-to-any voice conversion pipeline.  ...  Speaker information is usually eliminated from F0 by normalization [23, 8]. Our method uses an ASR and F0 encoder similar to TTS Skins [13].  ...
arXiv:2203.16937v1 fatcat:oapl7fp4abdohntywkpudnicmm
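
The normalization the HiFi-VC snippet refers to ("speaker information is usually eliminated from F0 by normalization") is commonly implemented as z-scoring log-F0 over voiced frames, since speaker identity lives mostly in the mean and spread of pitch. A minimal sketch of that standard recipe follows; it is not HiFi-VC's actual code.

import numpy as np

def normalize_f0(f0: np.ndarray) -> np.ndarray:
    # Z-score log-F0 over voiced frames; unvoiced frames (F0 == 0) stay 0.
    voiced = f0 > 0
    out = np.zeros_like(f0, dtype=float)
    log_f0 = np.log(f0[voiced])
    out[voiced] = (log_f0 - log_f0.mean()) / (log_f0.std() + 1e-8)
    return out

# A low-pitched and a high-pitched rendition of the same contour normalize
# to (nearly) identical trajectories, i.e. the speaker cue is gone.
t = np.linspace(0, 1, 100)
low  = 110 * (1 + 0.1 * np.sin(2 * np.pi * t))   # ~110 Hz speaker
high = 220 * (1 + 0.1 * np.sin(2 * np.pi * t))   # ~220 Hz speaker
print(np.allclose(normalize_f0(low), normalize_f0(high)))  # True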

A study of prosodic alignment in interlingual map-task dialogues

Hayakawa Akira, Loredana Cerrato, Nick Campbell, Saturnino Luz
2015 International Congress of Phonetic Sciences  
... Automatic Speech Recognition (ASR), Machine Translation (MT) and Text To Speech (TTS).  ...  This paper reports results from a study of how speakers adjust their speaking style in relation to errors from Automatic Speech Recognition (ASR), while performing an interlingual map task.  ...  Using a prototype system able to record synchronised interaction data streams, such as high-quality video and audio, time-stamped ASR, MT and TTS events, as well as biosignals (heart rate, skin conductance  ...
dblp:conf/icphs/HayakawaCCL15 fatcat:lrsadahz3fgrdgtsxvu63rez3q

An Event-Based Conversational System for the Nao Robot [chapter]

Ivana Kruijff-Korbayová, Georgios Athanasopoulos, Aryel Beck, Piero Cosi, Heriberto Cuayáhuitl, Tomas Dekens, Valentin Enescu, Antoine Hiolle, Bernd Kiefer, Hichem Sahli, Marc Schröder, Giacomo Sommavilla (+2 others)
2011 Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop  
We applied speaker adaptation techniques, using a small amount of data from individual speakers to improve the speaker-independent model.  ...  These audio segments are made available for further processing via Urbi event emission.  ... 
doi:10.1007/978-1-4614-1335-6_14 fatcat:s6lwip72pve45lkfcqxe7w7b6m

Passing an Enhanced Turing Test – Interacting with Lifelike Computer Representations of Specific Individuals

Avelino J. Gonzalez, Jason Leigh, Ronald F. DeMara, Andrew Johnson, Steven Jones, Sangyoon Lee, Victor Hung, Luc Renambot, Carlos Leon-Barth, Maxine Brown, Miguel Elvir, James Hollister (+1 others)
2013 Journal of Intelligent Systems  
This article describes research to build an embodied conversational agent (ECA) as an interface to a question-and-answer (Q/A) system about a National Science Foundation (NSF) program.  ...  In an idealized case, the LifeLike Avatar could conceivably provide a user with a level of interaction such that he or she would not be certain as to whether he or she is talking to the actual person via  ...  Once a context is recognized, the descriptions associated with that context contain the information requested by the human user and the avatar enunciates it via a TTS system.  ...
doi:10.1515/jisys-2013-0016 fatcat:b27p7byiezd67cola3sw3kw5tu

Passing an Enhanced Turing Test – Interacting with Lifelike Computer Representations of Specific Individuals

Avelino J. Gonzalez, Jason Leigh, Ronald F. DeMara, Andrew Johnson, Steven Jones, Sangyoon Lee, Victor Hung, Gordon S. Carlson, Luc Renambot, Carlos Leon-Barth, Maxine Brown, Miguel Elvir (+2 others)
2014 Journal of Intelligent Systems  
This article describes research to build an embodied conversational agent (ECA) as an interface to a question-and-answer (Q/A) system about a National Science Foundation (NSF) program.  ...  In an idealized case, the LifeLike Avatar could conceivably provide a user with a level of interaction such that he or she would not be certain as to whether he or she is talking to the actual person via  ...  Once a context is recognized, the descriptions associated with that context contain the information requested by the human user and the avatar enunciates it via a TTS system.  ... 
doi:10.1515/jisys-2014-0085 fatcat:3sxw6y4ihndbzdbkq2wanycpli

Biosignal Sensors and Deep Learning-Based Speech Recognition: A Review

Wookey Lee, Jessica Jiwon Seong, Busra Ozlu, Bong Sup Shim, Azizbek Marakhimov, Suan Lee
2021 Sensors  
The calculated output is converted to speech via Text-to-Speech (TTS) and then sent to the user via bone conduction headphones.  ...
doi:10.3390/s21041399 pmid:33671282 fatcat:je4cmqkulnbmpbji3owxgr7f24

Silent Speech Interfaces for Speech Restoration: A Review [article]

Jose A. Gonzalez-Lopez, Alejandro Gomez-Alanis, Juan M. Martín-Doñas, José L. Pérez-Córdoba, Angel M. Gomez
2020 arXiv   pre-print
... and TTS systems.  ...  In [71], automatic speech recognition (ASR) from electromyography (EMG) signals was investigated for dysarthric speakers.  ...
arXiv:2009.02110v2 fatcat:i2o4zxqko5anhn2eqivtnsd2di

Silent Speech Interfaces for Speech Restoration: A Review

Jose A. Gonzalez-Lopez, Alejandro Gomez-Alanis, Juan M. Martin-Donas, Jose L. Perez-Cordoba, Angel M. Gomez
2020 IEEE Access  
... and TTS systems.  ...  In [71], automatic speech recognition (ASR) from electromyography (EMG) signals was investigated for dysarthric speakers.  ...
doi:10.1109/access.2020.3026579 fatcat:yvvaebeavfdfrav73sfs62a5dm

Measuring a decade of progress in Text-to-Speech

Simon King
2014 Loquens  
This is most commonly achieved using adaptation techniques borrowed from ASR, then subsequently extended for TTS.  ...  Voice conversion. Blizzard requires that the entered voice sounds close to the provided speaker, which usually means building a voice on that data.  ...
doi:10.3989/loquens.2014.006 fatcat:reyfthn4n5a2ne43a4ftir4x5m

The TORGO database of acoustic and articulatory speech from speakers with dysarthria

Frank Rudzicz, Aravind Kumar Namasivayam, Talya Wolff
2011 Language Resources and Evaluation  
This database only includes 10 stimuli per speaker, however, which is not enough to train ASR systems. This database may be freely procured from the authors for academic use.  ...  Public databases of endogenous articulation exist, but only for non-dysarthric speakers.  ...  The TORGO database is primarily a resource for developing ASR models more suited to the needs of people with atypical speech production, although it is equally useful to the more general ASR community.  ... 
doi:10.1007/s10579-011-9145-0 fatcat:37tjsx7xxvgw7e5hd772ri7snu

Automatic fingersign-to-speech translation system

Marek Hrúz, Pavel Campr, Erinç Dikici, Ahmet Alp Kındıroğlu, Zdeněk Krňoul, Alexander Ronzhin, Haşim Sak, Daniel Schorno, Hülya Yalçın, Lale Akarun, Oya Aran, Alexey Karpov (+2 others)
2011 Journal on Multimodal User Interfaces  
The most popular ASR models apply speaker-independent speech recognition, though in some cases (for instance, personalized systems that have to recognize only their owner) speaker-dependent systems are more  ...  Speech synthesis. Two TTS systems are applied in our global system: Open MARY TTS [50] for the English and Turkish languages, developed by DFKI (Germany), and the Russian TTS engine developed by UIIP (  ...
doi:10.1007/s12193-011-0059-3 fatcat:dexq3o2auzeupivz76sn6yn4bi

$NLP: How to spend a billion dollars

Robert Dale
2022 Natural Language Engineering  
... user via natural language.  ...  LOVO (seed; US$4.5m) offers 180+ voice skins in 33 languages for synthesising speech in a number of genres, such as audiobooks, games and documentaries; it also lets you build a customised voice skin using  ...
doi:10.1017/s1351324921000450 fatcat:bq526n46fncahpf6qfkjtv4jvq

A silent speech system based on permanent magnet articulography and direct synthesis

Jose A. Gonzalez, Lam A. Cheah, James M. Gilbert, Jie Bai, Stephen R. Ell, Phil D. Green, Roger K. Moore
2016 Computer Speech and Language  
The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers.  ...  In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned  ...  Second, speech articulation and its associated auditory feedback are disconnected due to the variable delay introduced by the ASR and TTS steps.  ... 
doi:10.1016/j.csl.2016.02.002 fatcat:dplgy32pnjbojjtisw4zupmga4
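
The "speaker-dependent transformation" in this entry is, at its core, a regression from articulatory feature frames to acoustic feature frames, learned from a speaker's paired PMA and audio recordings. Below is a toy sketch of that idea using ridge-regularized least squares on random stand-in data; the actual paper learns a more powerful mapping, and a vocoder would turn the predicted acoustic features into a waveform.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired training frames for one speaker: PMA -> acoustic.
pma_train = rng.standard_normal((5000, 9))        # e.g. 9 magnetic-sensor channels
acoustic_train = rng.standard_normal((5000, 25))  # e.g. 25 spectral features

# Fit the speaker-dependent transformation (ridge-regularized least squares).
X = np.hstack([pma_train, np.ones((len(pma_train), 1))])   # add bias column
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ acoustic_train)

# At synthesis time, silently articulated frames map directly to acoustic
# features, avoiding the variable ASR-then-TTS delay the snippet mentions.
pma_test = rng.standard_normal((100, 9))
acoustic_pred = np.hstack([pma_test, np.ones((100, 1))]) @ W
print(acoustic_pred.shape)  # (100, 25)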