Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control [article]

Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, Georgia Maniati, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris
2021 arXiv   pre-print
It utilizes a Tacotron-based multispeaker acoustic model trained on read-only speech data and which provides prosody control at the phoneme level.  ...  Dataset augmentation and additional prosody manipulation based on traditional DSP algorithms are also investigated.  ...  Conclusions: In this paper, we presented an approach for producing high-quality singing and rapping synthesis from a Tacotron-based fine-grained prosody-control voice model trained solely on read data.  ... 
arXiv:2111.09146v1 fatcat:fonznraxrvcu7kvaxubkmpe35m

Cross-Lingual Voice Conversion with Non-Parallel Data

Pablo Alonso-Jiménez
2017 Zenodo  
In this project a Phonetic Posteriorgram (PPG) based Voice Conversion system is implemented. The main goal is to perform and evaluate conversions of singing voice.  ...  Additionally, the use of spectral envelope based MFCC and pseudo-singing dataset for ASR training are proposed in order to improve the performance of the system in the singing context.  ...  A Neural Parametric Singing Synthesizer NPSS consists of a neural network based system that generates probability distributions of the target features based on the past values and a phonetic control.  ... 
doi:10.5281/zenodo.1117153 fatcat:prwivervc5dijhlzyhowmyy22e

Techniques and Challenges in Speech Synthesis [article]

David Ferris
2017 arXiv   pre-print
Methods for further improving sentence level speech naturalness were discussed. Finally, the system was tested with listeners for its intelligibility and naturalness.  ...  The aim of this project was to develop and implement an English language Text-to-Speech synthesis system.  ...  Prosody Based on Lexical Class We also want a user to be able to define additional prosodic characteristics on the word level based on the lexical class of the word being considered.  ... 
arXiv:1709.07552v1 fatcat:o75yc226ubanppunnqy37jhdua

Jukebox: A Generative Model for Music [article]

Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever
2020 arXiv   pre-print
We can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable.  ...  We introduce Jukebox, a model that generates music with singing in the raw audio domain.  ...  We thus have to model the temporal alignment of lyrics and singing, the artist's voice and also the diversity of ways one can sing a phrase depending on the pitch, melody, rhythm and even genre of the song  ... 
arXiv:2005.00341v1 fatcat:drwspmscbjfknhqdlunbp6spkm

Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR) (Dagstuhl Seminar 16442)

Roger K. Moore, Serge Thill, Ricard Marxer, Marc Herbstritt
2017 Dagstuhl Reports  
Siri - Apple's voice-enabled personal assistant) are understandably based on rather naïve models of message passing and strict turn-taking.  ...  I work in the area of system architectures for very low-latency reactions and controllable reflexive behaviours, based on incremental processing [1] which allows the concurrent and modular processing  ...  other linguistic and gestural levels and on back-channeling behavior in a robot?).  ... 
doi:10.4230/dagrep.6.10.154 dblp:journals/dagstuhl-reports/MooreTM16 fatcat:tqza2kcponamjkquzibdozgryy

Papers from the Seventh International Conference on Austroasiatic Linguistics

Hiram Ring, Felix Rau
2018 Journal of the Southeast Asian Linguistics Society  
This is a collection of 9 articles from the Seventh International Conference on Austroasiatic Linguistics held in 2017 in Kiel, Germany.  ...  1500092 (for work on Gutob) and 0853877 (for work on Remo), National Endowment for the Humanities Award PD-50025-13 (for work on Gta'), a Genographic Legacy Fund award (for work on Ho), several National  ...  Geographic Society awards (for work on Sora, Santali, Mundari, Kera', Gorum, Juray, Korku, Nihali, Juang and Kharia) and an award from the Zegar Family Foundation (for work on Birhor).  ... 
doaj:cce2b08cc91d425d92fcf7b6fe2f0cde fatcat:fbv4mjbvjrhyxddi6uqt5542bm

Neural network modeling of a dolphin's sonar discrimination capabilities

Whitlow W. L. Au, Lars N. Andersen, A. René Rasmussen, Herbert L. Roitblat, Paul E. Nachtigall
1995 Journal of the Acoustical Society of America  
The sound field scattered by a smooth thin elastic shell immersed in fluid arises largely from specular reflection and acoustic-membrane coupling, unless both source and observer are located near or on  ...  This project uses the trained singing voice as a source of controlled variation in parameters important to the timbre of both speech and voice.  ...  A cost function based on the volume velocity in active control of structural radiation has the advantage of keeping the control simple (one error sensor).  ... 
doi:10.1121/1.413700 pmid:7608403 fatcat:m3nunzs4wfflhfl27n4shdn2n4

Neural network modeling of a dolphin's sonar discrimination capabilities

Lars N. Andersen, A. René Rasmussen, Whitlow W. L. Au, Paul E. Nachtigall, Herbert Roitblat
1994 Journal of the Acoustical Society of America  
The sound field scattered by a smooth thin elastic shell immersed in fluid arises largely from specular reflection and acoustic-membrane coupling, unless both source and observer are located near or on  ...  This project uses the trained singing voice as a source of controlled variation in parameters important to the timbre of both speech and voice.  ...  A cost function based on the volume velocity in active control of structural radiation has the advantage of keeping the control simple (one error sensor).  ... 
doi:10.1121/1.410770 fatcat:ioiiov6bmjdi7kiflait5dhdfe

Modeling and experiments with low‐frequency pressure wave propagation in liquid‐filled, flexible tubes

Cato Bjelland, Leif Bjørnø
1992 Journal of the Acoustical Society of America  
All listeners except one showed a level disadvantage: The localization error rate increased with increasing level, on the average by a factor of 10 over the range of levels.  ...  The motivation is to improve the amplitude and to control the frequency content of the ultrasonic signal by taking advantage of the tremendous flexibility that one has in controlling the size and the shape  ...  The method offers one way of bringing together the specialist techniques of synthesis and recognition and in particular the use of prosody in each. Recent results will be given in the paper.  ... 
doi:10.1121/1.404777 fatcat:xhmwz65h5bbqxbt52khae2rq7q

Predicting room acoustical behavior with the ODEON computer model

Graham Naylor, Jens Holger Rindel
1992 Journal of the Acoustical Society of America  
All listeners except one showed a level disadvantage: The localization error rate increased with increasing level, on the average by a factor of 10 over the range of levels.  ...  The motivation is to improve the amplitude and to control the frequency content of the ultrasonic signal by taking advantage of the tremendous flexibility that one has in controlling the size and the shape  ...  The method offers one way of bringing together the specialist techniques of synthesis and recognition and in particular the use of prosody in each. Recent results will be given in the paper.  ... 
doi:10.1121/1.404931 fatcat:z4hbezklfzgu7hyy7qvos3wyo4

Treatment of early and late reflections in a hybrid computer model for room acoustics

Graham Naylor
1992 Journal of the Acoustical Society of America  
All listeners except one showed a level disadvantage: The localization error rate increased with increasing level, on the average by a factor of 10 over the range of levels.  ...  The motivation is to improve the amplitude and to control the frequency content of the ultrasonic signal by taking advantage of the tremendous flexibility that one has in controlling the size and the shape  ...  The method offers one way of bringing together the specialist techniques of synthesis and recognition and in particular the use of prosody in each. Recent results will be given in the paper.  ... 
doi:10.1121/1.404930 fatcat:xeehcxepjvhuzmnaeudpqxtnvu

Sociophonetics of popular music: insights from corpus analysis and speech perception experiments [article]

Andy M. Gibson, University Of Canterbury
2020
This will be discussed in Chapter 1, where I also introduce the term Standard Popular Music Singing Style (SPMSS) to refer to the US English-derived phonetic style dominant in popular song.  ...  This reflects the value placed on authenticity in hip hop, and also interacts with ethnicity, showing the use of different authentication practices by Pākehā (NZ European) and Māori/Pasifika artists  ...  force aligned at the phoneme level.  ... 
doi:10.26021/4007 fatcat:oiab4zjzmvgy3ppn32tsj5f5yi

Language and life history: A new perspective on the development and evolution of human language

John L. Locke, Barry Bogin
2006 Behavioral and Brain Sciences  
First, there is, as we have pointed out, a logical connection between the use and control of the voice in singing and speech - an excellent reason for collaboration between speech researchers such as Oller  ...  Singing is not.  ... 
doi:10.1017/s0140525x0600906x fatcat:3dg3oe5u3vhe7pxqi2ydpzmaeu

Influence of statistical surface models on dynamic scattering of high‐frequency signals from the ocean surface

Christian Bjerrum‐Niese, Leif Bjørnø
1994 Journal of the Acoustical Society of America  
The sound field scattered by a smooth thin elastic shell immersed in fluid arises largely from specular reflection and acoustic-membrane coupling, unless both source and observer are located near or on  ...  This project uses the trained singing voice as a source of controlled variation in parameters important to the timbre of both speech and voice.  ...  A cost function based on the volume velocity in active control of structural radiation has the advantage of keeping the control simple (one error sensor).  ... 
doi:10.1121/1.411137 fatcat:ajwj7bozxzg6fd27v4xe57drgm

Rhythmic perception and entrainment in 5-year-old children

John Parker Verney, Apollo-University Of Cambridge Repository
2013
The data are interpreted with respect to a theoretical framework linking music and language based on temporal sampling.  ...  To this end rhythmic entrainment tasks were presented in a range of musical activities including drumming along to music and singing nursery songs and rhymes.  ...  This was one of the first studies to empirically demonstrate syllable-level segmentation was easier than phoneme-level segmentation.  ... 
doi:10.17863/cam.16494 fatcat:wnhx3lpuwzarrcjellu255cize