Karaoker: Alignment-free singing voice synthesis with speech training data [article]

Panos Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, June Sig Sung, Gunu Jho, Pirros Tsiakoulis, Aimilios Chalamandaris
2022 arXiv   pre-print
Existing singing voice synthesis models (SVS) are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information.  ...  In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments.  ...  The samples are synthesized by a vocoder trained on speech data, maintaining the singing-data-free constraint end-to-end.  ... 
arXiv:2204.04127v1 fatcat:p32qhfq4m5enrhhri2sea4mgqm

Vocal fold vibratory and acoustic features in fatigued Karaoke singers

Gaowu Wang, Andy Lo, Karen Chan, Jiangping Kong, Edwin Yiu
2012 Journal of the Acoustical Society of America  
Analysis of Chinese singing voices and its application to singing voice synthesis.  ...  However, there are few researches on singing voice synthesis in Chinese.  ...  This study examined whether 1-hour perceptual training could elicit feature-specific improvement of performance and corresponding cortical plasticity in humans during speech segregation by using magnetoencephalography  ... 
doi:10.1121/1.4708731 fatcat:cu5chvstgjbzhlhdzd63flz7nu

KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms [article]

Chien-Feng Liao, Jen-Yu Liu, Yi-Hsuan Yang
2021 arXiv   pre-print
In this paper, we propose a novel neural network model called KaraSinger for a less-studied singing voice synthesis (SVS) task named score-free SVS, in which the prosody and melody are spontaneously decided  ...  We keep the architecture of both the VQ-VAE and LM light-weight for fast training and inference speed.  ...  INTRODUCTION Singing voice synthesis (SVS) is the task of computationally generating singing voices from music scores and lyrics [1, 2].  ... 
arXiv:2110.04005v1 fatcat:adhojwqxvvdujb4qlneuuoovpq

Automatic Recognition of Lyrics in Singing

Annamaria Mesaros, Tuomas Virtanen
2010 EURASIP Journal on Audio, Speech, and Music Processing  
Due to the lack of annotated singing databases, the recognizer is trained using speech and linearly adapted to singing.  ...  The phoneme language models are trained on the speech database text. The large-vocabulary word-level language model is trained on a database of textual lyrics. Two applications are presented.  ...  Acoustic Data. The acoustic models of the recognizer were trained using the CMU Arctic speech database (CMU ARCTIC databases for speech synthesis: http://festvox.org/cmuarctic/).  ... 
doi:10.1186/1687-4722-2010-546047 fatcat:lksnvnccivd7ziqyoe3ld2u5lm

Automatic Recognition of Lyrics in Singing

Annamaria Mesaros, Tuomas Virtanen
2010 EURASIP Journal on Audio, Speech, and Music Processing  
Due to the lack of annotated singing databases, the recognizer is trained using speech and linearly adapted to singing.  ...  The phoneme language models are trained on the speech database text. The large-vocabulary word-level language model is trained on a database of textual lyrics. Two applications are presented.  ...  Acoustic Data. The acoustic models of the recognizer were trained using the CMU Arctic speech database (CMU ARCTIC databases for speech synthesis: http://festvox.org/cmuarctic/).  ... 
doi:10.1155/2010/546047 fatcat:64uir6egxzgkjhdnc3gwn4qnbe

NHSS: A Speech and Singing Parallel Database [article]

Bidisha Sharma, Xiaoxue Gao, Karthika Vijayan, Xiaohai Tian, Haizhou Li
2021 arXiv   pre-print
speech and singing voices, and speech-to-singing conversion.  ...  We develop benchmark systems, which can be used as reference for speech-to-singing alignment, spectral mapping, and conversion using the NHSS database.  ...  With aligned speech MCCs, and the F0 and AP of template singing, we generate the converted singing voice through the synthesis module.  ... 
arXiv:2012.00337v2 fatcat:rx3eeirs6bhe5aoclcldo23ovq

Issues on Modeling the Singing Voice

Alex Loscos, Xavier Serra
2003 Zenodo  
This set of gathered publications are mainly focused on the field of singing voice processing; more precisely, on spectral processing techniques and voice modeling for singing voice analysis, transformation  ...  and synthesis.  ...  I have worked with. Acknowledgements We would like to acknowledge the contribution to this research of the other members of the Music Technology Group of the Audiovisual Institute.  ... 
doi:10.5281/zenodo.3739254 fatcat:jczy57vmfbbbbg2rrnnswiykvu

Score and Lyrics-Free Singing Voice Generation [article]

Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang
2020 arXiv   pre-print
Generative models for singing voice have been mostly concerned with the task of "singing voice synthesis," i.e., to produce singing voice waveforms given musical scores and text lyrics.  ...  In this work, we explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, in both training and inference time.  ...  We aim to explore such a new task in this paper: teaching a machine to sing with a training collection of singing voices, but without the corresponding musical scores and lyrics of the training data.  ... 
arXiv:1912.11747v2 fatcat:xoheezk7cjf4jenabtuqkl5xla

Singing-driven interfaces for sound synthesizers

Jordi Janer, Xavier Serra
2008 Zenodo  
Under the title of singing-driven interfaces, we aim to design systems that allow controlling the synthesis of musical instruments sounds with the singing voice.  ...  We propose two different approaches, one for controlling a singing voice synthesizer, and another for controlling the synthesis of instrumental sounds.  ...  In an off-line manner, Meron (1999) uses singing performances as input for an automatically trained system that produces high-quality singing voice synthesis.  ... 
doi:10.5281/zenodo.3685558 fatcat:lnxevw4cmzht5m6rz2tgtk6a4y

Computational Methods for Melody and Voice Processing in Music Recordings (Dagstuhl Seminar 19052)

Meinard Müller, Emilia Gómez, Yi-Hsuan Yang, Michael Wagner
2019 Dagstuhl Reports  
The Dagstuhl Seminar 19052 was devoted to a branch of MIR that is of particular importance: processing melodic voices (with a focus on singing voices) using computational methods.  ...  voice analysis and synthesis, and performance analysis (timbre, intonation, expression).  ...  , and singing voice synthesis.  ... 
doi:10.4230/dagrep.9.1.125 dblp:journals/dagstuhl-reports/MullerGY19 fatcat:w4slm5nxqrdlfaser5dtqar7s4

LyricSynchronizer: Automatic Synchronization System Between Musical Audio Signals and Lyrics

Hiromasa Fujihara, Masataka Goto, Jun Ogata, Hiroshi G. Okuno
2011 IEEE Journal on Selected Topics in Signal Processing  
Although methods for synchronizing monophonic speech signals and corresponding text transcriptions by using Viterbi alignment techniques have been proposed, these methods cannot be applied to vocals in  ...  This paper describes a system that can automatically synchronize polyphonic musical audio signals with their corresponding lyrics.  ...  [3] used a speech recognizer for aligning a singing voice and Wang et al.  ... 
doi:10.1109/jstsp.2011.2159577 fatcat:o3zzphl6o5dzfllf7ujuckms7y

Singing Voice Detection: A Survey

Ramy Monir, Daniel Kostrzewa, Dariusz Mrozek
2022 Entropy  
This process is a crucial preprocessing step that can be used to improve the performance of other tasks such as automatic lyrics alignment, singing melody transcription, singing voice separation, vocal  ...  This paper presents a survey on the techniques of singing voice detection with a deep focus on state-of-the-art algorithms such as convolutional LSTM and GRU-RNN.  ...  Data Availability Statement: The data is contained within the article. Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/e24010114 pmid:35052140 pmcid:PMC8775013 fatcat:nt3wnmf4e5anxiiinkpvuqxwfq

MULTIMODAL ANALYSIS: Informed content estimation and audio source separation [article]

Gabriel Meseguer-Brocal
2021 arXiv   pre-print
The singing voice directly connects the audio signal and the text information in a unique way, combining melody and lyrics where a linguistic dimension complements the abstraction of musical instruments  ...  Yet it scores almost as well as the S(T(J Train)), which was trained with better-aligned data (its teacher is the best one). We presume an error tolerance in the singing voice detection task.  ...  In contrast with the alignment tasks, in the singing voice detection we aim to know, from the audio signal analysis, the probability of having a singing voice or not.  ... 
arXiv:2104.13276v3 fatcat:wirjfj4iwjgfteejmeujydey7u

Emotional Speech Synthesis for a Radio DJ: Corpus Design and Expression Modeling

Martí Umbert, Jordi Bonada, Jordi Janer
2010 Zenodo  
These results are objectively compared to the training data as well as subjectively evaluated in terms of emotion activation and speech rate.  ...  This master thesis concerns the design of a corpus for speech synthesis as well as the modeling of different emotions in the context of a Radio DJ speaker.  ...  Figure 1.1: Thesis and internship main blocks.  ...  and singing voice synthesis  ...  Different synthesis strategies are typically used in speech and singing voice synthesis depending on the approach  ... 
doi:10.5281/zenodo.3753080 fatcat:wioalcgc5zcsbchqf5ah3vouey

An Overview of Lead and Accompaniment Separation in Music

Zafar Rafii, Antoine Liutkus, Fabian-Robert Stoter, Stylianos Ioannis Mimilakis, Derry FitzGerald, Bryan Pardo
2018 IEEE/ACM Transactions on Audio Speech and Language Processing  
In conjunction with the above, a comprehensive list of references is provided, along with relevant pointers to available implementations and repositories.  ...  Filtering such mixtures to extract one or both components has many applications, such as automatic karaoke and remixing.  ...  This contrasts with the speech community which routinely generates mixtures by summing noise data [263] and clean speech [264] .  ... 
doi:10.1109/taslp.2018.2825440 fatcat:256vf4wogzfsrlzlsfxda44gri