830 Hits in 4.5 sec

Unsupervised Cross-Domain Singing Voice Conversion

Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman
2020 Interspeech 2020  
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.  ...  The proposed generative architecture is invariant to the speaker's identity and can be trained to generate target singers from unlabeled training data, using either speech or singing sources.  ...  The method of Unsupervised Singing Voice Conversion [27] learned to convert between a fixed set of singers without relying on a parallel dataset.  ... 
doi:10.21437/interspeech.2020-1862 dblp:conf/interspeech/PolyakWAT20 fatcat:xcnchkemkreorihgfysy2wdh6i

Unsupervised Cross-Domain Singing Voice Conversion [article]

Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman
2020 arXiv   pre-print
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.  ...  The proposed generative architecture is invariant to the speaker's identity and can be trained to generate target singers from unlabeled training data, using either speech or singing sources.  ...  The method of Unsupervised Singing Voice Conversion [27] learned to convert between a fixed set of singers without relying on a parallel dataset.  ... 
arXiv:2008.02830v1 fatcat:gmiltidfofe2bdqqxkjwtv4hhe

Unsupervised Singing Voice Conversion [article]

Eliya Nachmani, Lior Wolf
2019 arXiv   pre-print
We present a deep learning method for singing voice conversion.  ...  Our evaluation presents evidence that the conversion produces natural singing voices that are highly recognizable as the target singer.  ...  Figure 1: The schematic architecture of our singing voice conversion network. We employ an encoder E, a domain confusion network C and a conditional decoder D.  ... 
arXiv:1904.06590v3 fatcat:ovugbbgb2vcw5c5owpzqr2owvy

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion [article]

Yinghao Aaron Li, Ali Zare, Nima Mesgarani
2021 arXiv   pre-print
Although our model is trained only with 20 English speakers, it generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion.  ...  We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.  ...  Acknowledgements We would like to acknowledge Ryo Kato for proposing Star-GAN v2 for voice conversion and funding is from the National Institute of Health, NIDCD.  ... 
arXiv:2107.10394v2 fatcat:57uzvbnqgjab3e6y2ion3g2jqq

Unsupervised Singing Voice Conversion

Eliya Nachmani, Lior Wolf
2019 Interspeech 2019  
We present a deep learning method for singing voice conversion.  ...  Our evaluation presents evidence that the conversion produces natural singing voices that are highly recognizable as the target singer.  ...  Figure 1: The schematic architecture of our singing voice conversion network. We employ an encoder E, a domain confusion network C and a conditional decoder D.  ... 
doi:10.21437/interspeech.2019-1761 dblp:conf/interspeech/NachmaniW19 fatcat:cf3y5kgr2ja7tjoyrucmptcury

PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network [article]

Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
2020 arXiv   pre-print
Recent work shows that unsupervised singing voice conversion can be achieved with an autoencoder-based approach [1].  ...  Singing voice conversion converts one singer's voice into another singer's voice without changing the sung content.  ...  CONCLUSION In this paper, a novel unsupervised singing voice conversion method named PitchNet is proposed.  ... 
arXiv:1912.01852v2 fatcat:fpkpq62n2fhmdgvz2rxxzxpjrm

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation [article]

Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng
2021 arXiv   pre-print
This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC) system, which can achieve high conversion performance, with inference speed 4x faster than real-time on CPUs.  ...  Moreover, the proposed FastSVC system achieves desirable cross-lingual singing conversion performance.  ...  Moreover, UCD-SVC can conduct cross-domain training, i.e., the model can be trained using either speech or singing datasets.  ... 
arXiv:2011.05731v2 fatcat:pbndcweufvbedh2s7pqwqwwyja

ASMMC21: The 6th International Workshop on Affective Social Multimedia Computing

Dongyan Huang, Björn Schuller, Jianhua Tao, Lei Xie, Jie Yang
2021 Proceedings of the 2021 International Conference on Multimodal Interaction  
clean singing voice from lyrics for a target speaker.  ...  In contrast, plenty of singing voice data can be found on the Internet.  ... 
doi:10.1145/3462244.3480980 fatcat:6mkbgunpgbg6va6j4eucvqf7ui

Zero-shot singing voice conversion

Shahan Nercessian
2020 Zenodo  
We illustrate the effectiveness of the proposed zero-shot singing voice conversion algorithms by both qualitative and quantitative means.  ...  In this paper, we propose the use of speaker embedding networks to perform zero-shot singing voice conversion, and suggest two architectures for its realization.  ...  INTRODUCTION Singing voice conversion (SVC) is the transformation of a singing performance from one vocalist to that of another.  ... 
doi:10.5281/zenodo.4245369 fatcat:3dbed2rgyzbrrawdnx6z4jng5i

Cross-Lingual Voice Conversion with Non-Parallel Data

Pablo Alonso-Jiménez
2017 Zenodo  
In this project a Phonetic Posteriorgram (PPG) based Voice Conversion system is implemented. The main goal is to perform and evaluate conversions of singing voice.  ...  The cross-gender and cross-lingual scenarios are considered.  ...  Same gender and same language voice conversion. 2. Cross gender and same language voice conversion. 3. Same language and cross-lingual voice conversion. 4.  ... 
doi:10.5281/zenodo.1117153 fatcat:prwivervc5dijhlzyhowmyy22e

Estimation of Fundamental Frequency from Singing Voice Using Harmonics of Impulse-like Excitation Source

Sudarsana Reddy Kadiri, Bayya Yegnanarayana
2018 Interspeech 2018  
Recent studies on fundamental frequency estimation from singing voice using state-of-the-art methods proposed for speech show a significant gap in accuracy for singing voice.  ...  This is mainly because pitch varies more widely and more rapidly in singing voice than in speech.  ...  F0 is of particular interest in several speech/singing voice processing applications such as analysis, modification/conversion, recognition and synthesis.  ... 
doi:10.21437/interspeech.2018-2495 dblp:conf/interspeech/KadiriY18b fatcat:y7nyhwyfhze5til53hrf3cbneq

Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control [article]

Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, Georgia Maniati, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris
2021 arXiv   pre-print
In this paper, a text-to-rapping/singing system is introduced, which can be adapted to any speaker's voice.  ...  Results show that the proposed approach can produce high quality rapping/singing voice with increased naturalness.  ...  Another GAN-based approach is unsupervised cross-domain singing voice conversion [12] , which uses additional perceptual losses on its generator output.  ... 
arXiv:2111.09146v1 fatcat:fonznraxrvcu7kvaxubkmpe35m

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Yu, D., +, TASLP 2020 852-861 Semantic Tagging of Singing Voices in Popular Music Recordings.  ...  ., +, TASLP 2020 402-415 Voice activity detection Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations.  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

Speech-to-Singing Conversion in an Encoder-Decoder Framework [article]

Jayneel Parekh, Preeti Rao, Yi-Hsuan Yang
2020 arXiv   pre-print
Given time-frequency representations of speech and a target melody contour, we learn encodings that enable us to synthesize singing that preserves the linguistic content and timbre of the speaker while  ...  This allows us to automatically model various aspects of this transformation, thus overcoming dependence on specific inputs such as high quality singing templates or phoneme-score synchronization information  ...  [4] propose a Cycle-GAN [5] based framework to perform singing voice conversion between any two singers and validate their results by performing gender transformation for singing voices.  ... 
arXiv:2002.06595v1 fatcat:losmt42anrhkfcc7znxbmdhdsq

Learning Domain-Adaptive Latent Representations of Music Signals Using Variational Autoencoders

Yin-Jyun Luo, Li Su
2018 Zenodo  
The experiments on cross-domain music alignment, namely an audio-to-MIDI alignment, and a monophonic-to-polyphonic music alignment of singing voice show that the learned representations lead to better higher  ...  Furthermore, a preliminary experiment on singing voice source separation, by regarding the mixture and the voice as two distinct domains, also demonstrates the capability to solve music processing problems  ...  Task 3: Singing Voice Separation Singing voice separation is an essential yet notoriously challenging problem in music signal processing; the goal is to separate singing voice from music mixture.  ... 
doi:10.5281/zenodo.1492500 fatcat:tce334f26fdaddds5f2xn2th7q
Showing results 1-15 out of 830 results