81 Hits in 0.72 sec

Towards Automatic Mispronunciation Detection In Singing

Chitralekha Gupta, David Grunberg, Preeti Rao, Ye Wang
2017 Zenodo  
...  In future, we would explore a combination of data-driven methods such as in [27] and our knowledge-based methods to improve the mispronunciation detection accuracy.  ...
doi:10.5281/zenodo.1418072 fatcat:zai7jlfhojdqzffvoi2wzp32yq

Semi-supervised Lyrics and Solo-singing Alignment

Chitralekha Gupta, Rong Tong, Haizhou Li, Ye Wang
2018 Zenodo  
Attribution: Chitralekha Gupta, Rong Tong, Haizhou Li, Ye Wang.  ...
doi:10.5281/zenodo.1492487 fatcat:e2kbncbklze3djdz44x4qdxqam

Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks [article]

Chitralekha Gupta, Purnima Kamath, Lonce Wyse
2021 arXiv   pre-print
Generative Adversarial Networks (GANs) currently achieve the state-of-the-art sound synthesis quality for pitched musical instruments using a 2-channel spectrogram representation consisting of log magnitude and instantaneous frequency (the "IFSpectrogram"). Many other synthesis systems use representations derived from the magnitude spectra, and then depend on a backend component to invert the output magnitude spectrograms, which generally results in audible artefacts associated with the inversion process. However, for signals that have closely-spaced frequency components, such as non-pitched and other noisy sounds, training the GAN on the 2-channel IFSpectrogram representation offers no advantage over the magnitude-spectra-based representations. In this paper, we propose that training GANs on single-channel magnitude spectra, and using the Phase Gradient Heap Integration (PGHI) inversion algorithm, is a better comprehensive approach for audio synthesis modeling of diverse signals that include pitched, non-pitched, and dynamically complex sounds. We show that this method produces higher-quality output for wideband and noisy sounds, such as pops and chirps, compared to using the IFSpectrogram. Furthermore, the sound quality for pitched sounds is comparable to using the IFSpectrogram, even while using a simpler representation with half the memory requirements.
arXiv:2103.07390v1 fatcat:sqp2p2gsazgfnjrmnxt5etccdi
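The single-channel representation this abstract advocates is the log of the STFT magnitude, with phase discarded and later recovered by a PGHI-style inversion. A minimal sketch of computing that representation, assuming illustrative parameter values; the `log_magnitude_spectrogram` helper is hypothetical, not code from the paper:

```python
# Hedged sketch: computing a single-channel log-magnitude spectrogram of the
# kind the paper proposes as the GAN training representation. Phase is
# discarded here; PGHI inversion itself is not implemented.
import numpy as np

def log_magnitude_spectrogram(signal, n_fft=512, hop=128, eps=1e-8):
    """Windowed STFT magnitude in log scale: shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)   # complex STFT, positive bins only
    return np.log(np.abs(spectrum) + eps)    # 1-channel log magnitude

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
S = log_magnitude_spectrogram(tone)
print(S.shape)  # (frames, n_fft // 2 + 1)
```

Compared with the 2-channel IFSpectrogram, this keeps only the magnitude channel, which is the halving of memory the abstract refers to.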

Intelligibility Of Sung Lyrics: A Pilot Study

Karim M. Ibrahim, David Grunberg, Kat Agres, Chitralekha Gupta, Ye Wang
2017 Zenodo  
Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Karim M. Ibrahim, David Grunberg, Kat Agres, Chitralekha Gupta, Ye Wang. "Intelligibility of Sung Lyrics: A Pilot Study", 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017  ...
doi:10.5281/zenodo.1414729 fatcat:iifjur6jmzfljjzec277fct6hi

Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks

Chitralekha Gupta, Purnima Kamath, Lonce Wyse
2021 Zenodo  
(Abstract identical to the arXiv pre-print of the same title listed above.)
doi:10.5281/zenodo.5040541 fatcat:3bnvjhjp2vhydg7dxsayhejohi

Automatic Pronunciation Evaluation of Singing

Chitralekha Gupta, Haizhou Li, Ye Wang
2018 Interspeech 2018  
In this work, we develop a strategy to automatically evaluate the pronunciation of singing. We apply a singing-adapted automatic speech recognizer (ASR) in a two-stage approach for evaluating the pronunciation of singing. First, we force-align the lyrics with the sung utterances to obtain the word boundaries. We improve the word boundaries by a novel lexical modification technique. Second, we investigate the performance of the phonetic posteriorgram (PPG) based template-independent and template-dependent methods for scoring the aligned words. To validate the evaluation scheme, we obtain reliable human pronunciation evaluation scores using a crowd-sourcing platform. We show that the automatic evaluation scheme offers quality scores that are close to human judgments.
doi:10.21437/interspeech.2018-1267 dblp:conf/interspeech/GuptaLW18 fatcat:v3nk2w2hebbrhjec2gjw3xfaxa
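The template-dependent PPG scoring in the second stage amounts to comparing a learner's phonetic posteriorgram against a reference one frame by frame. A hedged sketch under simplifying assumptions: the posteriors below are synthetic placeholders (a real system would take them from an ASR), the frames are assumed already time-aligned, and `ppg_score` is a hypothetical helper, not the paper's implementation:

```python
# Hedged sketch of PPG-based word scoring: mean frame-wise cosine similarity
# between two (frames x phone-classes) posteriorgrams of equal length.
import numpy as np

def ppg_score(learner_ppg, reference_ppg):
    """Average cosine similarity over aligned frames; 1.0 = identical."""
    num = np.sum(learner_ppg * reference_ppg, axis=1)
    den = (np.linalg.norm(learner_ppg, axis=1) *
           np.linalg.norm(reference_ppg, axis=1))
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
ref = rng.dirichlet(np.ones(40), size=50)   # 50 frames, 40 phone classes
identical = ppg_score(ref, ref)             # perfect match scores 1.0
shuffled = ppg_score(ref, ref[::-1])        # misaligned frames score lower
print(identical, shuffled)
```

A word-level quality score would then be this similarity computed over the frames inside each force-aligned word boundary.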

Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music [article]

Xiaoxue Gao, Chitralekha Gupta, Haizhou Li
2022 arXiv   pre-print
Gupta et al.  ...  [10] adopted an end-to-end Wave-U-Net model to predict character probabilities from the polyphonic audio, while Gupta et al.  ...
arXiv:2204.03307v1 fatcat:rf3emyqtfjeuvkdnr24u37eiwe

Empirically Weighting the Importance of Decision Factors for Singing Preference

Michael Barone, Karim Ibrahim, Chitralekha Gupta, Ye Wang
2018 Zenodo  
Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Michael Barone, Karim M. Ibrahim, Chitralekha Gupta, Ye Wang.  ...
doi:10.5281/zenodo.1492468 fatcat:cwvtkyubyrh6zkpe2j6ilx5br4

Automatic rank-ordering of singing vocals with twin-neural network

Chitralekha Gupta, Lin Huang, Haizhou Li
2020 Zenodo  
Gupta et al. [11]  ...  Table 4: the performance of twin-net and hybrid twin-net models on unseen songs from test dataset 2.  ...  Comparison with prior studies: the prior studies closest to this work are the ones by Gupta et al. [11] and Pati et al. [25].  ...
doi:10.5281/zenodo.4245458 fatcat:v37bxueymfenjl3haw2zhajbmy

Objective Assessment of Ornamentation in Indian Classical Singing [chapter]

Chitralekha Gupta, Preeti Rao
2012 Lecture Notes in Computer Science  
Important aspects of singing ability include musical accuracy and voice quality. In the context of Indian classical music, not only is the correct sequence of notes important to musical accuracy but also the nature of pitch transitions between notes. These transitions are essentially related to gamakas (ornaments) that are important to the aesthetics of the genre. Thus a higher level of singing skill involves achieving the necessary expressiveness via correct rendering of ornamentation, and this ability can serve to distinguish a well-trained singer from an amateur. We explore objective methods to assess the quality of ornamentation rendered by a singer with reference to a model rendition of the same song. Methods are proposed for the perceptually relevant comparison of complex pitch movements based on cognitively salient features of the pitch contour shape. The objective measurements are validated via their observed correlation with subjective ratings by human experts. Such an objective assessment system can serve as a useful feedback tool in the training of amateur singers.
doi:10.1007/978-3-642-31980-8_1 fatcat:opuhsyt2wbervjma3giej52kte
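Comparing a student's ornament against a model rendition means measuring how close two pitch contours are despite timing differences. The paper uses perceptually motivated shape features; plain dynamic time warping (DTW) over pitch contours is a simpler stand-in used here only to illustrate the idea, with made-up contour values:

```python
# Hedged sketch: DTW distance between 1-D pitch contours (in semitones)
# as a crude proxy for ornament similarity against a model rendition.
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

model   = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]   # model ornament: glide up-down
student = [0.0, 0.9, 2.1, 3.2, 2.0, 1.1, 0.1]   # close rendition
flat    = [0.0] * 7                              # no ornament at all
print(dtw_distance(model, student) < dtw_distance(model, flat))  # True
```

A lower distance to the model rendition would map to a higher ornamentation score; the paper's cognitively salient shape features would replace the raw point-wise differences used here.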

Context-Aware Features for Singing Voice Detection in Polyphonic Music [chapter]

Vishweshwara Rao, Chitralekha Gupta, Preeti Rao
2013 Lecture Notes in Computer Science  
The effectiveness of audio content analysis for music retrieval may be enhanced by the use of available metadata. In the present work, observed differences in singing style and instrumentation across genres are used to adapt acoustic features for the singing voice detection task. Timbral descriptors traditionally used to discriminate singing voice from accompanying instruments are complemented by new features representing the temporal dynamics of source pitch and timbre. A method to isolate the dominant source spectrum serves to increase the robustness of the extracted features in the context of polyphonic audio. While demonstrating the effectiveness of combining static and dynamic features, experiments on a culturally diverse music database clearly indicate the value of adapting feature sets to genre-specific acoustic characteristics. Thus commonly available metadata, such as genre, can be useful in the front-end of an MIR system.
doi:10.1007/978-3-642-37425-8_4 fatcat:3c6yqg2amrdcdbmumgrn4czc4i
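Complementing static timbral descriptors with features for temporal dynamics, as this abstract describes, is commonly done by appending frame-wise delta (regression-slope) features. A minimal sketch, assuming a random placeholder matrix in place of real MFCCs; `add_delta_features` is a hypothetical helper, not the paper's feature set:

```python
# Hedged sketch: appending first-order delta features (local regression
# slope over +/- `width` frames) to a static feature matrix.
import numpy as np

def add_delta_features(static, width=2):
    """Return (frames, 2 * dims): static features plus their deltas."""
    padded = np.pad(static, ((width, width), (0, 0)), mode="edge")
    k = np.arange(-width, width + 1)        # regression window offsets
    norm = np.sum(k ** 2)
    frames, dims = static.shape
    delta = np.stack([
        np.sum(k[:, None] * padded[i:i + 2 * width + 1], axis=0) / norm
        for i in range(frames)
    ])
    return np.concatenate([static, delta], axis=1)

rng = np.random.default_rng(1)
mfcc_like = rng.standard_normal((100, 13))  # 100 frames, 13 coefficients
features = add_delta_features(mfcc_like)
print(features.shape)  # (100, 26)
```

Edge padding keeps the delta matrix the same length as the input, so static and dynamic features stay frame-aligned for a downstream classifier.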

A technical framework for automatic perceptual evaluation of singing quality

Chitralekha Gupta, Haizhou Li, Ye Wang
2018 APSIPA Transactions on Signal and Information Processing  
Ten singers sang the song 'I have a dream' by ABBA (∼2 min), and the other ten sang 'Edelweiss' from the movie 'The Sound of Music'.  ...
doi:10.1017/atsip.2018.10 fatcat:hugvsjscgnam7gnfktmfbhk7ra

Evaluating vowel pronunciation quality: Formant space matching versus ASR confidence scoring

Ashish Patil, Chitralekha Gupta, Preeti Rao
2010 2010 National Conference On Communications (NCC)  
Quantitative evaluation of the quality of a speaker's pronunciation of the vowels of a language can contribute to the important task of speaker accent detection. Our aim is to qualitatively and quantitatively distinguish between native and non-native speakers of a language on the basis of a comparative study of two analysis methods. One deals with the relative positions of their vowels in formant (F1-F2) space, which conveys important articulatory information. The other method exploits the sensitivity of trained phone models to accent variations, as captured by the log-likelihood scores, to distinguish between native and non-native speakers.
doi:10.1109/ncc.2010.5430187 fatcat:zgxt3b3shfe57b737354zllotq
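The formant-space method compares where a speaker's vowels fall in the (F1, F2) plane relative to native productions. A minimal sketch of that idea: score a measured vowel by its Euclidean distance to a native-reference centroid. The centroid values and the `vowel_distance` helper below are illustrative assumptions, not data or code from the paper:

```python
# Hedged sketch: distance of a measured vowel from a native (F1, F2)
# centroid as a simple formant-space pronunciation score.
import math

NATIVE_CENTROIDS = {        # rough illustrative (F1, F2) centroids in Hz
    "i": (270, 2290),
    "a": (730, 1090),
    "u": (300, 870),
}

def vowel_distance(f1, f2, vowel):
    """Euclidean distance from a measured (F1, F2) to the native centroid."""
    ref_f1, ref_f2 = NATIVE_CENTROIDS[vowel]
    return math.hypot(f1 - ref_f1, f2 - ref_f2)

near = vowel_distance(280, 2250, "i")   # close to the native /i/ centroid
far  = vowel_distance(500, 1500, "i")   # a centralized, accented /i/
print(near < far)  # True
```

Larger distances suggest non-native vowel placement; the paper's second method would instead compare ASR log-likelihood confidence scores for the same segments.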

Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help? [article]

Chitralekha Gupta, Emre Yılmaz, Haizhou Li
2019 arXiv   pre-print
Background music affects the lyrics intelligibility of singing vocals in a music piece. Automatic lyrics alignment and transcription in polyphonic music are challenging tasks because the singing vocals are corrupted by the background music. In this work, we propose to learn music genre-specific characteristics to train polyphonic acoustic models. We first compare several automatic speech recognition pipelines for the application of lyrics transcription. We then present the lyrics alignment and transcription performance of music-informed acoustic models for the best-performing pipeline, and systematically study the impact of music genre and language model on the performance. With such a genre-based approach, we explicitly model the music without removing it during acoustic modeling. The proposed approach outperforms all competing systems in the lyrics alignment and transcription tasks on several well-known polyphonic test datasets.
arXiv:1909.10200v2 fatcat:6sc6dywp6jcvzhf35pyyluly3a

Seccima: Singing And Ear Training For Children With Cochlear Implants Via A Mobile Application

Zhiyan Duan, Chitralekha Gupta, Graham Percival, David Grunberg, Ye Wang
2017 Proceedings of the SMC Conferences  
Copyright: © 2017 Zhiyan Duan, Chitralekha Gupta, Graham Percival, David Grunberg and Ye Wang.  ...
doi:10.5281/zenodo.1401915 fatcat:u5nt5nnmfrffrgiuonyz3f2rdm
Showing results 1 — 15 out of 81 results