A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL. The file type is application/pdf.
Towards Automatic Mispronunciation Detection In Singing
2017
Zenodo
Attribution: Chitralekha Gupta, David Grunberg, Preeti Rao, Ye Wang. ...
In the future, we would explore a combination of data-driven methods such as in [27] and our knowledge-based methods to improve the mispronunciation detection accuracy. © Chitralekha Gupta, David Grunberg ...
doi:10.5281/zenodo.1418072
fatcat:zai7jlfhojdqzffvoi2wzp32yq
Semi-supervised Lyrics and Solo-singing Alignment
2018
Zenodo
Attribution: Chitralekha Gupta, Rong Tong, Haizhou Li, Ye Wang. ...
doi:10.5281/zenodo.1492487
fatcat:e2kbncbklze3djdz44x4qdxqam
Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks
[article]
2021
arXiv
pre-print
Generative Adversarial Networks (GANs) currently achieve the state-of-the-art sound synthesis quality for pitched musical instruments using a 2-channel spectrogram representation consisting of log magnitude and instantaneous frequency (the "IFSpectrogram"). Many other synthesis systems use representations derived from the magnitude spectra, and then depend on a backend component to invert the output magnitude spectrograms that generally result in audible artefacts associated with the inversion process. However, for signals that have closely-spaced frequency components such as non-pitched and other noisy sounds, training the GAN on the 2-channel IFSpectrogram representation offers no advantage over the magnitude-spectra-based representations. In this paper, we propose that training GANs on single-channel magnitude spectra, and using the Phase Gradient Heap Integration (PGHI) inversion algorithm, is a better comprehensive approach for audio synthesis modeling of diverse signals that include pitched, non-pitched, and dynamically complex sounds. We show that this method produces higher-quality output for wideband and noisy sounds, such as pops and chirps, compared to using the IFSpectrogram. Furthermore, the sound quality for pitched sounds is comparable to using the IFSpectrogram, even while using a simpler representation with half the memory requirements.
arXiv:2103.07390v1
fatcat:sqp2p2gsazgfnjrmnxt5etccdi
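The approach this abstract summarizes — train the GAN on a single-channel log-magnitude spectrogram and recover phase at synthesis time with PGHI — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sample rate, FFT size, and hop are placeholder values, and `pghi_invert` is a hypothetical stand-in for a real PGHI implementation (Griffin-Lim is a common, lower-quality fallback).

```python
# Minimal sketch of the single-channel magnitude pipeline from the abstract.
# The GAN itself is omitted; `pghi_invert` is a hypothetical placeholder.
import numpy as np
from scipy.signal import stft

SR = 16000     # assumed sample rate
N_FFT = 512    # assumed FFT/window size
HOP = 128      # assumed hop size

def log_magnitude(x):
    """Single-channel training representation: log of the STFT magnitude."""
    _, _, Z = stft(x, fs=SR, nperseg=N_FFT, noverlap=N_FFT - HOP)
    return np.log(np.abs(Z) + 1e-6)  # epsilon avoids log(0)

def pghi_invert(log_mag):
    """Hypothetical: estimate phase from the magnitude's time/frequency
    gradients (Phase Gradient Heap Integration) and resynthesize audio."""
    raise NotImplementedError("stand-in for a real PGHI implementation")

# One channel per example: half the memory of the 2-channel IFSpectrogram
# (log magnitude + instantaneous frequency) the abstract compares against.
x = np.random.randn(SR)        # 1 s of noise stands in for an audio texture
print(log_magnitude(x).shape)  # (N_FFT // 2 + 1, n_frames), single channel
```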
Intelligibility Of Sung Lyrics: A Pilot Study
2017
Zenodo
Attribution: Karim M. Ibrahim, David Grunberg, Kat Agres, Chitralekha Gupta, Ye Wang. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). "Intelligibility of Sung Lyrics: A Pilot Study", 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017. ...
doi:10.5281/zenodo.1414729
fatcat:iifjur6jmzfljjzec277fct6hi
Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks
2021
Zenodo
Generative Adversarial Networks (GANs) currently achieve the state-of-the-art sound synthesis quality for pitched musical instruments using a 2-channel spectrogram representation consisting of log magnitude and instantaneous frequency (the "IFSpectrogram"). Many other synthesis systems use representations derived from the magnitude spectra, and then depend on a backend component to invert the output magnitude spectrograms that generally result in audible artefacts associated with the inversion process. However, for signals that have closely-spaced frequency components such as non-pitched and other noisy sounds, training the GAN on the 2-channel IFSpectrogram representation offers no advantage over the magnitude-spectra-based representations. In this paper, we propose that training GANs on single-channel magnitude spectra, and using the Phase Gradient Heap Integration (PGHI) inversion algorithm, is a better comprehensive approach for audio synthesis modeling of diverse signals that include pitched, non-pitched, and dynamically complex sounds. We show that this method produces higher-quality output for wideband and noisy sounds, such as pops and chirps, compared to using the IFSpectrogram. Furthermore, the sound quality for pitched sounds is comparable to using the IFSpectrogram, even while using a simpler representation with half the memory requirements.
doi:10.5281/zenodo.5040541
fatcat:3bnvjhjp2vhydg7dxsayhejohi
Automatic Pronunciation Evaluation of Singing
2018
Interspeech 2018
In this work, we develop a strategy to automatically evaluate pronunciation of singing. We apply a singing-adapted automatic speech recognizer (ASR) in a two-stage approach for evaluating pronunciation of singing. First, we force-align the lyrics with the sung utterances to obtain the word boundaries. We improve the word boundaries by a novel lexical modification technique. Second, we investigate the performance of the phonetic posteriorgram (PPG) based template independent and dependent methods for scoring the aligned words. To validate the evaluation scheme, we obtain reliable human pronunciation evaluation scores using a crowd-sourcing platform. We show that the automatic evaluation scheme offers quality scores that are close to human judgments.
doi:10.21437/interspeech.2018-1267
dblp:conf/interspeech/GuptaLW18
fatcat:v3nk2w2hebbrhjec2gjw3xfaxa
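To make the template-dependent scoring idea concrete: a learner's phonetic posteriorgram (PPG) for an aligned word can be compared against a reference rendition's PPG for the same word. The sketch below only illustrates that comparison — PPG extraction (per-frame phone posteriors from an ASR acoustic model) is assumed to exist, and the frame distance and normalization are illustrative choices, not the paper's exact recipe.

```python
# Illustrative PPG comparison via dynamic time warping (DTW).
import numpy as np

def frame_dist(p, q, eps=1e-8):
    """Symmetric KL divergence between two phone-posterior frames."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def ppg_distance(ppg_a, ppg_b):
    """Accumulated frame distance along the best DTW path between two
    posteriorgrams of shape (frames, phones); lower means more similar.
    Dividing by (n + m) is a rough, illustrative length correction."""
    n, m = len(ppg_a), len(ppg_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(ppg_a[i - 1], ppg_b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

# Toy usage: random 40-phone posteriorgrams stand in for real PPGs.
rng = np.random.default_rng(0)
learner = rng.dirichlet(np.ones(40), size=120)    # 120 frames
reference = rng.dirichlet(np.ones(40), size=100)  # 100 frames
print(f"pronunciation distance: {ppg_distance(learner, reference):.3f}")
```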
Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music
[article]
2022
arXiv
pre-print
Gupta et al. ... [10] adopted an end-to-end wave-U-net model to predict character probabilities from the polyphonic audio, while Gupta et al. ...
arXiv:2204.03307v1
fatcat:rf3emyqtfjeuvkdnr24u37eiwe
Empirically Weighting the Importance of Decision Factors for Singing Preference
2018
Zenodo
Attribution: Michael Mustaine, Karim M. Ibrahim, Chitralekha Gupta, Ye Wang. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). ...
doi:10.5281/zenodo.1492468
fatcat:cwvtkyubyrh6zkpe2j6ilx5br4
Automatic rank-ordering of singing vocals with twin-neural network
2020
Zenodo
Gupta et al. [11] Table 4. The performance of twin-net and hybrid twin-net models on unseen songs from test dataset 2. ...
Comparison with Prior Studies: The prior studies that are closest to this work are the ones by Gupta et al. [11] and Pati et al. [25]. ...
doi:10.5281/zenodo.4245458
fatcat:v37bxueymfenjl3haw2zhajbmy
Objective Assessment of Ornamentation in Indian Classical Singing
[chapter]
2012
Lecture Notes in Computer Science
Important aspects of singing ability include musical accuracy and voice quality. In the context of Indian classical music, not only is the correct sequence of notes important to musical accuracy but also the nature of pitch transitions between notes. These transitions are essentially related to gamakas (ornaments) that are important to the aesthetics of the genre. Thus a higher level of singing skill involves achieving the necessary expressiveness via correct rendering of ornamentation, and [this] ability can serve to distinguish a well-trained singer from an amateur. We explore objective methods to assess the quality of ornamentation rendered by a singer with reference to a model rendition of the same song. Methods are proposed for the perceptually relevant comparison of complex pitch movements based on cognitively salient features of the pitch contour shape. The objective measurements are validated via their observed correlation with subjective ratings by human experts. Such an objective assessment system can serve as a useful feedback tool in the training of amateur singers.
doi:10.1007/978-3-642-31980-8_1
fatcat:opuhsyt2wbervjma3giej52kte
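The snippet does not reveal the paper's actual contour-shape features, so the sketch below only shows the scaffolding the abstract implies: both renditions' pitch contours are mapped to a singer-independent scale (cents relative to each singer's tonic) before any shape comparison. The Pearson correlation here is a placeholder for the perceptually salient features the paper proposes.

```python
# Scaffolding only: tonic-normalized pitch contours plus a placeholder
# shape comparison; the paper's perceptual features are not shown.
import numpy as np

def to_cents(f0_hz, tonic_hz):
    """Map a (voiced-frames-only) F0 contour in Hz to cents relative to the
    singer's tonic, making contours from different singers comparable."""
    return 1200.0 * np.log2(np.asarray(f0_hz, float) / tonic_hz)

def contour_similarity(ref_f0, ref_tonic, test_f0, test_tonic):
    """Resample the test contour to the reference length and correlate.
    1.0 = identical shape; a real system would use richer shape features."""
    ref = to_cents(ref_f0, ref_tonic)
    test = to_cents(test_f0, test_tonic)
    test = np.interp(np.linspace(0.0, 1.0, len(ref)),
                     np.linspace(0.0, 1.0, len(test)), test)
    return float(np.corrcoef(ref, test)[0, 1])

# Toy usage: an octave glide versus a slightly flat copy of it.
ref = np.linspace(220.0, 440.0, 200)   # model rendition (Hz)
test = np.linspace(220.0, 436.0, 150)  # learner rendition (Hz)
print(f"shape similarity: {contour_similarity(ref, 220.0, test, 220.0):.3f}")
```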
Context-Aware Features for Singing Voice Detection in Polyphonic Music
[chapter]
2013
Lecture Notes in Computer Science
The effectiveness of audio content analysis for music retrieval may be enhanced by the use of available metadata. In the present work, observed differences in singing style and instrumentation across genres are used to adapt acoustic features for the singing voice detection task. Timbral descriptors traditionally used to discriminate singing voice from accompanying instruments are complemented by new features representing the temporal dynamics of source pitch and timbre. A method to isolate the dominant source spectrum serves to increase the robustness of the extracted features in the context of polyphonic audio. While demonstrating the effectiveness of combining static and dynamic features, experiments on a culturally diverse music database clearly indicate the value of adapting feature sets to genre-specific acoustic characteristics. Thus commonly available metadata, such as genre, can be useful in the front-end of an MIR system.
doi:10.1007/978-3-642-37425-8_4
fatcat:3c6yqg2amrdcdbmumgrn4czc4i
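A rough sketch of the static-plus-dynamic feature combination this abstract describes: conventional timbral descriptors (MFCCs here) complemented by a feature capturing pitch dynamics, since a singing voice's F0 is typically far less steady than that of accompanying instruments. The specific features, sizes, and pitch tracker below are illustrative, not the chapter's exact set, and the dominant-source isolation step is omitted.

```python
# Illustrative static (timbre) + dynamic (pitch movement) frame features.
import numpy as np
import librosa

def frame_features(y, sr=22050, hop=512):
    """Per-frame vectors: 13 MFCCs plus the local F0 derivative."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
    f0 = librosa.yin(y, fmin=80, fmax=1000, sr=sr, hop_length=hop)
    n = min(mfcc.shape[1], len(f0))
    df0 = np.gradient(f0[:n])  # pitch dynamics: large for vibrato/ornaments
    return np.vstack([mfcc[:, :n], df0[None, :]]).T  # shape (frames, 14)

# These vectors would feed a genre-adapted classifier, per the chapter's
# finding that feature sets benefit from genre-specific adaptation.
```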
A technical framework for automatic perceptual evaluation of singing quality
2018
APSIPA Transactions on Signal and Information Processing
Ten singers sang the song 'I Have a Dream' by ABBA (∼2 min), and the other ten sang 'Edelweiss' from the movie 'The Sound of Music'. ...
doi:10.1017/atsip.2018.10
fatcat:hugvsjscgnam7gnfktmfbhk7ra
Evaluating vowel pronunciation quality: Formant space matching versus ASR confidence scoring
2010
2010 National Conference On Communications (NCC)
Quantitative evaluation of the quality of a speaker's pronunciation of the vowels of a language can contribute to the important task of speaker accent detection. Our aim is to qualitatively and quantitatively distinguish between native and non-native speakers of a language on the basis of a comparative study of two analysis methods. One deals with relative positions of their vowels in formant (F1-F2) space that conveys important articulatory information. The other method exploits the [sensitivity] of trained phone models to accent variations, as captured by the log likelihood scores, to distinguish between native and non-native speakers.
doi:10.1109/ncc.2010.5430187
fatcat:zgxt3b3shfe57b737354zllotq
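A minimal sketch of the first method the abstract mentions — placing a speaker's vowels in (F1, F2) space. Formants are estimated from LPC roots, a textbook technique; the LPC order, window, and filtering thresholds are illustrative values, and the paper's ASR confidence-scoring method is not shown.

```python
# Textbook LPC-root formant estimation for a voiced vowel frame.
import numpy as np
import librosa

def vowel_formants(frame, sr, order=12):
    """Return (F1, F2) in Hz estimated from LPC pole angles. An `order` of
    about sr/1000 + 2 is a common rule of thumb; values are illustrative."""
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = [r for r in np.roots(a)
             if np.imag(r) > 0 and abs(r) > 0.9]    # strong resonances only
    freqs = sorted(np.angle(r) * sr / (2 * np.pi) for r in roots)
    freqs = [f for f in freqs if f > 90.0]          # drop near-DC artifacts
    return freqs[0], freqs[1]

# Toy usage: a synthetic frame with resonances near 700 Hz and 1200 Hz.
sr = 16000
t = np.arange(int(0.03 * sr)) / sr
frame = np.sin(2 * np.pi * 700 * t) + 0.7 * np.sin(2 * np.pi * 1200 * t)
print(vowel_formants(frame, sr))  # roughly (700.0, 1200.0)

# Native/non-native comparison then reduces to distances between the two
# speakers' vowel clusters in this (F1, F2) plane.
```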
Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help?
[article]
2019
arXiv
pre-print
... music affects lyrics intelligibility of singing vocals in a music piece. Automatic lyrics alignment and transcription in polyphonic music are challenging tasks because the singing vocals are corrupted by the background music. In this work, we propose to learn music genre-specific characteristics to train polyphonic acoustic models. We first compare several automatic speech recognition pipelines for the application of lyrics transcription. We then present the lyrics alignment and transcription performance of music-informed acoustic models for the best-performing pipeline, and systematically study the impact of music genre and language model on the performance. With such a genre-based approach, we explicitly model the music without removing it during acoustic modeling. The proposed approach outperforms all competing systems in the lyrics alignment and transcription tasks on several well-known polyphonic test datasets.
arXiv:1909.10200v2
fatcat:6sc6dywp6jcvzhf35pyyluly3a
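The genre-informed decoding this abstract describes can be reduced to a routing decision: each song is transcribed with an acoustic model trained on polyphonic audio of its genre, so the background music is modeled rather than removed. The sketch below shows only that routing; all names and types are hypothetical, and training, lexicon, and language model are out of scope.

```python
# Hypothetical routing of songs to genre-matched acoustic models.
from typing import Callable, Dict, List

# An acoustic model maps audio samples to a lyrics hypothesis (placeholder).
AcousticModel = Callable[[List[float]], str]

def transcribe(audio: List[float], genre: str,
               models: Dict[str, AcousticModel],
               fallback: AcousticModel) -> str:
    """Use the genre-matched model when available, else a genre-agnostic one."""
    return models.get(genre, fallback)(audio)
```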
SECCIMA: Singing And Ear Training For Children With Cochlear Implants Via A Mobile Application
2017
Proceedings of the SMC Conferences
Copyright: © 2017 Zhiyan Duan, Chitralekha Gupta, Graham Percival, David Grunberg and Ye Wang. ...
doi:10.5281/zenodo.1401915
fatcat:u5nt5nnmfrffrgiuonyz3f2rdm
Showing results 1–15 of 81.