
Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision [article]

Yun-Ning Hung, Gordon Wichern, Jonathan Le Roux
2020 arXiv   pre-print
In contrast with previous score-informed separation approaches, our system does not require isolated sources, and score is used only as a training target, not required for inference.  ...  In this work, we use musical scores, which are comparatively easy to obtain, as a weak label for training a source separation system.  ...  The model directly learns from musical scores, and only needs the music mixture during inference.  ... 
arXiv:2010.11904v1 fatcat:2uf7oh2marffhidue6fo6bdzwi

Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation

Kilian Schulze-Forster, Clement Samuel Joseph Doire, Gael Richard, Roland Badeau
2021 IEEE/ACM Transactions on Audio Speech and Language Processing  
The goal of singing voice separation is to recover the vocals signal from music mixtures. State-of-the-art performance is achieved by deep neural networks trained in a supervised fashion.  ...  Since training data are scarce and music signals are extremely diverse, it remains challenging to achieve high separation quality across various recording and mixing conditions as well as music styles.  ...  We also thank Olumide Okubadejo and Sinead Namur for their help with transcribing the MUSDB lyrics.  ... 
doi:10.1109/taslp.2021.3091817 fatcat:rxhbieduxbhi5kccxej2b2cdlq

Data Cleansing with Contrastive Learning for Vocal Note Event Annotations [article]

Gabriel Meseguer-Brocal, Rachel Bittner, Simon Durand, Brian Brost
2021 arXiv   pre-print
Previously proposed data cleansing models do not consider structured (e.g. time varying) labels, such as those common to music data.  ...  We demonstrate that the accuracy of a transcription model improves greatly when trained using our proposed strategy compared with the accuracy when trained using the original dataset.  ...  Intuitively, you don't need to know the right answer to know if something is right or wrong.  ... 
arXiv:2008.02069v3 fatcat:q4qsvq43ajektmuu7s4xlurbeq

Data cleansing with contrastive learning for vocal note event annotations

Gabriel Meseguer Brocal, Rachel Bittner, Simon Durand, Brian Brost
2020 Zenodo  
consider structured (e.g. time varying) labels, such as those common to music data. We propose a novel data cleansing model for time-varying, structured labels which exploits the local structure of the  ...  local deformations of them. We demonstrate that the accuracy of a transcription model improves greatly when trained using our proposed data cleaning strategy compared with the accuracy when trained using  ...  Intuitively, you don't need to know the right answer to know if something is right or wrong.  ... 
doi:10.5281/zenodo.4245420 fatcat:flzmj47aj5dbpg6biqpsekchf4

"The way it Sounds": timbre models for analysis and retrieval of music signals

J.-J. Aucouturier, F. Pachet, M. Sandler
2005 IEEE transactions on multimedia  
An important attribute of a piece of polyphonic music is what is commonly referred to as "the way it sounds".  ...  Electronic Music Distribution is in need of robust and automatically extracted music descriptors.  ...  INTRODUCTION The exploding field of Electronic Music Distribution (EMD) is in need of powerful content-based management systems to help the end-users navigate huge music title catalogues, much as they  ... 
doi:10.1109/tmm.2005.858380 fatcat:ltg6rarwtreuxmiihogi2jtxky

MULTIMODAL ANALYSIS: Informed content estimation and audio source separation [article]

Gabriel Meseguer-Brocal
2021 arXiv   pre-print
Among the many text sources related to music that can be used (e.g. reviews, metadata, or social network feedback), we concentrate on lyrics.  ...  This dissertation proposes the study of multimodal learning in the context of musical signals. Throughout, we focus on the interaction between audio signals and text information.  ...  Additionally, some tasks such as singing-to-text and score transcription need a direct similarity measure of local elements to be solved.  ... 
arXiv:2104.13276v3 fatcat:wirjfj4iwjgfteejmeujydey7u

The Need for Web-Based Cognitive Behavior Therapy Among University Students

Ove K. Lintvedt, Kristian Sørensen, Andreas R. Østvik, Bas Verplanken, Catharina E. Wang
2008 Journal of technology in human services  
We would like you to rate whether you think these people are likely to be helpful for someone with depression. This next list is about medical treatments.  ...  In brief, individuals with high scores on the need for approval read specific information and then complete a specific exercise (I'll Not Cry Even if I Want to), those with high scores on the need to be  ...  In 'all or none' thinking a mistake or error is interpreted as a pattern of mistakes, and errors. The statement 'I'm a stupid idiot' is an example of labelling.  ... 
doi:10.1080/15228830802096705 fatcat:v3szblicnvek5ojlwvu2ehevim

D.2.2: Analysis of Market Needs

Paul Blakeman, Emma Clement, Kate Palmer
2019 Zenodo  
One element of the ICT4CART project is to explore the market for this infrastructure, starting with an investigation into what the various associated users may actually need from this infrastructure in  ...  This is only likely to increase as more and more of the driving task is automated and vehicle connectivity will move from an optional extra to a key enabler for ensuring that these increasingly automated  ...  The European countries rank highly due to their high scoring in the policy and legislation measures, as they are all demonstrating concentrated efforts to legislate the effective and safe testing of CAVs  ... 
doi:10.5281/zenodo.6396244 fatcat:4vftchei4fcoha7mbrnf3mvbqy

Artificial Musical Intelligence: A Survey [article]

Elad Liebman, Peter Stone
2020 arXiv   pre-print
its pursuit, with a particular emphasis on machine learning methods.  ...  While still nascent, several different approaches have been employed to tackle what may broadly be referred to as "musical intelligence."  ...  In that work, each song is represented as a mixture model of multivariate Gaussians, similar to a Gaussian Mixture Models (GMM).  ... 
arXiv:2006.10553v1 fatcat:2j6i27wrsfawpgcr2unxdgngd4

Audiovisual Analysis of Music Performances: Overview of an Emerging Field

Zhiyao Duan, Slim Essid, Cynthia C.S. Liem, Gael Richard, Gaurav Sharma
2019 IEEE Signal Processing Magazine  
separation, machine-learning methods for audio/music signals, music information retrieval, and multimodal audio processing.  ...  He is coauthor of more than 200 papers. His research interests are mainly in the field of speech and audio signal processing and include topics such as signal representations and signal models, source  ...  The common underlying idea is to improve audio-based transcription results with play/nonplay activity detection and fingering analysis. Audio source separation: This is a task that can be significantly  ... 
doi:10.1109/msp.2018.2875511 fatcat:fdrryzbojvgp7bkaqwmmun4zhu

Music Data Mining [chapter]

Tao Li, Lei Li
2011 Chapman & Hall/CRC Data Mining and Knowledge Discovery Series  
To solve the problem of training data, they use a semi-supervised learning technique combined with score alignment.  ...  Semi-Supervised Learning: Semi-supervised learning is a type of machine learning techniques that makes use of both labeled and unlabeled data for training, typically a small amount of labeled data with  ...  The above issues are regarded as major challenges for the further evolution of music data mining technology. [Adli et al., 2010] Adli, A., Nakao, Z., and Nagata, Y. (2010).  ... 
doi:10.1201/b11041-3 fatcat:y2etjljj6jdzrkq7dzikyk5kwq

Computational Methods for Melody and Voice Processing in Music Recordings (Dagstuhl Seminar 19052)

Meinard Müller, Emilia Gómez, Yi-Hsun Yang, Michael Wagner
2019 Dagstuhl Reports  
To cope with the increasing amount of digital music, one requires computational methods and tools that allow users to find, organize, analyze, and interact with music, topics that are central to the research  ...  This triggered interdisciplinary discussions that leveraged insights from fields as disparate as audio processing, machine learning, music perception, music theory, and information retrieval.  ...  The general goal of source separation is to decompose a complex sound mixture into its constituent components.  ... 
doi:10.4230/dagrep.9.1.125 dblp:journals/dagstuhl-reports/MullerGY19 fatcat:w4slm5nxqrdlfaser5dtqar7s4

Semi-Supervised Model Training for Unbounded Conversational Speech Recognition [article]

Shane Walker, Morten Pedersen, Iroro Orife, Jason Flaks
2017 arXiv   pre-print
For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state of the art models.  ...  Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate to train models with sufficient accuracy in the unbounded space of conversational speech.  ...  We define dataset construction and training as semi-supervised because we employ a seed model to transcribe a vast quantity of unlabeled audio, perform data selection on the new transcripts, retrain the  ... 
arXiv:1705.09724v1 fatcat:ppkfiyy3jrhzhibwgt5zby4h3m

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos [article]

Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny (+2 others)
2021 arXiv   pre-print
Current methods for learning visually grounded language from videos often rely on text annotation, such as human generated captions or machine generated automatic speech recognition (ASR) transcripts.  ...  To circumvent the need for text annotation, we learn audio-visual representations from randomly segmented video clips and their raw audio waveforms.  ...  Each task is defined as list of steps, such as "remove cap" and "spread mixture".  ... 
arXiv:2006.09199v2 fatcat:yg3i32cbdjcjjk4ajo2icdf4mm

Unsupervised Spoken Term Discovery on Untranscribed Speech [article]

Man-Ling Sung
2020 arXiv   pre-print
It is shown that the system learns the phonetic information of the language and can discover frequent spoken terms that align with text transcription.  ...  The discovered patterns can be grouped to determine the keywords of the audio. Multilingual neural network with bottleneck layer is used for feature extraction.  ...  Another approach is semi-supervised learning, in which a seed model is first trained with a small set of transcribed data to learn the hypothesis of the language, it will then decode the transcription  ... 
arXiv:2011.14060v1 fatcat:vqxrmzjq35codkddbza6fdd4a4