
Monaural Score-Informed Source Separation For Classical Music Using Convolutional Neural Networks

Marius Miron, Jordi Janer, Emilia Gómez
2017 Zenodo  
In this paper we introduce a monaural score-informed source separation framework for Western classical music using convolutional neural networks (CNN).  ...  "Monaural score-informed source separation for classical music using convolutional neural networks", 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.  ... 
doi:10.5281/zenodo.1416498 fatcat:ik4enccj6netzde76dbfqhg62a

Generating Data To Train Convolutional Neural Networks For Classical Music Source Separation

Marius Miron, Jordi Janer, Emilia Gómez
2017 Proceedings of the SMC Conferences  
Acknowledgments: The TITAN X used for this research was donated by … (Proceedings of the 14th Sound and Music Computing Conference, July 5–8, Espoo, Finland, SMC2017-232)  ...  We propose a timbre-informed and score-constrained system to train neural networks for monaural source separation of classical music mixtures.  ...  OUTLOOK: We proposed a method to generate training data for timbre-informed source separation methods using neural networks, in the context of classical music.  ... 
doi:10.5281/zenodo.1401922 fatcat:z4bq6dksynhodglcsybhby34iu

End-to-End Sound Source Separation Conditioned On Instrument Labels [article]

Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gomez
2019 arXiv   pre-print
Can we perform an end-to-end music source separation with a variable number of sources using a deep learning model?  ...  This approach leads to other types of conditioning such as audio-visual source separation and score-informed source separation.  ...  We would like to thank Terry Um and Eric Jang for their support during the camp, and Mar-  ... 
arXiv:1811.01850v2 fatcat:thwvytvuzbg7hchrrxu4tgayte

End-to-end Sound Source Separation Conditioned on Instrument Labels

Olga Slizovskaia, Leo Kim, Gloria Haro, Emilia Gomez
2019 ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Can we perform an end-to-end sound source separation (SSS) with a variable number of sources using a deep learning model?  ...  This paper presents an extension of the Wave-U-Net [1] model which allows end-to-end monaural source separation with a non-fixed number of sources.  ...  ACKNOWLEDGEMENTS We would like to thank Terry Um and Eric Jang for their support during the camp, and the participants of Jeju Deep Learning Camp 2018 for useful discussions.  ... 
doi:10.1109/icassp.2019.8683800 dblp:conf/icassp/SlizovskaiaKHG19 fatcat:ampbcvbyt5hvdm64bugonz2g3i

Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning [article]

Ning Zhang, Junchi Yan, Yuchen Zhou
2018 arXiv   pre-print
Specifically, our loss function adopts the Wasserstein distance which directly measures the distribution distance between the separated sources and the real sources for each individual source.  ...  We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning.  ...  (Huang et al. 2015) adopt deep recurrent neural networks to separate monaural sources.  ... 
arXiv:1711.04121v3 fatcat:onszkgjofjfa3l5qsespl2f2mm

Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning

Ning Zhang, Junchi Yan, Yuchen Zhou
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
Specifically, our loss function adopts the Wasserstein distance which directly measures the distribution distance between the separated sources and the real sources for each individual source.  ...  Separating audio mixtures into individual instrument tracks has been a standing challenge. We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning.  ...  [Huang et al., 2015] adopt deep recurrent neural networks to separate monaural sources.  ... 
doi:10.24963/ijcai.2018/636 dblp:conf/ijcai/ZhangYZ18a fatcat:br6itktldzfclmjte4ktsfamqu

Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation [article]

Zhong-Qiu Wang and Gordon Wichern and Jonathan Le Roux
2021 arXiv   pre-print
A promising approach for speech dereverberation is based on supervised learning, where a deep neural network (DNN) is trained to predict the direct sound from noisy-reverberant speech.  ...  that reverberation results from a linear convolution between a room impulse response (RIR) and a dry source signal.  ... 
arXiv:2108.07376v2 fatcat:3obh4vconnfqrmez5fzp37qc6y

The Effects of Noisy Labels on Deep Convolutional Neural Networks for Music Tagging [article]

Keunwoo Choi and George Fazekas and Kyunghyun Cho and Mark Sandler
2017 arXiv   pre-print
Using a trained network, we compute label vector similarities which is compared to groundtruth similarity. The results highlight several important aspects of music tagging and neural networks.  ...  Deep neural networks (DNN) have been successfully applied to music classification including music tagging.  ...  ACKNOWLEDGEMENTS This work is supported by EPSRC project (EP/L019981/1) 'Fusing Semantic and Audio Technologies for Intelligent Music Production and Consumption' and the European Commission H2020 research  ... 
arXiv:1706.02361v3 fatcat:aio2wo22r5hkfmdjeykj2dymtq

Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation [article]

Weitao Yuan, Bofei Dong, Shengbei Wang, Masashi Unoki, Wenwu Wang
2020 arXiv   pre-print
Monaural Singing Voice Separation (MSVS) is a challenging task and has been studied for decades. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS.  ...  Specifically, we propose a new multi-resolution Convolutional Neural Network (CNN) framework for MSVS namely Multi-Resolution Pooling CNN (MRP-CNN), which uses various-size pooling operators to extract  ...  Monaural singing voice separation (MSVS), as an important research branch of music source separation (MSS), aims to separate the singing voice and the background music accompaniment from a single-channel  ... 
arXiv:2008.00816v1 fatcat:epkpzgcgtfeo7gwdgb5sotsqpm

Drum-Aware Ensemble Architecture for Improved Joint Musical Beat and Downbeat Tracking [article]

Ching-Yu Chiu, Alvin Wen-Yu Su, Yi-Hsuan Yang
2021 arXiv   pre-print
This paper presents a novel system architecture that integrates blind source separation with joint beat and downbeat tracking in musical audio signals.  ...  The source separation module segregates the percussive and non-percussive components of the input signal, over which beat and downbeat tracking are performed separately and then the results are aggregated  ...  Research on blind monaural source separation, which concerns segregating the sound sources involved in a monaural audio recording, has also seen remarkable progress in recent years thanks to deep  ... 
arXiv:2106.08685v1 fatcat:q7j7cl3vkbeo5fnyttqrogodi4

Monoaural Audio Source Separation Using Variational Autoencoders

Laxmi Pandey, Anurendra Kumar, Vinay Namboodiri
2018 Interspeech 2018  
Traditionally, discriminative training for source separation is proposed using deep neural networks or non-negative matrix factorization.  ...  We introduce a monaural audio source separation framework using a latent generative model.  ...  Source separation is a classic problem and has wide applications in automatic speech recognition, biomedical imaging, and music editing.  ... 
doi:10.21437/interspeech.2018-1140 dblp:conf/interspeech/PandeyKN18 fatcat:5js7izrmcjdvbcoxo5px7o2x5i

Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks

Michael Taenzer, Stylianos I. Mimilakis, Jakob Abeßer
2021 Electronics  
To that aim, we propose a method for local polyphony estimation (LPE), which is based on convolutional neural networks (CNNs) trained in a supervised fashion to explicitly predict the degree of polyphony  ...  In this work, we propose considering the information from a polyphony for multi-pitch estimation (MPE) in piano music recordings.  ...  Furthermore, LPE could be useful in training supervised music source separation approaches by introducing constraints, depending on the degree of polyphony, and thereby reinforcing neural networks for  ... 
doi:10.3390/electronics10070851 fatcat:y5bf3eamw5cvdewohgtosdofj4

Self-Supervised Generation of Spatial Audio for 360 Video [article]

Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang
2018 arXiv   pre-print
Our system consists of end-to-end trainable neural networks that separate individual sound sources and localize them on the viewing sphere, conditioned on multi-modal analysis of audio and 360 video frames  ...  Using our approach, we show that it is possible to infer the spatial location of sound sources based only on 360 video and a mono audio track.  ...  For example, [18] proposes a recurrent neural network for monaural separation of two speakers, [1, 12, 11] seek to isolate sound sources by leveraging synchronized visual information in addition to  ... 
arXiv:1809.02587v1 fatcat:2at6tsjuujaede254ueasdw3fu

Spatial Audio Scene Characterization (SASC): Automatic Localization of Front-, Back-, Up-, and Down-Positioned Music Ensembles in Binaural Recordings

Sławomir K. Zieliński, Paweł Antoniuk, Hyunkook Lee
2022 Applied Sciences  
This paper demonstrates that the convolutional neural network (CNN) can be used to automatically localize music ensembles panned to the front, back, up, or down positions.  ...  The network was developed using the repository of the binaural excerpts obtained by the convolution of multi-track music recordings with the selected sets of head-related transfer functions (HRTFs).  ...  The VOICEBOX toolbox [40] was used in MATLAB to calculate the spectrograms. The well-proven AlexNet topology [41] was adopted for this work.  ... 
doi:10.3390/app12031569 fatcat:ve4xpfmfhfcufd4onpsbtswfp4

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen
2019 IEEE Journal on Selected Topics in Signal Processing  
In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space.  ...  The proposed method uses separately the phase and magnitude component of the spectrogram calculated on each audio channel as the feature, thereby avoiding any method- and array-specific feature extraction  ...  Some of the classifiers include Gaussian mixture model (GMM)-hidden Markov model (HMM) [27] , fully connected (FC) neural networks [28] , recurrent neural networks (RNN) [29] [30] [31] [32] , and convolutional  ... 
doi:10.1109/jstsp.2018.2885636 fatcat:rlips2i22ndv7mi4a4k726vhr4
Showing results 1–15 of 160