40 Hits in 2.0 sec

Monoaural Audio Source Separation Using Deep Convolutional Neural Networks [chapter]

Pritish Chandna, Marius Miron, Jordi Janer, Emilia Gómez
2017 Lecture Notes in Computer Science  
In this paper we introduce a low-latency monaural source separation framework using a Convolutional Neural Network (CNN).  ...  We use a CNN to estimate time-frequency soft masks which are applied for source separation.  ...  Acknowledgments: The TITANX used for this research was donated by the NVIDIA Corporation.  ... 
doi:10.1007/978-3-319-53547-0_25 fatcat:k254yklw35f5phskmdu4avaniu
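The soft-mask step this entry describes (a CNN estimates time-frequency masks that are then applied to the mixture) can be sketched generically. The sketch below uses a plain ratio mask computed from given per-source magnitude estimates; it is not the paper's CNN, and all function names are illustrative:

```python
import numpy as np

def soft_masks(magnitude_estimates):
    """Ratio masks M_i = |S_i| / sum_j |S_j| from per-source magnitude
    estimates (in the paper these estimates come from a CNN; here they
    are simply inputs)."""
    mags = np.stack(magnitude_estimates)              # (n_sources, freq, frames)
    return mags / (mags.sum(axis=0, keepdims=True) + 1e-8)

def apply_masks(mixture_stft, masks):
    """Multiply each soft mask with the complex mixture STFT to get
    per-source spectrogram estimates (an inverse STFT would follow)."""
    return [m * mixture_stft for m in masks]
```

Because the masks sum to one in every time-frequency bin, the separated spectrograms add back up to the mixture, which is what makes this a "soft" (rather than binary) masking scheme.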

WildMix Dataset and Spectro-Temporal Transformer Model for Monoaural Audio Source Separation [article]

Amir Zadeh, Tianjun Ma, Soujanya Poria, Louis-Philippe Morency
2019 arXiv   pre-print
Monoaural audio source separation is a challenging research area in machine learning.  ...  In this paper, we first introduce a challenging new dataset for monoaural source separation called WildMix.  ...  DNN is a baseline that uses a fully connected deep neural network for separating the sources within a mixture (Grais, Sen, and Erdogan 2014) .  ... 
arXiv:1911.09783v1 fatcat:b2qboubulrd7rnklsxolhpmdma

Music Source Separation Using Stacked Hourglass Networks [article]

Sungheon Park and Taehoon Kim and Kyogu Lee and Nojun Kwak
2018 arXiv   pre-print
In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks.  ...  The proposed framework is able to separate multiple music sources using a single network.  ...  Figure 2. Overall music source separation framework proposed in this paper.  ... 
arXiv:1805.08559v2 fatcat:rm3izf5p2rcabauhl7oowxspre

Music Source Separation Using Stacked Hourglass Networks

Sungheon Park, Taehoon Kim, Kyogu Lee, Nojun Kwak
2018 Zenodo  
In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks.  ...  The proposed framework is able to separate multiple music sources using a single network.  ...  Figure 2. Overall music source separation framework proposed in this paper.  ... 
doi:10.5281/zenodo.1492404 fatcat:dosh3dtjdnforatu2rs6svwize

Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning [article]

Ning Zhang, Junchi Yan, Yuchen Zhou
2018 arXiv   pre-print
We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning.  ...  Separating audio mixtures into individual instrument tracks has been a long-standing, challenging task.  ...  (Grais and Plumbley 2017) develop a deep convolutional denoising auto-encoder for monoaural audio source separation, which outperforms deep feedforward neural networks.  ... 
arXiv:1711.04121v3 fatcat:onszkgjofjfa3l5qsespl2f2mm
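For intuition about the loss in this entry: the Wasserstein distance between the distributions of separated and real sources is, in the paper, estimated adversarially via a critic network, but in the 1-D empirical case it reduces to a closed form, the mean gap between sorted samples. A minimal sketch of that special case (not the paper's formulation):

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D empirical distributions:
    the mean absolute difference between their sorted samples."""
    assert len(xs) == len(ys), "equal sample counts assumed for simplicity"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

In higher dimensions no such closed form exists, which is why adversarial (Kantorovich-Rubinstein duality) estimators are used instead.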

On monoaural speech enhancement for automatic recognition of real noisy speech using mixture invariant training [article]

Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
2022 arXiv   pre-print
In this paper, we explore an improved framework to train a monoaural neural enhancement model for robust speech recognition.  ...  It is found that the unpaired clean speech is crucial to improving the quality of speech separated from real noisy speech.  ...  Neural enhancement network: the dense U-Net temporal convolutional network proposed by Wang et al. [2] is used for the monoaural speech enhancement.  ... 
arXiv:2205.01751v1 fatcat:ekpcd3bxxff23fvpvx2mym34fa

Monoaural Audio Source Separation Using Variational Autoencoders

Laxmi Pandey, Anurendra Kumar, Vinay Namboodiri
2018 Interspeech 2018  
Traditionally, discriminative training for source separation is proposed using deep neural networks or non-negative matrix factorization.  ...  We introduce a monaural audio source separation framework using a latent generative model.  ...  Recently, deep (stacked) fully convolutional DAEs (CDAEs) have been used for audio single channel source separation (SCSS) [14].  ... 
doi:10.21437/interspeech.2018-1140 dblp:conf/interspeech/PandeyKN18 fatcat:5js7izrmcjdvbcoxo5px7o2x5i

Investigating Kernel Shapes and Skip Connections for Deep Learning-Based Harmonic-Percussive Separation

Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlback
2019 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)  
In this paper we propose an efficient deep learning encoder-decoder network for performing Harmonic-Percussive Source Separation (HPSS).  ...  The training and evaluation of the separation has been done using the training and test sets of the MUSDB18 dataset.  ...  instrumental sources (drums, vocals, bass and other) is based on Deep Neural Networks (DNNs).  ... 
doi:10.1109/waspaa.2019.8937079 dblp:conf/waspaa/LordeloBDA19 fatcat:2ormcydurvcl3ohif56dgpvvda

Investigating kernel shapes and skip connections for deep learning-based harmonic-percussive separation [article]

Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck
2019 arXiv   pre-print
In this paper we propose an efficient deep learning encoder-decoder network for performing Harmonic-Percussive Source Separation (HPSS).  ...  The training and evaluation of the separation has been done using the training and test sets of the MUSDB18 dataset.  ...  instrumental sources (drums, vocals, bass and other) is based on Deep Neural Networks (DNNs).  ... 
arXiv:1905.01899v2 fatcat:p352pl5vzzderexkq5n6nu4tv4

Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning

Ning Zhang, Junchi Yan, Yuchen Zhou
2018 Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence  
Separating audio mixtures into individual instrument tracks has been a standing challenge. We introduce a novel weakly supervised audio source separation approach based on deep adversarial learning.  ...  Specifically, our loss function adopts the Wasserstein distance which directly measures the distribution distance between the separated sources and the real sources for each individual source.  ...  [Chandna et al., 2017] demonstrate the superiority of convolutional neural network (CNN) in monoaural audio source separation.  ... 
doi:10.24963/ijcai.2018/636 dblp:conf/ijcai/ZhangYZ18a fatcat:br6itktldzfclmjte4ktsfamqu

Quality Enhancement of Overdub Singing Voice Recordings

Benedikt Wimmer, Jordi Janer, Merlijn Blaauw
2021 Zenodo  
In this work, two neural network architectures for speech denoising – namely FullSubNet and Wave-U-Net – were trained and evaluated specifically on denoising of user singing voice recordings.  ...  Wave-U-Net: as a derivative of 2D U-Net source separation architectures as in [29], Wave-U-Net [21] adapts the concept of deep convolutional networks for time-domain processing of audio signals.  ...  Here, conventional approaches to AEC like adaptive filtering are being replaced by deep neural networks or improved with the use of artificial intelligence [17].  ... 
doi:10.5281/zenodo.5553906 fatcat:elv437mgfvc63j6ktgxgdgiahq

Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network [article]

Sharath Adavanne, Pasi Pertilä, Tuomas Virtanen
2017 arXiv   pre-print
We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages.  ...  This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection.  ...  "...channel audio source separation with deep neural networks," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016. [21] "Detection and classification of acoustic scenes ..."  ... 
arXiv:1706.02291v1 fatcat:wzwmiskifvdrddlp5e4tuxpnxm

SLNSpeech: solving extended speech separation problem by the help of sign language [article]

Jiasong Wu, Taotao Li, Youyong Kong, Guanyu Yang, Lotfi Senhadji, Huazhong Shu
2020 arXiv   pre-print
Specifically, we use a 3D residual convolutional network to extract sign language features and a pretrained VGGNet model to extract visual features.  ...  Then, we design a general deep learning network for the self-supervised learning of three modalities, particularly, using sign language embeddings together with audio or audio-visual information for better  ...  [27] used a convolutional neural network (CNN) to predict time-frequency masking for speech separation and Fu et al.  ... 
arXiv:2007.10629v1 fatcat:umggllkdofcjtc5ub6ugc2wpkq

Learning to Separate Object Sounds by Watching Unlabeled Video [article]

Ruohan Gao, Rogerio Feris, Kristen Grauman
2018 arXiv   pre-print
We show how the recovered disentangled bases can be used to guide audio source separation to obtain better-separated, object-level sounds.  ...  Our work is the first to learn audio source separation from large-scale "in the wild" videos containing multiple audio sources per video.  ...  In contrast, our goal is to separate multiple audio sources from a monoaural signal by leveraging learned audio-visual associations.  ... 
arXiv:1804.01665v2 fatcat:epqocpix6fhr5hwcamphkwaqsy

CNN-Based Acoustic Scene Classification System

Yerin Lee, Soyoung Lim, Il-Youp Kwak
2021 Electronics  
Second, using the same feature, depthwise separable convolution was applied to the convolutional layer to develop a low-complexity model.  ...  One is that audio recorded using different recording devices should be classified correctly in general, and the other is that the model used should have low complexity.  ...  The audio is provided in a binaural 48 kHz 24-bit format. Convolutional neural networks (CNNs) are deep neural networks that are commonly used for visual image analysis.  ... 
doi:10.3390/electronics10040371 fatcat:pgzg4pac4vgzngvmen4qmv6pkq
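The low-complexity claim for depthwise separable convolution in this entry follows from its parameter count: a k×k depthwise filter per input channel plus a 1×1 pointwise mix, instead of a full k×k kernel over every input/output channel pair. A quick comparison (bias terms omitted; the layer sizes are illustrative, not from the paper):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) + 1 x 1 pointwise."""
    return k * k * c_in + c_in * c_out
```

For a 3×3 layer mapping 64 to 128 channels, this drops the weight count from 73,728 to 8,768, roughly an 8× reduction, which is where the low-complexity model comes from.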
Showing results 1 – 15 out of 40 results