
Spectrogram Inpainting for Interactive Generation of Instrument Sounds

Théis Bazin, Gaëtan Hadjeres, Philippe Esling, Mikhail Malt
2020 Zenodo  
... implement token-masked Transformers for the inpainting-based generation of these codemaps. ... In this paper, we cast the generation of individual instrumental notes as an inpainting-based task, introducing novel and unique ways to iteratively shape sounds. ... We cast the generation of musical instrument sounds as an interactive, inpainting-based generative-modeling task. ...
doi:10.5281/zenodo.4285406 fatcat:ixv5zadplfhr3jjlgn4uvy5exi
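The token-masking idea in this entry can be illustrated with a minimal sketch: the spectrogram is assumed to be already encoded into a small grid of discrete codes (as a VQ-VAE would produce), a user-selected region is masked, and a model fills in the masked cells. The `predict_token` stand-in below (a majority vote over unmasked neighbours) is purely illustrative; the paper uses a trained Transformer.

```python
import numpy as np

MASK = -1  # sentinel value for masked cells

def mask_region(codes, rows, cols):
    """Replace a user-selected rectangular region of the codemap with MASK."""
    masked = codes.copy()
    masked[rows[0]:rows[1], cols[0]:cols[1]] = MASK
    return masked

def predict_token(context):
    """Hypothetical stand-in for a trained Transformer: predict the most
    frequent unmasked code among a cell's neighbours."""
    vals = context[context != MASK]
    if vals.size == 0:
        return 0
    return int(np.bincount(vals).argmax())

def inpaint(masked):
    """Iteratively fill masked cells from their 3x3 neighbourhood."""
    out = masked.copy()
    while (out == MASK).any():
        for i, j in zip(*np.where(out == MASK)):
            ctx = out[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            out[i, j] = predict_token(ctx)
    return out

codes = np.random.randint(0, 16, size=(8, 32))  # toy 8x32 codemap
masked = mask_region(codes, (2, 6), (10, 20))   # user selects a region
filled = inpaint(masked)                        # every masked cell refilled
```

The interactive loop the paper describes repeats this mask-and-refill step, letting the user re-mask and regenerate regions until the sound is satisfactory.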

An Exploratory Study on Perceptual Spaces of the Singing Voice [article]

Brendan O'Connor, Simon Dixon, George Fazekas
2021 arXiv   pre-print
... for regularisation in machine learning. ... This research provides insight into how the timbre space of singing changes under different conditions, highlights the subjectivity of perception between participants, and provides generalised timbre maps. ... This research is funded by the EPSRC and AHRC Centre for Doctoral Training in Media and Arts Technology (EP/L01632X/1). https://github.com/Trebolium/VoicePerception ...
arXiv:2111.08196v1 fatcat:wg3vr6hm55f4xodiydsazkp52e

Spectrogram Inpainting for Interactive Generation of Instrument Sounds

Théis Bazin, Gaëtan Hadjeres, Philippe Esling, Mikhail Malt
2021
... implement token-masked Transformers for the inpainting-based generation of these codemaps. ... In this paper, we cast the generation of individual instrumental notes as an inpainting-based task, introducing novel and unique ways to iteratively shape sounds. ... We cast the generation of musical instrument sounds as an interactive, inpainting-based generative-modeling task. ...
doi:10.48550/arxiv.2104.07519 fatcat:jvhrb4vp2vdhpmu25bzxybtcga

GACELA – A generative adversarial context encoder for long audio inpainting [article]

Andres Marafioti, Piotr Majdak, Nicki Holighaus, Nathanaël Perraudin
2020 arXiv   pre-print
This addresses the inherent multi-modality of audio inpainting at such long gaps and provides the option of user-defined inpainting. ... We introduce GACELA, a generative adversarial network (GAN) designed to restore missing musical audio data with durations ranging from hundreds of milliseconds to a few seconds, i.e., to perform long-gap inpainting. ... For this, we wanted to test the system in a broader scenario including a more general definition of music. On this level, the added complexity is the interaction between several real instruments. ...
arXiv:2005.05032v1 fatcat:lbh6yrujefb6fm6dunqss3azhy

Vision-Infused Deep Audio Inpainting [article]

Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang
2019 arXiv   pre-print
We identify two key aspects for a successful inpainter: (1) it is desirable to operate on spectrograms instead of raw audio. ... Multi-modality perception is essential to develop interactive intelligence. ... This work is supported in part by SenseTime Group Limited, and in part by the General Research Fund through the Research Grants Council of Hong Kong under Grants CUHK14202217, CUHK14203118, CUHK14205615 ...
arXiv:1910.10997v1 fatcat:b7uwo3sx2rd6jafagi3ldk7x7q
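The first of these aspects — operating on spectrograms rather than raw waveforms — means working on the short-time Fourier transform of the signal. A minimal numpy sketch (the 512-sample window and 128-sample hop are illustrative choices, not the paper's settings):

```python
import numpy as np

def spectrogram(x, n_fft=512, hop=128):
    """Magnitude spectrogram: Hann-windowed STFT of a 1-D signal."""
    window = np.hanning(n_fft)
    frames = [x[s:s + n_fft] * window
              for s in range(0, len(x) - n_fft + 1, hop)]
    # rfft keeps only the non-redundant half of each frame's spectrum
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)   # 1 second of a 440 Hz sine
S = spectrogram(tone)                  # shape: (n_fft // 2 + 1, n_frames)
peak_bin = S.mean(axis=1).argmax()     # strongest frequency bin
peak_hz = peak_bin * sr / 512          # bin index -> Hz
```

An inpainter operating on `S` sees harmonic structure as smooth ridges along the time axis, which is far easier to model than the oscillating raw waveform.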

Audio inpainting of music by means of neural networks [article]

Andrés Marafioti, Nicki Holighaus, Piotr Majdak, Nathanaël Perraudin
2022 arXiv   pre-print
For music, our DNN significantly outperformed the reference method, demonstrating generally good usability of the proposed DNN structure for inpainting complex audio signals like music. ... We focused on gaps in the range of tens of milliseconds. The proposed DNN structure was trained on audio signals containing music and musical instruments, separately, with 64-ms-long gaps. ... While LPC may sound antiquated, it is particularly suitable for instrument sounds, as it models the way sound is created by many instruments, i.e., as a weighted sum of resonances. ...
arXiv:1810.12138v3 fatcat:ixn4uxfy35haxbkumpjunmgcha
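The LPC remark can be made concrete: linear prediction models each sample as a weighted sum of its predecessors, so a gap can be filled by fitting the coefficients on the intact signal before the gap and extrapolating forward. A toy sketch (a least-squares AR fit standing in for classical Levinson-Durbin LPC, on a synthetic tone rather than real music):

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares fit of AR coefficients: x[n] ~ sum_k a[k] * x[n-1-k]."""
    rows = np.stack([x[i:i + order][::-1] for i in range(len(x) - order)])
    targets = x[order:]
    a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return a

def extrapolate(context, a, n):
    """Predict n samples forward from the context using AR coefficients a."""
    buf = list(context[-len(a):])
    out = []
    for _ in range(n):
        nxt = np.dot(a, buf[::-1])  # most recent sample first
        out.append(nxt)
        buf = buf[1:] + [nxt]
    return np.array(out)

sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220.0 * t)  # a resonant, LPC-friendly signal
gap = slice(4000, 4160)                 # a 20 ms gap
a = fit_ar(signal[:4000], order=16)
filled = extrapolate(signal[:4000], a, 160)
err = np.max(np.abs(filled - signal[gap]))  # small for a stationary tone
```

This works well precisely because a resonant tone satisfies a low-order recurrence; the paper's point is that a DNN is needed once the signal is as non-stationary as real music.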

Audio inpainting with generative adversarial network [article]

P. P. Ebner, A. Eltelt
2020 arXiv   pre-print
We study the ability of a Wasserstein Generative Adversarial Network (WGAN) to generate missing audio content that is statistically similar, in context, to the sound at the neighboring borders. ... Further, we got better results for instruments whose frequency spectrum is mainly in the lower range, where small noises are less annoying for the human ear and the inpainted part is more perceptible. ... The authors would like to thank SDSC for their support, and in particular Nathanaël Perraudin for his constructive and insightful inputs and for helping to manage the project. ...
arXiv:2003.07704v1 fatcat:mkfmyfildjcxzhzsosibshbtgm

CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis

Simon Rouard, Gaëtan Hadjeres
2021 Zenodo  
We motivate novel heuristics for the choice of diffusion processes better suited for audio generation, and consider the use of a conditional U-Net to approximate the score function. ... Through extensive experiments on a drum sound generation task, we showcase the numerous sampling schemes offered by our method (unconditional generation, deterministic generation, inpainting, interpolation, ...). ... In [11], the authors use a VQ-VAE-2 [12] in order to perform inpainting on instrument sound spectrograms. ...
doi:10.5281/zenodo.5624402 fatcat:rvbkaqcf5ra2lioljyay5isfpq
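A score-based model of the kind CRASH builds on samples by following the gradient of the log-density (the score), which a conditional U-Net approximates for audio. The mechanics can be sketched in one dimension with an analytic score, where Langevin dynamics drifts noise toward the target distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5

def score(x):
    """Analytic score of N(mu, sigma^2); in score-based audio models a
    trained conditional U-Net plays this role."""
    return -(x - mu) / sigma**2

# Unadjusted Langevin dynamics: x <- x + (eps/2)*score(x) + sqrt(eps)*z
x = rng.standard_normal(5000)  # 5000 chains started from pure noise
eps = 0.01
for _ in range(2000):
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.size)

est_mu, est_sigma = x.mean(), x.std()  # close to (2.0, 0.5)
```

The sampling schemes the abstract lists (inpainting, interpolation, deterministic generation) all correspond to different ways of initialising and constraining this iteration.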

A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions [article]

Shulei Ji, Jing Luo, Xinyu Yang
2020 arXiv   pre-print
... levels of music generation: score generation produces scores, performance generation adds performance characteristics to the scores, and audio generation converts scores with performance characteristics into audio. ... This paper attempts to provide an overview of various composition tasks under different music generation levels, covering most of the currently popular music generation tasks using deep learning. ... [285] proposed an interactive music generation interface named NONOTO based on the inpainting model. ...
arXiv:2011.06801v1 fatcat:cixou3d2jzertlcpb7kb5x5ery

CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis [article]

Simon Rouard, Gaëtan Hadjeres
2021 arXiv   pre-print
We motivate novel heuristics for the choice of diffusion processes better suited for audio generation, and consider the use of a conditional U-Net to approximate the score function. ... Through extensive experiments on a drum sound generation task, we showcase the numerous sampling schemes offered by our method (unconditional generation, deterministic generation, inpainting, interpolation, ...). ... In Bazin et al. [2021], the authors use a VQ-VAE-2 (Razavi et al. [2019]) in order to perform inpainting on instrument sound spectrograms. ...
arXiv:2106.07431v1 fatcat:lyekau45jveerbpfjjfnaw2d4m

Foley Music: Learning to Generate Music from Videos [chapter]

Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba
2020 Lecture Notes in Computer Science  
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. ... We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. ... This work is supported by ONR MURI N00014-16-1-2007, the Center for Brains, Minds and Machines (CBMM, NSF STC award CCF-1231216), and IBM Research. ...
doi:10.1007/978-3-030-58621-8_44 fatcat:7rcvic77mjbkxmrmx4r6vgvw3i

Foley Music: Learning to Generate Music from Videos [article]

Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba
2020 arXiv   pre-print
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. ... We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. ... This work is supported by ONR MURI N00014-16-1-2007, the Center for Brains, Minds and Machines (CBMM, NSF STC award CCF-1231216), and IBM Research. ...
arXiv:2007.10984v1 fatcat:a5ktcxsufnftvdtqb7j4rmnc44

An Overview of Lead and Accompaniment Separation in Music [article]

Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, Derry FitzGerald, Bryan Pardo
2018 arXiv   pre-print
... musicology, or sound engineering. ... For data-centered approaches, we discuss the particular difficulty of obtaining data for learning lead separation systems, and then review recent approaches, notably those based on deep learning. ... Therefore, manipulation of individual sound objects requires separation of the stereo audio mixture into several tracks, one for each different sound source. ...
arXiv:1804.08300v1 fatcat:yd2lfvrr3zgotaixhhsnby223u
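The separation this entry surveys is commonly done by soft masking in the time-frequency domain: each source is recovered by weighting the mixture spectrum with that source's share of the energy. A toy oracle-mask sketch (real systems estimate the mask, e.g. with a DNN; the tone frequencies are chosen to fall exactly on FFT bin centres):

```python
import numpy as np

# Toy mixture: a "lead" tone and an "accompaniment" tone
sr, n = 8192, 2048                             # 4 Hz bin spacing
t = np.arange(n) / sr
lead = np.sin(2 * np.pi * 440.0 * t)           # bin 110
accomp = 0.8 * np.sin(2 * np.pi * 112.0 * t)   # bin 28
mix = lead + accomp

# Wiener-like soft mask built from the (oracle) source spectra
L, A, M = (np.fft.rfft(s) for s in (lead, accomp, mix))
mask = np.abs(L) ** 2 / (np.abs(L) ** 2 + np.abs(A) ** 2 + 1e-12)

# Apply the mask to the mixture spectrum and return to the time domain
lead_hat = np.fft.irfft(mask * M, n)
err = np.max(np.abs(lead_hat - lead))  # near-perfect for disjoint spectra
```

The separation is exact here only because the two sources occupy disjoint frequency bins; overlapping real sources are where the learned approaches reviewed in the paper come in.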

An Overview of Lead and Accompaniment Separation in Music

Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, Derry FitzGerald, Bryan Pardo
2018 IEEE/ACM Transactions on Audio Speech and Language Processing  
... musicology, or sound engineering. ... For data-centered approaches, we discuss the particular difficulty of obtaining data for learning lead separation systems, and then review recent approaches, notably those based on deep learning. ... Therefore, manipulation of individual sound objects requires separation of the stereo audio mixture into several tracks, one for each different sound source. ...
doi:10.1109/taslp.2018.2825440 fatcat:256vf4wogzfsrlzlsfxda44gri

Randomly weighted CNNs for (music) audio classification [article]

Jordi Pons, Xavier Serra
2019 arXiv   pre-print
By following this methodology, we run a comprehensive evaluation of current deep architectures for audio classification, and provide evidence that the architectures alone are an important piece for ... We use features extracted from the embeddings of deep architectures as input to a classifier, with the goal of comparing classification accuracies when using different randomly weighted architectures. ... [51] also showed that the structure of a network (the non-trained architecture) is sufficient to capture useful features for the tasks of image denoising, super-resolution and inpainting. ...
arXiv:1805.00237v3 fatcat:jtywafx4fffz7cylvu7ceifuca
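The finding can be reproduced in miniature: untrained, randomly weighted convolutions already map different signal classes to separable feature vectors. The sketch below is illustrative only (random 1-D kernels plus a nearest-centroid classifier on two synthetic tone classes, evaluated on the same toy data for brevity; the paper uses full audio architectures and an SVM):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_features(x, filters):
    """Pool the rectified responses of random, untrained conv filters."""
    return np.array([np.mean(np.abs(np.convolve(x, w, mode='valid')))
                     for w in filters])

filters = rng.standard_normal((16, 33))  # 16 random 1-D conv kernels

def make_signal(freq):
    """A noisy tone with random phase, standing in for an audio clip."""
    t = np.arange(1024) / 8000.0
    phase = rng.uniform(0, 2 * np.pi)
    return np.sin(2 * np.pi * freq * t + phase) + 0.05 * rng.standard_normal(1024)

# Two toy "audio classes": 220 Hz vs 1760 Hz tones
X = [make_signal(f) for f in [220] * 20 + [1760] * 20]
y = np.array([0] * 20 + [1] * 20)
feats = np.stack([random_features(x, filters) for x in X])

# Nearest-centroid classifier on the random-CNN features
centroids = np.stack([feats[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(np.linalg.norm(feats[:, None] - centroids, axis=2), axis=1)
acc = (pred == y).mean()  # high despite the filters never being trained
```

Each random filter has a fixed, arbitrary frequency response, so pooled responses form a frequency signature of the input; the architecture, not the training, supplies the useful structure, which is the paper's point.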
Showing results 1–15 of 40.