3,394 Hits in 5.3 sec

Neural Music Synthesis for Flexible Timbre Control [article]

Jong Wook Kim, Rachel Bittner, Aparna Kumar, Juan Pablo Bello
2018 arXiv   pre-print
This paper describes a neural music synthesis model with flexible timbre controls, which consists of a recurrent neural network conditioned on a learned instrument embedding followed by a WaveNet vocoder  ...  is modeled using generative neural networks.  ...  CONCLUSIONS AND FUTURE DIRECTIONS We showed that it is possible to build a music synthesis model by combining a recurrent neural network and FiLM conditioning layers, followed by a WaveNet vocoder.  ... 
arXiv:1811.00223v1 fatcat:z726pibdujeq3mwfdkqjdf7ko4
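The FiLM conditioning named in the conclusion can be illustrated with a minimal sketch: each feature channel is scaled and shifted by values derived from the conditioning input. The shapes, the projection matrix `W`, and the embedding size below are hypothetical, not taken from the paper.

```python
import numpy as np

def film(h, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each feature channel."""
    return gamma * h + beta

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))      # recurrent features: (time, channels)
emb = rng.standard_normal(16)        # learned instrument embedding (hypothetical size)
W = rng.standard_normal((16, 16))    # projects the embedding to per-channel gamma/beta
gamma, beta = np.split(emb @ W, 2)   # 8 scales and 8 shifts, broadcast over time
out = film(h, gamma, beta)           # same shape as h: (4, 8)
```

Because `gamma` and `beta` depend only on the instrument embedding, swapping the embedding re-modulates the same recurrent features, which is what makes the timbre control flexible.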

Deep Learning for Audio Signal Processing

Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schluter, Shuo-Yiin Chang, Tara N Sainath
2019 IEEE Journal on Selected Topics in Signal Processing  
architecture, as well as more audio-specific neural network models.  ...  ) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis).  ...  The basic CTC model was extended by Graves [42] to include a separate recurrent language model component, referred to as the recurrent neural network transducer (RNN-T).  ... 
doi:10.1109/jstsp.2019.2908700 fatcat:oy2qixj2dfe6hns7r7av6fw2wm

LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices [article]

Marvin Coto-Jiménez, John Goddard-Close
2016 arXiv   pre-print
In this paper we present the application of Long Short-Term Memory Deep Neural Networks as a postfiltering step of HMM-based speech synthesis, in order to obtain spectral characteristics closer to those  ...  HMM-based Speech Synthesis is of great interest to many researchers, due to its ability to produce sophisticated features with a small footprint.  ...  , i is the input gate activation vector, f the forget gate activation vector, o is the output gate activation vector, and c the cell activation vector.  ... 
arXiv:1602.02656v1 fatcat:nshkdywklfhbpcrssyqz6qyuq4
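The gate vectors i, f, o and the cell activation listed in the snippet correspond to the standard LSTM update; a minimal single-step sketch (weight shapes and initialization are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with stacked weights for the gates i, f, o and candidate g."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gate vectors
    g = np.tanh(g)                                # candidate cell activation
    c = f * c_prev + i * g                        # updated cell state
    h = o * np.tanh(c)                            # updated hidden state
    return h, c

n_in, n_hid = 3, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```

The forget gate `f` controls how much of the previous cell state survives, which is what lets an LSTM postfilter track spectral context over many frames.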

Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis

Marvin Coto-Jiménez
2021 Biomimetics  
In this paper, we present a new approach to postfiltering synthesized voices with the application of discriminative postfilters, based on several long short-term memory (LSTM) deep neural networks.  ...  Our motivation stems from modeling the specific mapping from synthesized to natural speech on those segments corresponding to voiced or unvoiced sounds, due to the different qualities of those sounds and how  ...  Acknowledgments: This work was supported by the University of Costa Rica, project 322-B9-105. Conflicts of Interest: The author declares no conflict of interest.  ... 
doi:10.3390/biomimetics6010012 pmid:33562420 pmcid:PMC7985793 fatcat:2quimztvobf5pajggdcc4dv2te

F0 Modeling For Singing Voice Synthesizers with LSTM Recurrent Neural Networks

Serkan Özer, Merlijn Blaauw, Martí Umbert
2015 Zenodo  
Recurrent Neural Networks with Long Short-Term Memory units are employed for the first time on this specific problem due to their flexibility and power in modeling complex sequences.  ...  Two recurrent neural networks are trained to learn the baseline and vibrato parts of F0 separately. Then, F0 sequences are generated from the trained networks and applied to a singing voice synthesizer.  ...  Recurrent neural networks are very powerful sequence models.  ... 
doi:10.5281/zenodo.3755574 fatcat:44izjub7yjbivn7mrj6sf7h2ae

Improving the Learning Power of Artificial Intelligence Using Multimodal Deep Learning

Shchetinin Eugene Yu., Sevastianov Leonid
2021 EPJ Web of Conferences  
We explain the architecture of a bidirectional neural network model, its main advantages over regular neural networks, and compare experimental results of the BLSTM network with other models.  ...  The main advantage of this network architecture is that each module of the network consists of several interconnected layers, providing the ability to recognize flexible long-term dependencies in data,  ...  The model showed much better results compared to a one-directional recurrent neural network, and slightly outperformed a one-directional LSTM network.  ... 
doi:10.1051/epjconf/202124801017 doaj:5d37721fffe947988fb2e00b7c846e64 fatcat:z7h3h2nnkvbo7kgzbryzienmz4

VaPar Synth – A Variational Parametric Model for Audio Synthesis [article]

Krishna Subramani, Preeti Rao, Alexandre D'Hooge
2020 arXiv   pre-print
In the interest of more flexible control over the generated sound, it could be more useful to work with a parametric representation of the signal which corresponds more directly to the musical attributes  ...  With the advent of data-driven statistical modeling and abundant computing power, researchers are turning increasingly to deep learning for audio synthesis.  ...  In the present work, rather than generating new timbres, we consider the problem of synthesis of a given instrument's sound with flexible control over the pitch.  ... 
arXiv:2004.00001v1 fatcat:rkq5hnofmze47eucdwbz73nt2m

Modeling of nonlinear audio effects with end-to-end deep neural networks [article]

Marco A. Martínez Ramirez, Joshua D. Reiss
2019 arXiv   pre-print
In this work, we investigate deep learning architectures for audio processing and we aim to find a general purpose end-to-end deep neural network to perform modeling of nonlinear audio effects.  ...  We show the network modeling various nonlinearities and we discuss the generalization capabilities among different instruments.  ...  Accordingly, the back-end consists of an unpooling layer, a deep neural network with smooth adaptive activation functions (DNN-SAAF) and a single CNN layer.  ... 
arXiv:1810.06603v2 fatcat:vamhnq7lsvgslmrse33tmnsuoy

Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters [article]

Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
2021 arXiv   pre-print
Second, we addressed the need for a neural sequence-to-sequence modeling approach for the task of TTS based on recurrent networks.  ...  Bidirectional long short-term memory (LSTM) and gated recurrent unit (GRU) networks are studied and applied to model continuous parameters for more natural, human-like sounding speech.  ...  The Titan X GPU used was donated by NVIDIA Corporation.  ... 
arXiv:2106.10481v1 fatcat:h3xrtn67vbf67mmvlpgt5ejwyy

RNN-based speech synthesis using a continuous sinusoidal model [article]

Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh
2019 arXiv   pre-print
In this paper, we address the use of sequence-to-sequence modeling with recurrent neural networks (RNNs).  ...  Recently, in statistical parametric speech synthesis, we proposed a continuous sinusoidal model (CSM) using continuous F0 (contF0) in combination with Maximum Voiced Frequency (MVF), which was successfully  ...  Neural Network Setting: A hyperbolic tangent activation function was applied.  ... 
arXiv:1904.06075v1 fatcat:j4mexgz4gvg4ffgeyvauooygp4

Neural Processing of Auditory Signals and Modular Neural Control for Sound Tropism of Walking Machines

Poramate Manoonpong, Frank Pasemann, Joern Fischer, Hubert Roth
2005 International Journal of Advanced Robotic Systems  
The neural preprocessing network is acting as a low-pass filter and it is followed by a network which discerns between signals coming from the left or the right.  ...  The parameters of these networks are optimized by an evolutionary algorithm.  ...  Acknowledgement The Integrated Structure Evolution Environment (ISEE) software platform for the evolution of recurrent neural networks was provided by Keyan Zahedi, Martin Hülse, and Steffen Wischmann  ... 
doi:10.5772/5786 fatcat:exfrrwllrbewfhtvh3dvzwlsg4

WaveNet: A Generative Model for Raw Audio [article]

Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
2016 arXiv   pre-print
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.  ...  A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity.  ...  Then a set of generative models, such as hidden Markov models (HMMs) (Yoshimura, 2002) , feed-forward neural networks (Zen et al., 2013) , and recurrent neural networks (Tuerk & Robinson, 1993; Karaali  ... 
arXiv:1609.03499v2 fatcat:x2j3gbxuczaaldjl2r2p54qx2m
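WaveNet's key building block is the dilated causal convolution, in which each output sample depends only on current and past inputs spaced `dilation` steps apart; a minimal sketch with a toy filter (no gating, residual, or skip connections):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: output[t] depends only on x[t], x[t-d], x[t-2d], ..."""
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[k] * xp[t + pad - k * dilation] for k in range(len(w)))
                     for t in range(len(x))])

x = np.arange(5.0)                         # toy "waveform"
y = causal_dilated_conv(x, [1.0, 1.0], 1)  # y[t] = x[t] + x[t-1] -> [0, 1, 3, 5, 7]
```

Stacking such layers with dilations 1, 2, 4, 8, ... doubles the receptive field at every layer, which is how WaveNet conditions each raw-audio sample on thousands of preceding samples with only a handful of layers.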

The artificial intelligence renaissance: deep learning and the road to human-Level machine intelligence

Kar-Han Tan, Boon Pang Lim
2018 APSIPA Transactions on Signal and Information Processing  
A number of problems that were considered too challenging just a few years ago can now be solved convincingly by deep neural networks.  ...  network architectures.  ...  ACKNOWLEDGEMENTS The first author would like to thank Irwin Sobel for pointers on the pioneering work at MIT, and Xiaonan Zhou for her work on many of the deep neural network results shown.  ... 
doi:10.1017/atsip.2018.6 fatcat:6iftrepekjdmjffcb5ouz42jke

Neural Network Based Hausa Language Speech Recognition

Matthew K, Ibikunle A., Gregory Onwodi
2012 International Journal of Advanced Research in Artificial Intelligence (IJARAI)  
A pattern recognition neural network was used for developing the system.  ...  Hausa is an important indigenous lingua franca in West and Central Africa, spoken as a first or second language by about fifty million people.  ...  Based on architecture (connection patterns), artificial neural networks can be grouped into two classes: feed-forward networks and recurrent (feedback) networks.  ... 
doi:10.14569/ijarai.2012.010207 fatcat:wibwbpzzbzgrjdaikrxjgwytme

Lombard Speech Synthesis Using Transfer Learning in a Tacotron Text-to-Speech System

Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
2019 Interspeech 2019  
dictionary, by using letter-to-sound (LTS) rules or by using recent neural network based G2P mappings (Juzová et al. 2019).  ...  applying an activation function.  ... 
doi:10.21437/interspeech.2019-1333 dblp:conf/interspeech/BollepalliJA19 fatcat:5uz43svog5erzev5nzakdnc4qe
Showing results 1 — 15 out of 3,394 results