600 Hits in 7.3 sec

Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks [article]

Chitralekha Gupta, Purnima Kamath, Lonce Wyse
2021-03-12 · arXiv · pre-print
Generative Adversarial Networks (GANs) currently achieve the state-of-the-art sound synthesis quality for pitched musical instruments using a 2-channel spectrogram representation consisting of log magnitude  ...  Furthermore, the sound quality for pitched sounds is comparable to using the IFSpectrogram, even while using a simpler representation with half the memory requirements.  ...  This study contributes to the development of general and efficient representations for training GANs for complex audio texture synthesis.  ... 
arXiv:2103.07390v1 · fatcat:sqp2p2gsazgfnjrmnxt5etccdi
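The 2-channel spectrogram this entry refers to stacks a log-magnitude channel with a phase-derived channel (instantaneous frequency in the IFSpectrogram variant). A minimal NumPy sketch of such a representation, assuming a Hann-windowed STFT and frame-to-frame unwrapped phase differences; the paper's exact parameters and scaling are not reproduced here:

```python
import numpy as np

def two_channel_spectrogram(x, n_fft=256, hop=64):
    """Split a waveform into (log-magnitude, instantaneous-frequency) channels.

    "Instantaneous frequency" here is the frame-to-frame unwrapped phase
    difference, in the spirit of IFSpectrogram-style GAN representations
    (a sketch, not the paper's exact pipeline).
    """
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Frame the signal with a Hann window and take the one-sided FFT
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)              # (frames, bins), complex STFT
    log_mag = np.log1p(np.abs(spec))                # channel 1: compressed magnitude
    phase = np.unwrap(np.angle(spec), axis=0)       # unwrap phase along time
    inst_freq = np.diff(phase, axis=0, prepend=phase[:1])  # channel 2: phase delta
    return np.stack([log_mag, inst_freq])           # (2, frames, bins)

# Example: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
rep = two_channel_spectrogram(np.sin(2 * np.pi * 440 * t))
# rep.shape == (2, 247, 129)
```

The abstract's point about "half the memory requirements" refers to dropping the second (phase) channel entirely for some texture classes; the sketch above keeps both.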

Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks

Chitralekha Gupta, Purnima Kamath, Lonce Wyse
2021-06-29 · Zenodo
Generative Adversarial Networks (GANs) currently achieve the state-of-the-art sound synthesis quality for pitched musical instruments using a 2-channel spectrogram representation consisting of log magnitude  ...  Furthermore, the sound quality for pitched sounds is comparable to using the IFSpectrogram, even while using a simpler representation with half the memory requirements.  ...  This study contributes to the development of general and efficient representations for training GANs for complex audio texture synthesis.  ... 
doi:10.5281/zenodo.5040541 · fatcat:3bnvjhjp2vhydg7dxsayhejohi

Identity-Preserving Realistic Talking Face Generation [article]

Sanjana Sinha, Sandika Biswas, Brojeshwar Bhowmick
2020-05-25 · arXiv · pre-print
We first generate person-independent facial landmarks from audio using DeepSpeech features for invariance to different voices, accents, etc.  ...  Finally, we use LSGAN to generate the facial texture from person-specific facial landmarks, using an attention mechanism that helps to preserve identity-related texture.  ...  Further, this intermediate representation is used to generate the facial texture with motion defined at the landmark stage. • Our proposed audio-to-landmark generator network uses DeepSpeech features to  ... 
arXiv:2005.12318v1 · fatcat:a6oumzdyzvep5ik33cmwm3yj5u
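The LSGAN mentioned in this entry swaps the usual cross-entropy GAN objective for least-squares targets, which stabilizes training. A minimal sketch of the two losses in Mao et al.'s standard formulation (the talking-face paper's landmark conditioning is omitted):

```python
import numpy as np

def lsgan_losses(d_real, d_fake):
    """Least-squares GAN objectives.

    d_real / d_fake are discriminator outputs on real and generated samples.
    The discriminator pushes real outputs toward 1 and fakes toward 0;
    the generator pushes fake outputs toward 1.
    """
    d_loss = 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)
    g_loss = 0.5 * np.mean((d_fake - 1.0) ** 2)
    return d_loss, g_loss

# A well-trained discriminator (reals near 1, fakes near 0) has low d_loss,
# while the generator loss stays high until fakes fool the discriminator.
d_loss, g_loss = lsgan_losses(np.array([0.9, 1.1]), np.array([0.1, -0.1]))
```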

Speech2Video: Cross-Modal Distillation for Speech to Video Generation [article]

Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao
2021-07-10 · arXiv · pre-print
The extracted features are then integrated by a generative adversarial network into talking face video clips.  ...  The challenge mainly lies in disentangling the distinct visual attributes from audio signals.  ...  Generative Adversarial Networks (GANs) supplement the generator with a competing discriminator to train the generator in an unsupervised manner.  ...
arXiv:2107.04806v1 · fatcat:lbx74ctptvdtdjiwyrvbq4cnhu

Bandwidth Extension on Raw Audio via Generative Adversarial Networks [article]

Sung Kim, Visvesh Sathe
2019-03-21 · arXiv · pre-print
Neural network-based methods have recently demonstrated state-of-the-art results on image synthesis and super-resolution tasks, in particular by using variants of generative adversarial networks (GANs)  ...  In this work we explore a GAN-based method for audio processing, and develop a convolutional neural network architecture to perform audio super-resolution.  ...  Audio modeling with neural networks Learning-based approaches for audio have also been explored largely in the context of representation learning, generative modeling, and text-to-speech (TTS) systems  ...
arXiv:1903.09027v1 · fatcat:sbpiyc5kjjc3zj54l2funla6pu
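For context on what "audio super-resolution" means operationally: the standard non-learned baseline upsamples the band-limited signal by interpolation, which restores the sample rate but recovers none of the missing high band; learned models like the GAN in this entry are trained to do better. A hypothetical baseline sketch (the function name and toy signal are illustrative, not from the paper):

```python
import numpy as np

def naive_bandwidth_extension(x_low, factor=2):
    """Upsample a band-limited waveform by linear interpolation.

    This raises the sample rate but adds no energy above the original
    Nyquist frequency, which is exactly the content GAN-based bandwidth
    extension tries to synthesize plausibly.
    """
    n = len(x_low)
    t_high = np.linspace(0.0, n - 1.0, factor * n)
    return np.interp(t_high, np.arange(n), x_low)

# Toy low-rate tone, upsampled 2x
x_low = np.sin(2 * np.pi * 3 * np.arange(100) / 100)
x_high = naive_bandwidth_extension(x_low, factor=2)
```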

Generative Adversarial Networks in Human Emotion Synthesis:A Review [article]

Noushin Hajarolasvadi, Miguel Arjona Ramírez, Hasan Demirel
2020-11-07 · arXiv · pre-print
Synthesizing realistic data samples is of great value for both academic and industrial communities.  ...  Deep generative models have become an emerging topic in various research areas like computer vision and signal processing.  ...  Acknowledgements We would like to thank Eastern Mediterranean University for supporting this research work through the BAP-C project under the grant number BAP-C-02-18-0001.  ... 
arXiv:2010.15075v2 · fatcat:4ifqzcdkevbm3jrxhxy4rpdnba

A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors [article]

Ruobing Zheng, Zhou Zhu, Bo Song, Changjiang Ji
2021-05-05 · arXiv · pre-print
Lip sync has emerged as a promising technique for generating mouth movements from audio signals. However, synthesizing a high-resolution and photorealistic virtual news anchor is still challenging.  ...  A pair of Temporal Convolutional Networks are used to learn the cross-modal sequential mapping from audio signals to mouth movements, followed by a neural rendering network that translates the synthetic  ...  Conclusion This paper describes a novel lip-sync approach for synthesizing high-resolution and photoreal talking head from speech signals.  ... 
arXiv:2002.08700v2 · fatcat:cyyrena2ujgmriokwa3wludjqq

On Using Backpropagation for Speech Texture Generation and Voice Conversion [article]

Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio
2018-03-08 · arXiv · pre-print
Inspired by recent work on neural network image generation which rely on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion  ...  Similar to image texture synthesis and neural style transfer, the system works by optimizing a cost function with respect to the input waveform samples.  ...  [32] used an untrained single-layer network to synthesize simple audio textures such as keyboard and machine gun, and attempted audio style transfer between different musical pieces.  ... 
arXiv:1712.08363v2 · fatcat:mm4djvywpvevfkrysrwmkbar24
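The entry above synthesizes audio by optimizing a cost directly with respect to the input waveform samples. A toy NumPy version of the idea, assuming the "texture statistic" is simply the magnitude spectrum (the real system matches deep-network activations instead), with the gradient of the loss written out by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
target = np.sin(2 * np.pi * 5 * np.arange(n) / n)   # signal whose "texture" we match
T_mag = np.abs(np.fft.fft(target))                   # statistic: magnitude spectrum
x = rng.normal(scale=0.1, size=n)                    # synthesize starting from noise

lr = 0.4 / n
for _ in range(300):
    X = np.fft.fft(x)
    mag = np.maximum(np.abs(X), 1e-12)               # clamp to avoid divide-by-zero
    # gradient of sum_k (|X_k| - T_mag_k)^2 with respect to the samples x
    grad = 2 * n * np.fft.ifft((mag - T_mag) * X / mag).real
    x -= lr * grad

# x now has approximately the target magnitude spectrum, but free phase --
# it sounds texture-like rather than being a copy of the target waveform.
```

This also shows why the problem is underdetermined: any phase assignment with the right magnitudes is a valid minimum, which is the freedom texture synthesis exploits.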

Generative Adversarial Networks in Human Emotion Synthesis: A Review

Noushin Hajarolasvadi, Miguel Arjona Ramirez, Wesley Beccaro, Hasan Demirel
2020 · IEEE Access
As conclusions, we indicate common problems that can be explored from the Generative Adversarial Networks (GAN) topologies and applications in emotion synthesis.  ...  These models allow synthesizing realistic data samples that are of great value for both academic and industrial communities.  ...  ACKNOWLEDGEMENTS We would like to thank Eastern Mediterranean University for supporting this research work through the BAP-C project under the grant number BAP-C-02-18-0001.  ... 
doi:10.1109/access.2020.3042328 · fatcat:v4w44sw6kjgcldsqnonozr3rja

High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram [article]

Leyuan Sheng, Dong-Yan Huang, Evgeniy N. Pavlovskiy
2019-12-03 · arXiv · pre-print
From the resulting super-resolution spectrogram networks, we can generate enhanced spectrograms to produce high-quality synthesized speech.  ...  However, the generated spectrograms are over-smooth, which prevents producing high-quality synthesized speech.  ...  Conditional GANs As described in [19], GANs are generative adversarial networks, which consist of two adversarial models: the generator and the discriminator (G and D).  ...
arXiv:1912.01167v1 · fatcat:bjcl5zcuofapxkt5f25h6gsr3q

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [article]

Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, Juyong Zhang
2021-08-19 · arXiv · pre-print
In this paper, we address this problem with the aid of neural scene representation networks.  ...  to the audio signal is synthesized using volume rendering.  ...  [24] utilize a generative adversarial network to synthesize photo-realistic skin texture that can handle skin deformations conditioned on renderings. Kim et al.  ... 
arXiv:2103.11078v3 · fatcat:jcyv42gednfszjqxymkqojdaxe
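The "volume rendering" step in the AD-NeRF abstract is the standard NeRF compositing quadrature: densities sampled along a ray are converted to alphas and accumulated front-to-back with transmittance. A generic sketch (not AD-NeRF's implementation; `rgbs`, `sigmas`, and `deltas` are assumed per-sample values along one ray):

```python
import numpy as np

def composite_ray(rgbs, sigmas, deltas):
    """NeRF-style volume rendering along a single ray.

    alpha_i = 1 - exp(-sigma_i * delta_i)      (opacity of sample i)
    T_i     = prod_{j < i} (1 - alpha_j)       (transmittance reaching sample i)
    color   = sum_i T_i * alpha_i * rgb_i
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = trans * alphas
    return weights @ rgbs, weights

# A nearly opaque red sample in front of a green one: red dominates the pixel.
rgbs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
color, w = composite_ray(rgbs,
                         sigmas=np.array([50.0, 50.0]),
                         deltas=np.array([0.1, 0.1]))
```

In audio-driven variants, the radiance field is additionally conditioned on audio features, but this compositing step is unchanged.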

Multi Modal Adaptive Normalization for Audio to Video Generation [article]

Neeraj Kumar, Srishti Goel, Ankur Narang, Brejesh Lall
2020-12-14 · arXiv · pre-print
Synthesizing highly expressive facial videos from the audio input and static image is still a challenging task for generative adversarial networks.  ...  In this paper, we propose a multi-modal adaptive normalization(MAN) based architecture to synthesize a talking person video of arbitrary length using as input: an audio signal and a single image of a person  ...  [9] uses an audio transformation network (ATnet) for audio to landmark generation and a visual generation network for facial generation.  ... 
arXiv:2012.07304v1 · fatcat:amgweallhzcabgvajvrlilizvq

Learning Disentangled Representations for Timbre and Pitch in Music Audio [article]

Yun-Ning Hung, Yi-An Chen, Yi-Hsuan Yang
2018-11-08 · arXiv · pre-print
Drawing upon state-of-the-art techniques in representation learning, we propose in this paper two deep convolutional neural network models for learning disentangled representation of musical timbre and  ...  Both models use encoders/decoders and adversarial training to learn music representations, but the second model additionally uses skip connections to deal with the pitch information.  ...  As secondary contributions, for music editing and generation purposes, we additionally train another encoder/decoder sub-network to convert the pianorolls into audio signals.  ... 
arXiv:1811.03271v1 · fatcat:qpemri5xrbgfrfqv6fqoiymuca

Synthesizing Diverse, High-Quality Audio Textures [article]

Joseph Antognini, Matt Hoffman, Ron J. Weiss
2018-06-20 · arXiv · pre-print
We demonstrate that synthesizing diverse audio textures is challenging, and argue that this is because audio data is relatively low-dimensional.  ...  Finally we describe the implications of these results for the problem of audio style transfer.  ...  ACKNOWLEDGMENTS The authors are grateful to Josh McDermott for providing the synthesized textures using the technique of McDermott and Simoncelli [2] for comparison with the technique in this paper.  ... 
arXiv:1806.08002v1 · fatcat:edcfwpuhyvbwhn27rdgqt5qbsa

Direct Speech-to-image Translation [article]

Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao
2020-04-09 · arXiv · pre-print
Subsequently, a stacked generative adversarial network is used to synthesize high-quality images conditioned on the embedding feature.  ...  Experimental results on both synthesized and real data show that our proposed method is effective to translate the raw speech signals into images without the middle text representation.  ...  Generative Adversarial Networks Generative adversarial networks (GANs) have drawn much attention since it was presented by Goodfellow et al.  ... 
arXiv:2004.03413v2 · fatcat:e3tvcmlvvjertkrxufrkkibxqi
Showing results 1–15 of 600