18 Hits in 5.9 sec

Robust Semantic Communications with Masked VQ-VAE Enabled Codebook [article]

Qiyu Hu, Guangyi Zhang, Zhijin Qin, Yunlong Cai, Guanding Yu, Geoffrey Ye Li
2022 arXiv   pre-print
Then, we propose to mask the portion of the input where the semantic noise appears frequently, and design the masked vector quantized-variational autoencoder (VQ-VAE) with a noise-related masking strategy  ...  In this paper, we first propose a framework for robust end-to-end semantic communication systems to combat semantic noise.  ...  MASKED VQ-VAE ENABLED DISCRETE CODEBOOK In this section, we design robust semantic communication systems with the masked VQ-VAE.  ... 
arXiv:2206.04011v1 fatcat:mzcxyshv2ve6feunxk27xzjncm

Robust Semantic Communications Against Semantic Noise [article]

Qiyu Hu, Guangyi Zhang, Zhijin Qin, Yunlong Cai, Guanding Yu, Geoffrey Ye Li
2022 arXiv   pre-print
To further improve the robustness of semantic communication systems, we first employ the vector quantization-variational autoencoder (VQ-VAE) to design a discrete codebook shared by the transmitter and  ...  Then, the masked autoencoder (MAE) is designed as the architecture of a robust semantic communication system, where a portion of the input is masked.  ...  Simulation results show that our proposed method can significantly improve the robustness of semantic communication systems against semantic noise, with a significant reduction in transmission overhead  ... 
arXiv:2202.03338v2 fatcat:pjftkpw7ujbztihlxppiu3toq4
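The masking strategy these two hits describe can be sketched minimally, assuming MAE-style random patch masking with a hypothetical `mask_ratio` parameter; the papers' actual noise-related selection rule is not reproduced in the snippets:

```python
import numpy as np

def mask_patches(x, mask_ratio=0.5, seed=0):
    """Randomly mask a fraction of input patches (MAE-style sketch).

    x: array of shape (num_patches, patch_dim).
    Returns the masked input and the boolean mask (True = masked).
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    num_masked = int(round(mask_ratio * n))
    idx = rng.permutation(n)[:num_masked]
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    x_masked = x.copy()
    x_masked[mask] = 0.0  # zero out the masked patches
    return x_masked, mask

x = np.ones((8, 4))
xm, m = mask_patches(x, mask_ratio=0.25)
print(m.sum())  # 2 patches masked
```

In the papers the mask would be steered toward patches where semantic noise concentrates rather than drawn uniformly; the uniform draw here is only a placeholder.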

CaCL: Class-aware Codebook Learning for Weakly Supervised Segmentation on Diffuse Image Patterns [article]

Ruining Deng, Quan Liu, Shunxing Bao, Aadarsh Jha, Catie Chang, Bryan A. Millis, Matthew J. Tyska, Yuankai Huo
2022 arXiv   pre-print
; and (3) the proposed algorithm is implemented in a multi-task framework based on Vector Quantised-Variational AutoEncoder (VQ-VAE) via joint image reconstruction, classification, feature embedding, and  ...  The current weakly supervised learning algorithms from the computer vision community are largely designed for focal objects (e.g., dogs and cats).  ...  Class-aware Codebook Based Feature Encoding In this study, we design a class-aware codebook inspired by VQ-VAE2 [11]. With the VQ-VAE framework, three steps are used to process an input image.  ... 
arXiv:2011.00794v2 fatcat:syk44wpa6jcuzen6eh4hbzuriu

Towards Designing and Exploiting Generative Networks for Neutrino Physics Experiments using Liquid Argon Time Projection Chambers [article]

Paul Lutkus, Taritree Wongjirad, Shuchin Aeron
2022 arXiv   pre-print
We implement a Vector-Quantized Variational Autoencoder (VQ-VAE) and PixelCNN, which produce images with LArTPC-like features, and introduce a method to evaluate the quality of the images using a semantic  ...  In this paper, we show that a hybrid approach to generative modeling, combining the decoder from an autoencoder with an explicit generative model for the latent space, is a promising method  ...  Model weights for the VQ-VAE, PixelCNN, and track/shower semantic segmentation network will be uploaded to Zenodo. A sample of generated images is also provided on Zenodo.  ... 
arXiv:2204.02496v1 fatcat:mpxzfmsjmbdhfhadjs3livlh5u

Unsupervised Source Separation By Steering Pretrained Music Models [article]

Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo
2021 arXiv   pre-print
We use OpenAI's Jukebox as the pretrained generative model, and we couple it with four kinds of pretrained music taggers (two architectures and two tagging datasets).  ...  Additionally, we would like to thank the creators of Jukebox for help with their codebase: Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, and Ilya Sutskever.  ...  We combine the VQ-VAE from OpenAI's Jukebox, a generative model of musical audio, with a music tagger.  ... 
arXiv:2110.13071v1 fatcat:fk7sgzge6vgwxpkllkhzhtlnkm

Self-Supervised Speech Representation Learning: A Review [article]

Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe
2022 arXiv   pre-print
Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech.  ...  Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active  ...  A similar insight was obtained in [258], which compared vq-vae and vq-wav2vec with respect to their ability to discover phonetic units.  ... 
arXiv:2205.10643v2 fatcat:6pveqmlbh5ebrhv2wuvb5hcp7q

Self-supervised Learning: Generative or Contrastive [article]

Xiao Liu, Fanjin Zhang, Zhenyu Hou, Zhaoyu Wang, Li Mian, Jing Zhang, Jie Tang
2021 arXiv   pre-print
Additionally, they adopt a multi-scale hierarchical organization of VQ-VAE, which enables learning local and global information of images separately.  ...  VQ-VAE relies on vector quantization (VQ) to learn the posterior distribution of discrete latent variables.  ... 
arXiv:2006.08218v5 fatcat:t324amt3lzaehfa262xbn5hkqe
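The vector-quantization step this snippet describes, snapping each encoder output to its nearest codebook entry, can be illustrated with a minimal NumPy sketch; the codebook values here are hypothetical:

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbor codebook lookup, the core VQ step in VQ-VAE.

    z: encoder outputs, shape (n, d); codebook: (K, d).
    Returns indices into the codebook and the quantized vectors.
    """
    # Squared Euclidean distance from each z to each codebook entry.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2]])
idx, zq = quantize(z, codebook)
print(idx)  # [0 1]
```

In the full VQ-VAE the argmin is non-differentiable, so training uses a straight-through gradient estimator plus codebook and commitment losses; only the forward lookup is shown here.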

Audio Self-supervised Learning: A Survey [article]

Shuo Liu, Adria Mallol-Ragolta, Emilia Parada-Cabeleiro, Kun Qian, Xin Jing, Alexander Kathan, Bin Hu, Bjoern W. Schuller
2022 arXiv   pre-print
This is similar to a vector-quantised variational auto-encoder (VQ-VAE) [102] and to vector-quantised autoregressive predictive coding  ...  codebooks are used as in product quantisation [108].  ...  The optimisation target combines the reconstruction error from VQ-VAE and the loss function of Wav2vec 2.0.  ... 
arXiv:2203.01205v1 fatcat:yllpptzrzrbzthhr2vk63ytlti
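The product-quantisation idea the snippet mentions, splitting each vector into sub-vectors and quantising each with its own small codebook, can be sketched as follows; the group count and codebook contents are illustrative assumptions:

```python
import numpy as np

def pq_encode(z, codebooks):
    """Product quantisation sketch: split each vector into equal
    sub-vectors and quantise each with its own small codebook.

    z: (n, d); codebooks: list of G arrays, each of shape (K, d // G).
    Returns an (n, G) array of per-group codebook indices.
    """
    G = len(codebooks)
    sub = np.split(z, G, axis=1)  # G chunks of shape (n, d // G)
    idx = []
    for s, cb in zip(sub, codebooks):
        d2 = ((s[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx.append(d2.argmin(axis=1))
    return np.stack(idx, axis=1)

# Two groups, each with a 2-entry scalar codebook.
codebooks = [np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]])]
z = np.array([[0.9, 0.1]])
print(pq_encode(z, codebooks))  # [[1 0]]
```

The payoff is combinatorial: G groups with K entries each represent K^G distinct codes while storing only G*K codebook vectors.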

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes [article]

Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby
2022 arXiv   pre-print
These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs.  ...  To address this, we employ the technique introduced by the seminal VQ-VAE paper [51].  ...  We observed that during Stage I training the usage of the VQ-VAE dictionary may be highly unbalanced, with certain entries going unused.  ... 
arXiv:2205.10337v2 fatcat:poajo3lutzabxbpxz235nzoi4u

A Unifying Review of Deep and Shallow Anomaly Detection

Lukas Ruff, Jacob R. Kauffmann, Robert A. Vandermeulen, Gregoire Montavon, Wojciech Samek, Marius Kloft, Thomas G. Dietterich, Klaus-Robert Muller
2021 Proceedings of the IEEE  
This article deals with the application of deep learning techniques to anomaly detection.  ...  The VQ-VAE model, which introduces a discrete codebook between the neural encoder and decoder, presents a way to incorporate this concept that has been shown to result in reconstructions with improved quality  ...  If such steps are neglected, features with wide value ranges, noise, or irrelevant features can dominate distance computations and "mask" anomalies [165] (see VIII-A).  ... 
doi:10.1109/jproc.2021.3052449 fatcat:i65pl2azw5dv7mtq7w7q3ylxgq
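The snippet's point about wide-ranged features dominating distance computations and "masking" anomalies is the standard argument for per-feature standardization before distance-based detection; a minimal sketch:

```python
import numpy as np

def standardize(X):
    """Zero-mean, unit-variance scaling per feature, so no single
    wide-ranged feature dominates Euclidean distances."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sigma

# Second feature is 1000x the scale of the first; raw Euclidean
# distances would be driven almost entirely by it.
X = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
Xs = standardize(X)
print(Xs[:, 0])  # roughly [-1.22, 0.0, 1.22]
```

After scaling, both columns contribute equally to any distance-based score, which is the preprocessing step the survey warns against neglecting.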

A Survey on Vision Transformer [article]

Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang (+1 others)
2021 arXiv   pre-print
Given its high performance and lower need for vision-specific inductive bias, the transformer is receiving more and more attention from the computer vision community.  ...  In the first stage, a discrete VAE is utilized to learn the visual codebook.  ...  Subsequently, Ding et al. propose CogView [51], a transformer with a VQ-VAE tokenizer similar to DALL-E, but supporting Chinese text input.  ... 
arXiv:2012.12556v4 fatcat:ldtbdgy6tbdttfqzhzml7n577m

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm.  ...  In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies  ...  Meanwhile, to ensure the diversity and realism of the generated images, some studies adopt a limited-size query vocabulary, called a codebook, based on the VAE.  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

Deep Sequence Learning for Video Anticipation: From Discrete and Deterministic to Continuous and Stochastic [article]

Sadegh Aliakbarian
2020 arXiv   pre-print
robustness to the precise definition of the weights.  ...  In this context, VQ-VAE [van den Oord et al., 2017] introduces a discrete latent variable obtained by vector quantization of the latent one that, given a uniform prior over the outcome, yields a fixed  ...  To this end, as discussed in the main paper, we rely on Variational Inference, which approximates the true posterior p_θ(z|x) with another distribution q_φ(z|x).  ... 
arXiv:2010.04368v1 fatcat:vip2fzkp3becxginqlioskbrfy

Table of contents

2021 ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
SPECTRUM MODELING (Qing He, Zhiping Xiu, Thilo Koehler, Jilong Wu, Facebook Inc, United States)  ...  SPE-3.4: END-TO-END TEXT-TO-SPEECH USING LATENT DURATION BASED ON VQ-VAE, p. 5694  ...  Qinghua Chi, Shanghai Huawei Technologies Co., Ltd., China  ...  IVMSP-29.2: LTAF-NET: LEARNING TASK-AWARE ADAPTIVE FEATURES AND REFINING MASK FOR FEW-SHOT SEMANTIC, p. 1640  ... 
doi:10.1109/icassp39728.2021.9414617 fatcat:m5ugnnuk7nacbd6jr6gv2lsfby

Generative Spoken Language Modeling from Raw Audio [article]

Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux
2021 arXiv   pre-print
pseudo-text units), a generative language model (trained on pseudo-text), and a speech decoder (generating a waveform from pseudo-text), all trained without supervision, and validate the proposed metrics with  ...  Tjandra et al. (2020) suggested using a transformer (Vaswani et al., 2017) together with a VQ-VAE model for unsupervised unit discovery, and van Niekerk et al. (2020) combine vector quantization together  ...  Unlike CPC and wav2vec 2.0, which use a contrastive loss, HuBERT is trained with a masked prediction task similar to BERT (Devlin et al., 2019) but with masked continuous audio signals as inputs.  ... 
arXiv:2102.01192v2 fatcat:vuucz32wxjcqrc42s3wo7d5tk4
Showing results 1 — 15 out of 18 results