2,840 Hits in 3.6 sec

Locally Masked Convolution for Autoregressive Models [article]

Ajay Jain and Pieter Abbeel and Deepak Pathak
2020 arXiv   pre-print
State-of-the-art estimators for natural images are autoregressive, decomposing the joint distribution over pixels into a product of conditionals parameterized by a deep neural network, e.g. a convolutional  ...  For tasks such as image completion, these models are unable to use much of the observed context.  ...  Acknowledgements We thank Paras Jain, Nilesh Tripuraneni, Joseph Gonzalez and Jonathan Ho for helpful discussions, and reviewers for helpful suggestions.  ... 
arXiv:2006.12486v3 fatcat:wbz2rnvhtjcepja7vfifcp4xey
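The snippet above describes the core idea of LMConv: rather than one kernel mask shared across the whole image, each output location gets its own binary mask over the kernel weights, so arbitrary pixel orderings (useful for image completion) can be realized. A minimal single-channel NumPy sketch of that idea, assuming per-location masks of shape (H, W, k, k); the function name and signature are illustrative, not the authors' implementation:

```python
import numpy as np

def locally_masked_conv2d(x, weight, masks):
    """Single-channel 2D convolution where every output location applies its
    own binary mask to the kernel weights (the idea behind LMConv).
    x: (H, W) input; weight: (k, k) kernel; masks: (H, W, k, k) binary masks.
    Zero padding of k//2 keeps the output the same size as the input."""
    k = weight.shape[0]
    xp = np.pad(x, k // 2)  # zero-pad so every k x k patch is defined
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]
            out[i, j] = np.sum(patch * (weight * masks[i, j]))
    return out
```

With a raster-scan mask broadcast to every location this reduces to a standard PixelCNN-style masked convolution; for image completion one would instead pick masks whose ordering visits the observed pixels first.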

The Image Local Autoregressive Transformer [article]

Chenjie Cao, Yuxin Hong, Xiang Li, Chengrong Wang, Chengming Xu, XiangYang Xue, Yanwei Fu
2021 arXiv   pre-print
Our iLAT learns novel local discrete representations via the newly proposed local autoregressive (LA) transformer, which combines an attention mask with a convolution mechanism.  ...  Recently, AutoRegressive (AR) models for whole-image generation, empowered by transformers, have achieved performance comparable or even superior to Generative Adversarial Networks (GANs).  ...  Local Autoregressive (LA) attention mask.  ... 
arXiv:2106.02514v2 fatcat:tajzufodwzeujpytwfq7wropj4

MaCow: Masked Convolutional Generative Flow [article]

Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy
2019 arXiv   pre-print
In this work, we introduce masked convolutional generative flow (MaCow), a simple yet effective architecture of generative flow using masked convolution.  ...  estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models.  ...  We propose to use masked convolutions to restrict the local connectivity in a small "masked" kernel to address these two problems.  ... 
arXiv:1902.04208v5 fatcat:u4djxn3hwjf4ljwd63j7akyrwm

Flow-based Spatio-Temporal Structured Prediction of Dynamics [article]

Mohsen Zand, Ali Etemad, Michael Greenspan
2022 arXiv   pre-print
We specifically propose to use conditional priors to factorize the latent space for time-dependent modeling. We also exploit the use of masked convolutions as autoregressive conditionals in CNFs.  ...  for structured output learning.  ...  We use locally masked convolution (LMConv) [47] to generate masks and use them as kernel weights for convolutions.  ... 
arXiv:2104.04391v2 fatcat:adddsj6dfzbldk2p2zgkzuq6li

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies [article]

Alexander H. Liu, Yu-An Chung, James Glass
2020 arXiv   pre-print
NPC has a conceptually simple objective and can be implemented easily with the introduced Masked Convolution Blocks.  ...  In this work, we propose Non-Autoregressive Predictive Coding (NPC), a self-supervised method, to learn a speech representation in a non-autoregressive manner by relying only on local dependencies of speech  ...  For the NPC model, we use multi-layer convolution networks, each layer consisting of a ConvBlock and a Masked ConvBlock, as described in Fig. 1.  ... 
arXiv:2011.00406v1 fatcat:co2in2a76vaqrc6h7rbthktvui

Parallel Neural Local Lossless Compression [article]

Mingtian Zhang and James Townsend and Ning Kang and David Barber
2022 arXiv   pre-print
In this paper, we propose two parallelization schemes for local autoregressive models.  ...  The recently proposed Neural Local Lossless Compression (NeLLoC), which is based on a local autoregressive model, has achieved state-of-the-art (SOTA) out-of-distribution (OOD) generalization performance  ...  Therefore, to shear the model, we only need to shear the first convolution kernel. Figure 5 visualizes the sheared convolutional kernel for two local autoregressive models with h = 1 and h = 2.  ... 
arXiv:2201.05213v3 fatcat:lv5ww4zddrcktjbq4oiw74mvoy

Non-local Attention Optimized Deep Image Compression [article]

Haojie Liu, Tong Chen, Peiyao Guo, Qiu Shen, Xun Cao, Yao Wang, Zhan Ma
2019 arXiv   pre-print
, and apply an attention mechanism to generate masks that are used to weight the features for the image and hyperprior, which implicitly adapt bit allocation for different features based on their importance  ...  Our NLAIC framework embeds non-local operations in the encoders and decoders for both image and latent feature probability information (known as the hyperprior) to capture both local and global correlations  ...  [18] have proposed to extract autoregressive information by a 2D 5×5 masked convolution, which is combined with hyperpriors using stacked 1×1 convolutions, for probability estimation.  ... 
arXiv:1904.09757v1 fatcat:aaadz5oxzjekzdlfqt5tosdlle

Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling [article]

Tong Chen, Haojie Liu, Zhan Ma, Qiu Shen, Xun Cao, Yao Wang
2019 arXiv   pre-print
importance, and 3) implements the improved conditional entropy modeling of latent features using joint 3D convolutional neural network (CNN)-based autoregressive contexts and hyperpriors.  ...  capture both local and global correlations, 2) applies an attention mechanism to generate masks that are used to weight the features, which implicitly adapt bit allocation for feature elements based on their  ...  [21] proposed to extract autoregressive information by a 2D 5×5 masked convolution at each feature channel.  ... 
arXiv:1910.06244v1 fatcat:yakth45q7zdpfkduciithn4qby

Autoregressive Unsupervised Image Segmentation [article]

Yassine Ouali, Céline Hudelot, Myriam Tami
2020 arXiv   pre-print
Taking inspiration from autoregressive generative models that predict the current pixel from past pixels in a raster-scan ordering created with masked convolutions, we propose to use different orderings  ...  While masked convolutions are used during training, in inference no masking is applied and we fall back to the standard convolution, where the model has access to the full input.  ...  We would also like to thank the Saclay-IA platform of Université Paris-Saclay and the Mésocentre computing center of CentraleSupélec and École Normale Supérieure Paris-Saclay for providing the computational resources  ... 
arXiv:2007.08247v1 fatcat:75fdq5g3nfco3nmf2bvj5ln6qu
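The segmentation paper's snippet hinges on masked convolutions under several different pixel orderings. One cheap way to obtain such orderings is to flip and transpose the standard raster-scan mask; this sketch assumes that construction (the paper's exact set of orderings may differ):

```python
import numpy as np

def raster_mask(k):
    """Raster-scan mask for a k x k kernel: 1 for pixels strictly before the
    centre in row-major order, 0 for the centre and everything after it."""
    m = np.ones((k, k))
    c = k // 2
    m[c, c:] = 0      # centre row: zero from the centre onward
    m[c + 1:, :] = 0  # all rows below the centre
    return m

def ordering_masks(k):
    """Eight kernel masks, one per ordering obtained by flipping and/or
    transposing the raster scan (illustrative helper, not the paper's code)."""
    base = raster_mask(k)
    variants = [base, base.T]
    return [f(v) for v in variants
            for f in (lambda a: a, np.fliplr, np.flipud,
                      lambda a: np.flipud(np.fliplr(a)))]
```

Each mask zeros the centre tap, so a convolution using it never conditions a pixel's prediction on that pixel itself, only on "past" pixels under its ordering.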

Pushing the Limits of Non-Autoregressive Speech Recognition [article]

Edwin G. Ng, Chung-Cheng Chiu, Yu Zhang, William Chan
2021 arXiv   pre-print
We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal.  ...  We achieve 1.8%/3.6% WER on LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% on the Wall Street Journal, all without a language model.  ...  Our work leverages the Conformer architecture [3], which combines multi-headed self-attention [12] with convolutions to model local and global dependencies of the audio sequence in a parameter-efficient  ... 
arXiv:2104.03416v3 fatcat:muqaw7ua5bfdncgccbwjfzunda

Split Hierarchical Variational Compression [article]

Tom Ryder, Chen Zhang, Ning Kang, Shifeng Zhang
2022 arXiv   pre-print
Firstly, we propose an efficient autoregressive prior, the autoregressive sub-pixel convolution, that allows a generalisation between per-pixel autoregressions and fully factorised probability models.  ...  Secondly, we define our coding framework, the autoregressive initial bits, that flexibly supports parallel coding and avoids -- for the first time -- many of the practicalities commonly associated with  ...  A.2 Masked 3D Convolutions For large k it becomes impractical to train using two-dimensional convolutions.  ... 
arXiv:2204.02071v1 fatcat:bavjymyqnnglro3pushiep6f54

Variational Lossy Autoencoder [article]

Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel
2017 arXiv   pre-print
In addition, by leveraging autoregressive models as both the prior distribution p(z) and the decoding distribution p(x|z), we can greatly improve the generative modeling performance of VAEs, achieving new state-of-the-art  ...  For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture.  ...  For the PixelCNN, it has 6 masked convolution layers with 12 3x3 filters organized in ResNet blocks, and it has 4 additional 1x1 convolution ResNet blocks between every other masked convolution layer to  ... 
arXiv:1611.02731v2 fatcat:c7qhabenejhw3ej3aydtg6fwda

Natural Image Manipulation for Autoregressive Models Using Fisher Scores [article]

Wilson Yan, Jonathan Ho, Pieter Abbeel
2020 arXiv   pre-print
In this paper, we propose using Fisher scores to extract embeddings from an autoregressive model for interpolation, and show that our method provides more meaningful sample manipulation  ...  Deep autoregressive models are among the most powerful generative models today, achieving state-of-the-art bits per dim.  ...  PixelCNNs use a series of masked convolutions to define an autoregressive model over image data.  ... 
arXiv:1912.05015v2 fatcat:h5odp5iyp5fbhe3y22fjif4xyy
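The snippet above notes that PixelCNNs define an autoregressive model through a series of masked convolutions. A minimal sketch of the standard raster-scan kernel mask, assuming the usual type-A/type-B convention (type A for the first layer excludes the centre pixel; type B for later layers keeps it); `pixelcnn_mask` is an illustrative helper name, not code from the paper:

```python
import numpy as np

def pixelcnn_mask(k, mask_type="A"):
    """Raster-scan mask for a k x k PixelCNN-style masked convolution.
    Type 'A' zeros the centre tap and everything after it in raster order;
    type 'B' keeps the centre tap but zeros everything strictly after it."""
    mask = np.ones((k, k))
    c = k // 2
    start = c + 1 if mask_type == "B" else c
    mask[c, start:] = 0   # centre row: zero from the centre (A) or after it (B)
    mask[c + 1:, :] = 0   # all rows below the centre
    return mask
```

Multiplying a convolution's weights by this mask before each forward pass guarantees that pixel (i, j) is predicted only from pixels earlier in raster order, which is what makes the factorised likelihood valid.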

MintNet: Building Invertible Neural Networks with Masked Convolutions [article]

Yang Song and Chenlin Meng and Stefano Ermon
2019 arXiv   pre-print
Inversion is achieved with a locally convergent iterative procedure that is parallelizable and very fast in practice.  ...  Additionally, the determinant of the Jacobian can be computed analytically and efficiently, enabling their generative use as flow models.  ...  for MintNet, i-ResNet and autoregressive method on the same model architectures.  ... 
arXiv:1907.07945v2 fatcat:c36v2br2mvcahn3muzropbrd6e

An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition [article]

Ruchao Fan, Wei Chu, Peng Chang, Jing Xiao, Abeer Alwan
2021 arXiv   pre-print
Second, we propose to expand the trigger mask (acoustic boundary) for each token to increase the robustness of CTC alignments.  ...  Non-autoregressive mechanisms can significantly decrease inference time for speech transformers, especially when the single-step variant is applied.  ...  To alleviate this problem, convolution-augmented self-attention blocks are proposed to emphasise the modelling of local dependencies of the input sequence in the encoder [22, 29].  ... 
arXiv:2106.09885v2 fatcat:bxgg62j5qfh7vnwwofkvorrzs4
Showing results 1 — 15 out of 2,840 results