Locally Masked Convolution for Autoregressive Models
[article]
2020
arXiv
pre-print
State-of-the-art estimators for natural images are autoregressive, decomposing the joint distribution over pixels into a product of conditionals parameterized by a deep neural network, e.g. a convolutional ...
For tasks such as image completion, these models are unable to use much of the observed context. ...
Acknowledgements We thank Paras Jain, Nilesh Tripuraneni, Joseph Gonzalez and Jonathan Ho for helpful discussions, and reviewers for helpful suggestions. ...
arXiv:2006.12486v3
fatcat:wbz2rnvhtjcepja7vfifcp4xey
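The snippet above describes the core construction: the joint density over pixels is factored as a product of per-pixel conditionals, and a locally masked convolution gives each output position its own kernel mask over the already-observed context. Below is a minimal PyTorch sketch of that operation, assuming an im2col-style implementation; the function name and argument layout are illustrative, not the paper's released API.

```python
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, mask, bias=None):
    """Convolution where every output location has its own kernel mask.

    x      : (B, C_in, H, W) input features
    weight : (C_out, C_in, k, k) shared convolution weights
    mask   : (1 or B, C_in * k * k, H * W) binary mask; column (i*W + j)
             selects which context pixels location (i, j) is allowed to see
    """
    B, C_in, H, W = x.shape
    C_out, _, k, _ = weight.shape
    # im2col: gather the k x k neighbourhood of every output location
    patches = F.unfold(x, kernel_size=k, padding=k // 2)   # (B, C_in*k*k, H*W)
    patches = patches * mask                                # hide "future" pixels per location
    out = weight.view(C_out, -1) @ patches                  # (B, C_out, H*W)
    if bias is not None:
        out = out + bias.view(1, -1, 1)
    return out.view(B, C_out, H, W)
```

Swapping in a different mask changes the generation order while reusing the same weights, which is what makes orderings tailored to tasks such as image completion possible.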
The Image Local Autoregressive Transformer
[article]
2021
arXiv
pre-print
Our iLAT learns novel local discrete representations via the newly proposed local autoregressive (LA) transformer, which combines an attention mask with a convolution mechanism. ...
Recently, AutoRegressive (AR) models for whole-image generation, empowered by transformers, have achieved performance comparable to or even better than Generative Adversarial Networks (GANs). ...
Local Autoregressive (LA) attention mask. ...
arXiv:2106.02514v2
fatcat:tajzufodwzeujpytwfq7wropj4
MaCow: Masked Convolutional Generative Flow
[article]
2019
arXiv
pre-print
In this work, we introduce masked convolutional generative flow (MaCow), a simple yet effective architecture of generative flow using masked convolution. ...
estimation on standard image benchmarks, considerably narrowing the gap to autoregressive models. ...
We propose to use masked convolutions to restrict the local connectivity in a small "masked" kernel to address these two problems. ...
arXiv:1902.04208v5
fatcat:u4djxn3hwjf4ljwd63j7akyrwm
Flow-based Spatio-Temporal Structured Prediction of Dynamics
[article]
2022
arXiv
pre-print
We specifically propose to use conditional priors to factorize the latent space for time-dependent modeling. We also exploit the use of masked convolutions as autoregressive conditionals in CNFs. ...
for structured output learning. ...
We use locally masked convolution (LMConv) [47] to generate masks and use them as kernel weights for convolutions. ...
arXiv:2104.04391v2
fatcat:adddsj6dfzbldk2p2zgkzuq6li
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies
[article]
2020
arXiv
pre-print
NPC has a conceptually simple objective and can be implemented easily with the introduced Masked Convolution Blocks. ...
In this work, we propose Non-Autoregressive Predictive Coding (NPC), a self-supervised method, to learn a speech representation in a non-autoregressive manner by relying only on local dependencies of speech ...
For the NPC model, we use multi-layer convolutional networks; each layer consists of a ConvBlock and a Masked ConvBlock, as described in Fig. 1. ...
arXiv:2011.00406v1
fatcat:co2in2a76vaqrc6h7rbthktvui
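The NPC entry hinges on a masked convolution along time that hides the frames being predicted. The sketch below shows one way such a block could look in PyTorch: a 1-D convolution whose central taps are zeroed, so the representation at time t is computed only from frames outside the masked window. The class name, the single symmetric window, and the channel layout are assumptions for illustration, not the authors' exact Masked ConvBlock.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenterMaskedConv1d(nn.Conv1d):
    """1-D convolution whose central taps are zeroed, so the output at
    time t never sees the frames inside the masked window around t."""

    def __init__(self, channels, kernel_size, mask_size, **kwargs):
        assert kernel_size % 2 == 1 and mask_size % 2 == 1
        super().__init__(channels, channels, kernel_size,
                         padding=kernel_size // 2, **kwargs)
        mask = torch.ones(1, 1, kernel_size)
        center, half = kernel_size // 2, mask_size // 2
        mask[..., center - half:center + half + 1] = 0.0   # hide the target frames
        self.register_buffer("kernel_mask", mask)

    def forward(self, x):                                   # x: (B, C, T)
        return F.conv1d(x, self.weight * self.kernel_mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```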
Parallel Neural Local Lossless Compression
[article]
2022
arXiv
pre-print
In this paper, we propose two parallelization schemes for local autoregressive models. ...
The recently proposed Neural Local Lossless Compression (NeLLoC), which is based on a local autoregressive model, has achieved state-of-the-art (SOTA) out-of-distribution (OOD) generalization performance ...
Therefore, to shear the model, we only need to shear the first convolution kernel. Figure 5 visualizes the sheared convolutional kernel for two local autoregressive models with h = 1 and h = 2. ...
arXiv:2201.05213v3
fatcat:lv5ww4zddrcktjbq4oiw74mvoy
Non-local Attention Optimized Deep Image Compression
[article]
2019
arXiv
pre-print
, and apply attention mechanism to generate masks that are used to weigh the features for the image and hyperprior, which implicitly adapt bit allocation for different features based on their importance ...
Our NLAIC framework embeds non-local operations in the encoders and decoders for both image and latent feature probability information (known as hyperprior) to capture both local and global correlations ...
[18] have proposed to extract autoregressive information by a 2D 5×5 masked convolution, which is combined with hyperpriors using stacked 1×1 convolution, for probability estimation. ...
arXiv:1904.09757v1
fatcat:aaadz5oxzjekzdlfqt5tosdlle
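This compression entry (and the next) relies on the same kind of context model: a raster-order masked convolution over the decoded latents, fused with hyperprior features through stacked 1x1 convolutions to predict entropy-model parameters. A hedged PyTorch sketch of that pattern follows; layer sizes and class names are placeholders, not the papers' exact configurations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """5x5 convolution masked so each position only sees previously decoded
    latents in raster-scan order (the centre position itself is hidden)."""

    def __init__(self, in_ch, out_ch, k=5):
        super().__init__(in_ch, out_ch, k, padding=k // 2)
        mask = torch.ones_like(self.weight)
        mask[..., k // 2, k // 2:] = 0      # centre and later columns of the centre row
        mask[..., k // 2 + 1:, :] = 0       # all later rows
        self.register_buffer("kernel_mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.kernel_mask, self.bias,
                        padding=self.padding)

class EntropyParams(nn.Module):
    """Fuse masked-conv context with hyperprior features via 1x1 convolutions."""

    def __init__(self, latent_ch, hyper_ch, hidden=256):
        super().__init__()
        self.context = MaskedConv2d(latent_ch, 2 * latent_ch)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * latent_ch + hyper_ch, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, 2 * latent_ch, 1),  # e.g. mean and scale per latent channel
        )

    def forward(self, y_hat, hyper):
        return self.fuse(torch.cat([self.context(y_hat), hyper], dim=1))
```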
Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling
[article]
2019
arXiv
pre-print
importance, and 3) implements the improved conditional entropy modeling of latent features using joint 3D convolutional neural network (CNN)-based autoregressive contexts and hyperpriors. ...
capture both local and global correlations, 2) applies attention mechanism to generate masks that are used to weigh the features, which implicitly adapt bit allocation for feature elements based on their ...
[21] proposed to extract autoregressive information by a 2D 5×5 masked convolution at each feature channel. ...
arXiv:1910.06244v1
fatcat:yakth45q7zdpfkduciithn4qby
Autoregressive Unsupervised Image Segmentation
[article]
2020
arXiv
pre-print
Taking inspiration from autoregressive generative models that predict the current pixel from past pixels in a raster-scan ordering created with masked convolutions, we propose to use different orderings ...
While masked convolutions are used during training, in inference, no masking is applied and we fall back to the standard convolution where the model has access to the full input. ...
We would also like to thank the Saclay-IA platform of Université Paris-Saclay and the Mésocentre computing center of CentraleSupélec and École Normale Supérieure Paris-Saclay for providing the computational resources ...
arXiv:2007.08247v1
fatcat:75fdq5g3nfco3nmf2bvj5ln6qu
Pushing the Limits of Non-Autoregressive Speech Recognition
[article]
2021
arXiv
pre-print
We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. ...
We achieve 1.8%/3.6% WER on LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% on the Wall Street Journal, all without a language model. ...
Our work leverages the Conformer architecture [3] , which combines multiheaded self-attention [12] with convolutions to model local and global dependencies of the audio sequence in a parameter efficient ...
arXiv:2104.03416v3
fatcat:muqaw7ua5bfdncgccbwjfzunda
Split Hierarchical Variational Compression
[article]
2022
arXiv
pre-print
Firstly, we propose an efficient autoregressive prior, the autoregressive sub-pixel convolution, that allows a generalisation between per-pixel autoregressions and fully factorised probability models. ...
Secondly, we define our coding framework, the autoregressive initial bits, that flexibly supports parallel coding and avoids -- for the first time -- many of the practicalities commonly associated with ...
A.2 Masked 3D Convolutions: For large k it becomes impractical to train using two-dimensional convolutions. ...
arXiv:2204.02071v1
fatcat:bavjymyqnnglro3pushiep6f54
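The SHVC entry mentions masked 3D convolutions as the practical alternative for large k. As a generic illustration (the ordering below is a plain lexicographic raster over the three axes and may differ from the paper's sub-pixel ordering), a causal 3-D kernel mask can be built like this:

```python
import torch

def causal_mask_3d(k):
    """Binary mask for a k x k x k kernel keeping only positions strictly
    before the centre in (depth, row, column) lexicographic order."""
    mask = torch.zeros(k, k, k)
    c = k // 2
    for d in range(k):
        for i in range(k):
            for j in range(k):
                if (d, i, j) < (c, c, c):     # tuple comparison = raster order
                    mask[d, i, j] = 1.0
    return mask   # broadcast-multiply into a Conv3d weight (C_out, C_in, k, k, k)
```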
Variational Lossy Autoencoder
[article]
2017
arXiv
pre-print
In addition, by leveraging autoregressive models as both prior distribution p(z) and decoding distribution p(x|z), we can greatly improve generative modeling performance of VAEs, achieving new state-of-the-art ...
For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. ...
For the PixelCNN, it has 6 masked convolution layers with 12 3x3 filters organized in ResNet blocks, and it has 4 additional 1x1 convolution ResNet blocks between every other masked convolution layer to ...
arXiv:1611.02731v2
fatcat:c7qhabenejhw3ej3aydtg6fwda
Natural Image Manipulation for Autoregressive Models Using Fisher Scores
[article]
2020
arXiv
pre-print
In this paper, we propose using Fisher scores as a method to extract embeddings from an autoregressive model to use for interpolation and show that our method provides more meaningful sample manipulation ...
Deep autoregressive models are one of the most powerful models that exist today which achieve state-of-the-art bits per dim. ...
PixelCNNs use a series of masked convolutions to define an autoregressive model over image data. ...
arXiv:1912.05015v2
fatcat:h5odp5iyp5fbhe3y22fjif4xyy
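The masked convolutions define the conditionals, but sampling from a PixelCNN is still sequential: every pixel requires a fresh forward pass conditioned on the pixels already drawn. Below is a minimal raster-scan sampling loop, assuming a single-channel model `pixelcnn(x)` that returns per-pixel logits of shape (B, levels, H, W); the model itself is not defined here.

```python
import torch

@torch.no_grad()
def sample(pixelcnn, shape=(1, 1, 28, 28), levels=256, device="cpu"):
    """Draw an image pixel by pixel in raster-scan order."""
    B, C, H, W = shape
    x = torch.zeros(shape, device=device)
    for i in range(H):
        for j in range(W):
            logits = pixelcnn(x)[:, :, i, j]                 # (B, levels)
            probs = torch.softmax(logits, dim=-1)
            pixel = torch.multinomial(probs, num_samples=1)  # (B, 1)
            x[:, 0, i, j] = pixel.squeeze(-1).float() / (levels - 1)
    return x
```

This per-pixel loop is the inference cost that the parallel and local schemes elsewhere in these results aim to reduce.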
MintNet: Building Invertible Neural Networks with Masked Convolutions
[article]
2019
arXiv
pre-print
Inversion is achieved with a locally convergent iterative procedure that is parallelizable and very fast in practice. ...
Additionally, the determinant of the Jacobian can be computed analytically and efficiently, enabling their generative use as flow models. ...
for MintNet, i-ResNet and autoregressive method on the same model architectures. ...
arXiv:1907.07945v2
fatcat:c36v2br2mvcahn3muzropbrd6e
An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
[article]
2021
arXiv
pre-print
Second, we propose to expand the trigger mask (acoustic boundary) for each token to increase the robustness of CTC alignments. ...
Non-autoregressive mechanisms can significantly decrease inference time for speech transformers, especially when the single step variant is applied. ...
To alleviate this problem, convolution augmented self-attention blocks are proposed to emphasise the modelling of local dependencies of the input sequence in the encoder [22, 29] . ...
arXiv:2106.09885v2
fatcat:bxgg62j5qfh7vnwwofkvorrzs4
Showing results 1 — 15 out of 2,840 results