
End-to-end Sinkhorn Autoencoder with Noise Generator

Kamil Deja, Jan Dubinski, Piotr Nowak, Sandro Wenzel, Przemyslaw Spurek, Tomasz Trzcinski
2020 IEEE Access  
More precisely, we extend the autoencoder architecture by adding a deterministic neural network trained to map noise from a known distribution onto the autoencoder latent space representing the data distribution.  ...  To address these shortcomings, we introduce a novel method dubbed the end-to-end Sinkhorn Autoencoder, which leverages the Sinkhorn algorithm to explicitly align the distribution of encoded real data examples and  ...  To stabilise the training of GANs, the authors of [1] propose substituting the KL divergence with the Wasserstein distance.  ... 
doi:10.1109/access.2020.3048622 fatcat:pm6gjwnt4jdlrnm47as5j2gwcy
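
The snippet above describes the core construction: a deterministic network maps noise onto the latent distribution of a trained autoencoder, and a Sinkhorn (entropic OT) loss aligns the two latent clouds. A minimal sketch of that idea, assuming the geomloss package for the Sinkhorn loss; the encoder/generator shapes and data below are illustrative stand-ins, not the paper's:

```python
import torch
from geomloss import SamplesLoss  # pip install geomloss

# Hypothetical stand-ins: an encoder for real data and a deterministic
# noise generator mapping Gaussian noise into the same latent space.
encoder = torch.nn.Sequential(
    torch.nn.Linear(784, 128), torch.nn.ReLU(), torch.nn.Linear(128, 16))
generator = torch.nn.Sequential(
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))

# Debiased Sinkhorn loss between two empirical point clouds.
sinkhorn = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

x = torch.randn(256, 784)                 # stand-in batch of "real" data
z_data = encoder(x).detach()              # latent codes of real data
z_gen = generator(torch.randn(256, 8))    # latent codes produced from noise
loss = sinkhorn(z_gen, z_data)            # align the two latent distributions
loss.backward()
opt.step()
```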

Federated Distillation of Natural Language Understanding with Confident Sinkhorns [article]

Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria
2021 arXiv   pre-print
We make our codes public at https://github.com/declare-lab/sinkhorn-loss.  ...  To learn the global model, the objective is to minimize the optimal transport cost of the global model's predictions from the confident sum of soft-targets assigned by local models.  ...  Interpolating between optimal transport and mmd using sinkhorn divergences.  ... 
arXiv:2110.02432v1 fatcat:wxcyec3z6rdujjhvxg2r46aplm
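
The snippet describes minimizing the OT cost between the global (student) model's predictions and a confidence-weighted sum of soft targets from local models. A hedged sketch of that objective using the POT library; the confidence weights, class cost matrix, and probabilities are made-up illustrations, not the authors' setup:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

# Hypothetical soft targets from three local (client) models over 4 classes,
# with per-client confidence weights; all numbers are made up.
local_probs = np.array([[0.70, 0.20, 0.05, 0.05],
                        [0.60, 0.30, 0.05, 0.05],
                        [0.10, 0.70, 0.10, 0.10]])
confidence = np.array([0.5, 0.3, 0.2])
target = confidence @ local_probs
target /= target.sum()                        # confidence-weighted soft target

student = np.array([0.40, 0.40, 0.10, 0.10])  # global model's prediction
M = 1.0 - np.eye(4)                           # 0/1 inter-class cost (an assumption)
cost = ot.sinkhorn2(student, target, M, reg=0.1)  # entropic OT distillation cost
```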

Sinkhorn AutoEncoders [article]

Giorgio Patrini, Rianne van den Berg, Patrick Forré, Marcello Carioni, Samarth Bhargav, Max Welling, Tim Genewein, Frank Nielsen
2019 arXiv   pre-print
We then introduce the Sinkhorn auto-encoder (SAE), which approximates and minimizes the p-Wasserstein distance in latent space via backpropagation through the Sinkhorn algorithm.  ...  In Wasserstein AutoEncoders (WAE), this is enforced by the heuristic choice of either the Maximum Mean Discrepancy (MMD) or adversarial training on the latent space.  ...  We start our experiments by studying unsupervised representation learning by training an encoder in isolation.  ... 
arXiv:1810.01118v3 fatcat:ne5cibw7gnai3nbbho35nsp6ou
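
The key point in the snippet is that the Sinkhorn iterations are differentiable, so an entropic p-Wasserstein cost in latent space can be minimized by ordinary backpropagation. A self-contained log-domain sketch of this idea, assuming uniform weights; this illustrates the mechanism, not the authors' implementation:

```python
import math
import torch

def sinkhorn_cost(x, y, eps=0.1, iters=50, p=2):
    """Entropic OT cost between uniform empirical measures on x and y.

    Differentiable: gradients flow through the log-domain Sinkhorn loop,
    so the result can be minimized by backpropagation.
    """
    C = torch.cdist(x, y) ** p                  # pairwise ground costs
    n, m = C.shape
    log_a, log_b = -math.log(n), -math.log(m)   # uniform log-weights
    f = torch.zeros(n, device=x.device)
    g = torch.zeros(m, device=x.device)
    for _ in range(iters):                      # Sinkhorn updates on potentials
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b, dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a, dim=0)
    P = torch.exp((f[:, None] + g[None, :] - C) / eps + log_a + log_b)
    return (P * C).sum()                        # primal transport cost

# Usage: align encoded data with prior samples in latent space.
z_data = torch.randn(128, 16, requires_grad=True)
z_prior = torch.randn(128, 16)
sinkhorn_cost(z_data, z_prior).backward()
```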

Learning from Noisy Labels via Discrepant Collaborative Training

Yan Han, Soumava Kumar Roy, Lars Petersson, Mehrtash Harandi
2020 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)  
Empirical results of our proposed algorithm, Discrepant Collaborative Training (DCT), are competitive with several current state-of-the-art algorithms across MNIST, CIFAR10 and CIFAR100,  ...  Towards this aim, we modify a collaborative training framework to utilize discrepancy constraints between the respective feature extractors, enabling the learning of distinct, yet discriminative features, pacifying  ...  Collaborative Training.  ... 
doi:10.1109/wacv45572.2020.9093619 dblp:conf/wacv/HanRPH20 fatcat:e5uthuy5xngoncllcdunfnfic4

Computational Optimal Transport [article]

Gabriel Peyré, Marco Cuturi
2020 arXiv   pre-print
Let S(P) ⊂ E be the subset of edges {(i, j) : i ∈ ⟦n⟧, j ∈ ⟦m⟧} such that P_ij > 0.  ...  Acknowledgements: We would like to thank the many colleagues, collaborators and students who have helped us at various stages  ...  Bernton, Mathieu Blondel, Nicolas Courty, Rémi Flamary, Alexandre Gramfort, Young-Heon Kim, Daniel Matthes, Philippe Rigollet, Filippo Santambrogio, Justin Solomon, Jonathan Weed; as well as the feedback by  ...  These bounds consist in evaluating the primal and dual objectives at the solutions provided by the Sinkhorn algorithm. Definition 4.1 (Sinkhorn divergences).  ... 
arXiv:1803.00567v4 fatcat:zgannw6i6beqde5bx7pj62uyry
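
The bounds mentioned in the snippet come from evaluating the primal (and dual) objectives at Sinkhorn's output: any feasible coupling upper-bounds the unregularized OT cost. A minimal numpy illustration of the primal side, assuming the POT library; the data are synthetic:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = rng.normal(loc=1.0, size=(60, 2))
a = np.full(50, 1 / 50)                 # uniform source weights
b = np.full(60, 1 / 60)                 # uniform target weights
C = ot.dist(x, y)                       # squared Euclidean costs

P = ot.sinkhorn(a, b, C, reg=0.05)      # entropic coupling (feasible plan)
primal_bound = np.sum(P * C)            # cost of a feasible plan
exact = ot.emd2(a, b, C)                # exact OT value (network simplex)
print(f"entropic primal cost {primal_bound:.4f} >= exact OT {exact:.4f}")
```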

Improving Sequence-to-Sequence Learning via Optimal Transport [article]

Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin
2019 arXiv   pre-print
Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE).  ...  However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence.  ...  Reference: atlantis mir part ways after three-day space collaboration by emmanuel UNK Baseline: atlantis separate from mir Ours: atlantis separate from mir space by UNK Source: australia 's news corp announced  ... 
arXiv:1901.06283v1 fatcat:yttdeqp3iffjbnupp5n3esh4xi

Exploiting Variational Domain-Invariant User Embedding for Partially Overlapped Cross Domain Recommendation [article]

Weiming Liu, Xiaolin Zheng, Mengling Hu, Chaochao Chen
2022 arXiv   pre-print
VDEA first adopts variational inference to capture collaborative user preferences, and then utilizes Gromov-Wasserstein distribution co-clustering optimal transport to cluster the users with similar rating  ...  We apply the Sinkhorn algorithm [6, 43] for solving the optimization problem by adopting the Lagrangian multiplier to minimize the objective function ℓ as below: ℓ = ⟨M ⊗ 𝝍, 𝝍⟩ − 𝜖H(𝝍) − 𝒇(𝝍1  ...  Discrepancy-based methods, e.g., Maximum Mean Discrepancy (MMD) [3], Correlation Alignment (CORAL).  ... 
arXiv:2205.06440v1 fatcat:kbxzmkt3pndkrpentjvuk564xi
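
The objective above is cut off mid-formula by the snippet. For orientation, a hedged reconstruction of the usual entropically regularized Lagrangian it appears to follow; the multiplier terms after the cut are assumed, with μ, ν denoting the marginals:

```latex
\ell(\boldsymbol{\psi}) \;=\; \langle M \otimes \boldsymbol{\psi},\, \boldsymbol{\psi} \rangle
\;-\; \epsilon\, H(\boldsymbol{\psi})
\;-\; \boldsymbol{f}^{\top}\!\left(\boldsymbol{\psi}\mathbf{1}_m - \boldsymbol{\mu}\right)
\;-\; \boldsymbol{g}^{\top}\!\left(\boldsymbol{\psi}^{\top}\mathbf{1}_n - \boldsymbol{\nu}\right)
```

where H(𝝍) is the entropic regularizer and 𝒇, 𝒈 are Lagrange multipliers enforcing the marginal constraints.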

Compression of Deep Learning Models for Text: A Survey [article]

Manish Gupta, Puneet Agrawal
2021 arXiv   pre-print
...Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, and Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training  ...  the critical need of building applications with efficient and small models, and the large amount of recently published work in this area, we believe that this survey organizes the plethora of work done by  ...  Also, for both character-level as well as word-level language modeling on the 1B word benchmark, we observe that the Sinkhorn mixture (which mixes Sinkhorn attention with the vanilla  ... 
arXiv:2008.05221v4 fatcat:6frf2wzi7zganaqgkuvy4szgmq

Reparameterizing the Birkhoff Polytope for Variational Permutation Inference [article]

Scott W. Linderman, Gonzalo E. Mena, Hal Cooper, Liam Paninski, and John P. Cunningham
2017 arXiv   pre-print
SWL is supported by the Simons Collaboration on the Global Brain (SCGB) Postdoctoral Fellowship (418011). HC is supported by Graphen, Inc.  ...  JPC is supported by the Sloan Foundation, McKnight Foundation, and the SCGB.  ...  To avoid divergence, the matrix W was then re-scaled by 1.1 times its spectral radius.  ... 
arXiv:1710.09508v1 fatcat:7is4omzbrfgo7kb7mucdcefso4
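
The title refers to relaxing permutation inference onto the Birkhoff polytope of doubly stochastic matrices, which Sinkhorn normalization reaches by alternating row and column normalizations. A generic numpy sketch of that relaxation; the temperature and matrix size are illustrative, and this is not the paper's exact parameterization:

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_to_birkhoff(log_alpha, n_iters=30):
    # Alternately normalize rows and columns in log space. For a positive
    # matrix, Sinkhorn's theorem says the limit is doubly stochastic,
    # i.e. a point in the Birkhoff polytope.
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0, keepdims=True)
    return np.exp(log_alpha)

X = np.random.randn(5, 5)
P = sinkhorn_to_birkhoff(X / 0.1)  # low temperature pushes P toward a permutation
print(P.round(2))
```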

Structure-preserving GANs [article]

Jeremiah Birrell, Markos A. Katsoulakis, Luc Rey-Bellet, Wei Zhu
2022
We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences  ...  class of distribution-learning methods based on a two-player game between a generator and a discriminator, can generally be formulated as a minmax problem based on the variational representation of a divergence  ...  This work was performed in part using high performance computing equipment obtained under a grant from the Collaborative R&D Fund managed by the Massachusetts Technology Collaborative.  ... 
doi:10.48550/arxiv.2202.01129 fatcat:aesyn3vzgvgfre2ign52i6ervi

Generalized Spectral Clustering via Gromov-Wasserstein Learning [article]

Samir Chowdhury, Tom Needham
2021 arXiv   pre-print
Our experiments on a range of real-world networks achieve comparable results to, and in many cases outperform, the state-of-the-art achieved by GWL.  ...  We show that when comparing against a two-node template graph using the heat kernel at the infinite time limit, the resulting partition agrees with the partition produced by the Fiedler vector.  ...  or a KL divergence regularizer [XLZD19].  ... 
arXiv:2006.04163v2 fatcat:4nczdfbn7faabjzkaw6o7g3s4u

Geometry Aware Deep Metric Learning [article]

Soumava Kumar Roy, University, The Australian National
2020
We then turn our attention to the general training protocol of Siamese Neural Networks (SiNNs), and address a major yet obvious drawback in their training practice.  ...  SiNNs are characterized by a Positive Semi-Definite (PSD) matrix M which is invariant to the action of the orthogonal group O(p), thereby resulting in an equivalence class of solutions for M.  ...  "Discrepant Collaborative Training Exploiting S Divergences" (under review, Pattern Recognition Journal, 2020).  ... 
doi:10.25911/khz3-8h81 fatcat:afld5e6kxzejvj2uslt4qimbpa

MENA_columbia_0054D_14791.pdf [article]

2018
In the first part, I present a method that leverages Gaussian quadrature to accelerate inference of neural encoding models from a certain type of observed neural point process (spike trains), resulting  ...  one is faced with the need to match neural recordings to canonical neural identities, in practice resolved by tedious human labor.  ...  Figure 5.2(a): Sinkhorn networks can be trained to solve Jigsaw Puzzles.  ... 
doi:10.7916/d86x0tj9 fatcat:tvgrvl3znjd6jgilftm6l5chly

Geometry and Learning from Data in 3D and Beyond

Pradeep Kr. Banerjee, Sumukh Bansal, Ilke Demir, Minh Ha Quang, Lin Huang, Ruben Hühnerbein, Scott C James, Oleg Kachan, Louis Ly, Marius Lysaker, Samee Maharjan, Anton Mallasto (+9 others)
2019
For example, through entropic relaxation of OT, Sinkhorn divergences can be defined.  ...  Indeed, it allows us to define the so-called Sinkhorn divergence, which metrizes the weak-* topology.  ... 
doi:10.13140/rg.2.2.21594.08647 fatcat:bpeya3rkfnenlff3bitchllmcq
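
One standard way to make the snippet's definition concrete (in the debiased form of Genevay et al. and Feydy et al.; the report may use a variant) builds the Sinkhorn divergence from the entropic OT cost OT_ε:

```latex
S_\epsilon(\alpha, \beta) \;=\; \mathrm{OT}_\epsilon(\alpha, \beta)
\;-\; \tfrac{1}{2}\,\mathrm{OT}_\epsilon(\alpha, \alpha)
\;-\; \tfrac{1}{2}\,\mathrm{OT}_\epsilon(\beta, \beta)
```

In this form S_ε is nonnegative, vanishes iff α = β, and interpolates between OT (ε → 0) and MMD (ε → ∞), the interpolation result also cited in the federated distillation entry above.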

Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling [article]

Valentin De Bortoli, James Thornton, Jeremy Heng, Arnaud Doucet
2021 arXiv   pre-print
The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time  ...  Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).  ...  This is part of the collaboration between US DOD, UK MOD and UK EPSRC under the Multidisciplinary University Research Initiative.  ... 
arXiv:2106.01357v4 fatcat:vmvu7hicpbd7hjxqdgvr7bhqji
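
In the discrete, static case the continuous analogue mentioned in the snippet collapses back to iterative proportional fitting (IPF), which is exactly the Sinkhorn algorithm. A minimal numpy sketch with a synthetic cost and uniform marginals:

```python
import numpy as np

def ipf(K, a, b, iters=500):
    # Iterative proportional fitting on a positive kernel K: alternately
    # rescale rows and columns to match the marginals a and b. These are
    # precisely the Sinkhorn updates for entropic OT.
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

C = np.random.rand(4, 6)
P = ipf(np.exp(-C / 0.1), np.full(4, 1 / 4), np.full(6, 1 / 6))
print(P.sum(axis=1), P.sum(axis=0))  # marginals match a and b
```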
Showing results 1 – 15 out of 23 results