
The Cramer Distance as a Solution to Biased Wasserstein Gradients [article]

Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos
2017 arXiv   pre-print
Leveraging insights from probabilistic forecasting, we propose an alternative to the Wasserstein metric, the Cramér distance.  ...  To illustrate the relevance of the Cramér distance in practice we design a new algorithm, the Cramér Generative Adversarial Network (GAN), and show that it performs significantly better than the related  ...  We first note that, compared to the KL solution, the Cramér solution has significantly smaller Wasserstein distance to the target distribution.  ... 
arXiv:1705.10743v1 fatcat:rfgqmom5qfeq7mnea7fnxtmlbu
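
In the multivariate case, the Cramér distance of this paper coincides (up to a constant factor) with the energy distance, which has a simple unbiased sample estimator; minimizing it yields the unbiased gradients the abstract refers to. A minimal numpy sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def energy_distance(x, y):
    """Unbiased sample estimate of the energy distance between
    d-dimensional samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    d_xy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1).mean()
    # Within-sample means exclude the zero diagonal so they stay unbiased
    d_xx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1).sum() / (n * (n - 1))
    d_yy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1).sum() / (m * (m - 1))
    # Energy distance: 2 E d(X, Y) - E d(X, X') - E d(Y, Y')
    return 2.0 * d_xy - d_xx - d_yy

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(512, 2))
y = rng.normal(0.5, 1.0, size=(512, 2))
print(energy_distance(x, y))  # positive for distinct distributions, near 0 for equal ones
```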

Fine Tuning of Generative Models for the Fast Simulation

Fedor Ratnikov, Dmitry Kovalev
2019 Zenodo  
The last task leads to a critical problem: generating the significantly larger amount of Monte Carlo (MC) data required for the analysis of data collected at higher collider luminosity, without a drastic increase in computing resources, requires a significant speedup of the simulation algorithms.  ...  [Figure panels: E₀ ≈ 16.9 GeV (a), E₀ ≈ 29.1 GeV (b), E₀ ≈ 2.455 GeV (c)] Solution: a GAN architecture using the Cramér distance as a solution to biased gradients, with generator objective $\min_G \mathbb{E}_{x,x' \sim P_d,\; z,z' \sim P_z}\big[\, d(x, G(z)) + d(x', G(z')) - d(x, x') - d(G(z), G(z')) \,\big]$  ... 
doi:10.5281/zenodo.3599671 fatcat:yvdsrjvu2nalngqckziehlli4i
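
A minimal sketch of that generator objective as a minibatch loss (PyTorch; the name `cramer_generator_loss`, the pairing of samples, and the latent prior $P_z$ are illustrative assumptions, not taken from the slides):

```python
import torch

def cramer_generator_loss(x, x2, g, g2):
    """Energy-distance generator objective: x, x2 are two independent
    real batches; g, g2 are two independent generated batches G(z), G(z')."""
    d = lambda a, b: (a - b).norm(dim=-1)  # Euclidean distance per sample pair
    return (d(x, g) + d(x2, g2) - d(x, x2) - d(g, g2)).mean()
```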

Demystifying MMD GANs [article]

Mikołaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton
2021 arXiv   pre-print
...  and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters.  ...  We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramér GAN critic.  ...  The Cramér distance as a solution to biased Wasserstein gradients, 2017. arXiv:1705.10743. Y. Bengio, G. Mesnil, Y. Dauphin, and S. Rifai. Better mixing via deep representations.  ... 
arXiv:1801.01401v5 fatcat:uftxgahdfjfdnbpvmsdtsktjcm
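
For context, the squared MMD admits an unbiased U-statistic estimator, and with the distance-induced kernel k(x, y) = ‖x‖ + ‖y‖ − ‖x − y‖ it recovers (up to a factor) the energy distance used by the Cramér GAN critic. A hedged numpy sketch, not the authors' code:

```python
import numpy as np

def dist_kernel(a, b):
    # Distance-induced kernel: k(x, y) = ||x|| + ||y|| - ||x - y||
    na = np.linalg.norm(a, axis=-1)[:, None]
    nb = np.linalg.norm(b, axis=-1)[None, :]
    return na + nb - np.linalg.norm(a[:, None] - b[None, :], axis=-1)

def mmd2_unbiased(x, y):
    """Unbiased U-statistic estimator of MMD^2 between samples x and y."""
    n, m = len(x), len(y)
    kxx, kyy, kxy = dist_kernel(x, x), dist_kernel(y, y), dist_kernel(x, y)
    # Drop diagonal terms so the within-sample expectations stay unbiased
    t_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    t_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return t_xx + t_yy - 2.0 * kxy.mean()
```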

On Wasserstein Reinforcement Learning and the Fokker-Planck equation [article]

Pierre H. Richemond, Brendan Maginnis
2017 arXiv   pre-print
We derive policy gradients where the change in policy is limited to a small Wasserstein distance (or trust region).  ...  Policy gradient methods often achieve better performance when the change in policy is limited to a small Kullback-Leibler divergence.  ...  ACKNOWLEDGEMENTS The authors want to thank Gary Pisano of Harvard Business School, as well as Bilal Piot of Google DeepMind, for interesting discussions on the subject.  ... 
arXiv:1712.07185v1 fatcat:pfq7g32ffngdfahp376ytrqfk4

Fast Data-Driven Simulation of Cherenkov Detectors Using Generative Adversarial Networks [article]

Artem Maevskiy, Denis Derkach, Nikita Kazeev, Andrey Ustyuzhanin, Maksim Artemev, Lucio Anderlini
2019 arXiv   pre-print
Thus new approaches to event generation and simulation of detector responses are needed. In LHCb, the accurate simulation of Cherenkov detectors takes a sizeable fraction of CPU time.  ...  This network is trained to reproduce the particle species likelihood function values based on the track kinematic parameters and detector occupancy.  ...  Acknowledgments The research leading to these results has received funding from the Russian Science Foundation under grant agreement no. 17-72-20127.  ... 
arXiv:1905.11825v2 fatcat:vot4afca5vawzm4534qsucyhie

Generalized Sliced Distances for Probability Distributions [article]

Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Shahin Shahrampour
2020 arXiv   pre-print
However, in a practical setting, the convergence behavior of the algorithms built upon these distances has not been well established, except for a few specific cases.  ...  Finally, by exploiting this connection, we consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum  ...  The Cramér distance as a solution to biased Wasserstein gradients. arXiv preprint arXiv:1705.10743, 2017.  ... 
arXiv:2002.12537v1 fatcat:gzais3xkafbc3jurduvtmcploy
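
Sliced metrics of this family compare high-dimensional samples through many one-dimensional projections, where optimal transport is closed-form. A minimal sliced Wasserstein-1 sketch in numpy/scipy (plain linear slices; the paper's generalized slices replace the projection step):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein(x, y, n_proj=128, rng=None):
    """Monte Carlo estimate of the sliced Wasserstein-1 distance
    between d-dimensional samples x (n, d) and y (m, d)."""
    rng = rng or np.random.default_rng()
    d = x.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)  # uniform random direction on the sphere
        total += wasserstein_distance(x @ theta, y @ theta)  # 1-D OT is closed-form
    return total / n_proj
```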

Airline Passenger Name Record Generation using Generative Adversarial Networks [article]

Alejandro Mottini, Alix Lheritier, Rodrigo Acuna-Agost
2018 arXiv   pre-print
We propose a solution based on Cramér GANs, categorical feature embedding and a Cross-Net architecture.  ...  To address this difficulty, we propose a method to generate realistic synthetic PNRs using Generative Adversarial Networks (GANs).  ...  The problem arises when estimating the Wasserstein metric from samples, which might yield biased gradients and converge to a wrong minimum. This problem is referred to as biased gradients.  ... 
arXiv:1807.06657v1 fatcat:ajnyvrjdijbo7djg5yrj4oeqpq
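
The biased-gradients issue mentioned here is easy to reproduce numerically: the expected minibatch Wasserstein distance is strictly positive even between two identical distributions, so the sample estimator pulls the optimizer toward the wrong minimum. A small 1-D demonstration:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
draw = lambda n: rng.normal(size=n)  # both "distributions" are N(0, 1)

# True distance is 0, but the minibatch estimate is biased upward
for batch in (8, 64, 512):
    est = np.mean([wasserstein_distance(draw(batch), draw(batch))
                   for _ in range(200)])
    print(f"batch={batch:4d}  E[W1 estimate] ~ {est:.3f}")  # shrinks as batch grows
```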

Learning with minibatch Wasserstein : asymptotic and gradient properties [article]

Kilian Fatras, Younes Zine, Rémi Flamary, Rémi Gribonval, Nicolas Courty
2021 arXiv   pre-print
We notably argue that it is equivalent to an implicit regularization of the original problem, with appealing properties such as unbiased estimators, gradients, and a concentration bound around the expectation, but also with defects such as the loss of the distance property.  ...  Acknowledgements The authors would like to thank Thibault Séjourné and Jean Feydy for fruitful discussions.  ... 
arXiv:1910.04091v4 fatcat:wqdqgos4kbh7xa32lf5sefws3u
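
A sketch of the minibatch Wasserstein estimator studied in the paper: draw independent minibatch pairs, solve exact OT on each, and average (using the POT library; the uniform weights and the simple averaging loop are illustrative simplifications):

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def minibatch_wasserstein(x, y, batch, k, rng):
    """Average exact OT cost over k random minibatch pairs from x and y."""
    vals = []
    for _ in range(k):
        xb = x[rng.choice(len(x), batch, replace=False)]
        yb = y[rng.choice(len(y), batch, replace=False)]
        M = ot.dist(xb, yb, metric='euclidean')  # pairwise cost matrix
        a = b = np.full(batch, 1.0 / batch)      # uniform minibatch weights
        vals.append(ot.emd2(a, b, M))            # exact OT cost on the pair
    return float(np.mean(vals))
```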

Correcting nuisance variation using Wasserstein distance

Gil Tabak, Minjie Fan, Samuel Yang, Stephan Hoyer, Geoffrey Davis
2020 PeerJ  
To achieve this, we minimize a loss function based on distances between marginal distributions (such as the Wasserstein distance) of embeddings across domains for each replicated treatment.  ...  The general approach is to find a function mapping the images to an embedding space of manageable dimensionality whose geometry captures relevant features of the input images.  ...  ACKNOWLEDGEMENTS We would like to thank Mike Ando, Marc Coram, Marc Berndl, Subhashini Venugopalan, Arunachalam Narayanaswamy, Yaroslav Ganin, Luke Metz, Eric Christiansen, Philip Nelson, and Patrick Riley  ... 
doi:10.7717/peerj.8594 pmid:32161688 pmcid:PMC7050548 fatcat:4225iniv4bex3ihklab5hu6u7y
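
A common differentiable surrogate for matching marginal distributions of embeddings, in the spirit of what is described here, is the sorted-sample (quantile) form of the 1-D Wasserstein distance applied per embedding dimension. A hedged PyTorch sketch, a simplification rather than the paper's exact loss:

```python
import torch

def marginal_w1(emb_a, emb_b):
    """Mean 1-D Wasserstein-1 between each embedding dimension's marginals,
    via the sorted-sample (quantile) formulation; assumes equal batch sizes."""
    a_sorted, _ = torch.sort(emb_a, dim=0)  # empirical quantiles per dimension
    b_sorted, _ = torch.sort(emb_b, dim=0)
    return (a_sorted - b_sorted).abs().mean()
```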

Restricting Greed in Training of Generative Adversarial Network [article]

Haoxuan You, Zhicheng Jiao, Haojun Xu, Jie Li, Ying Wang, Xinbo Gao
2018 arXiv   pre-print
Training of a GAN can be thought of as a greedy procedure, in which the generative net tries to make the locally optimal choice (minimizing the loss function of the discriminator) in each iteration.  ...  To alleviate these problems, we propose a novel training strategy to restrict greed in the training of GANs.  ...  The Cramér distance combines the advantages of the Wasserstein and KL divergences and achieves a more satisfying performance in their experiments.  ... 
arXiv:1711.10152v2 fatcat:rofs65v4nbhu3cwxhwik2n2bgq

Metrizing Fairness [article]

Yves Rychener, Bahar Taskesen, Daniel Kuhn
2022 arXiv   pre-print
We also prove that the unfairness-regularized prediction loss admits unbiased gradient estimators if unfairness is measured by the squared ℒ²-distance or by a squared maximum mean discrepancy.  ...  Conceptually, we show that the generator of any IPM can be interpreted as a family of utility functions and that unfairness with respect to this IPM arises if individuals in the two demographic groups  ...  The authors are with the Risk Analytics and Optimization Chair, EPFL Lausanne (yves.rychener, bahar.taskesen, daniel.kuhn@epfl.ch).  ... 
arXiv:2205.15049v2 fatcat:5hx2tyzxtjb7xg5tywp6fwbq7q
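
Because the squared MMD is a sum of pairwise kernel evaluations, its minibatch U-statistic yields unbiased gradients, which is the property the abstract highlights. A hedged PyTorch sketch of such an unfairness-regularized loss (the Gaussian kernel, `lam`, and the group encoding are illustrative assumptions):

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel between 1-D score vectors
    return torch.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def mmd2_penalty(s0, s1):
    """Unbiased squared-MMD between the prediction scores of two groups."""
    n, m = len(s0), len(s1)
    k00, k11, k01 = gaussian_kernel(s0, s0), gaussian_kernel(s1, s1), gaussian_kernel(s0, s1)
    t00 = (k00.sum() - k00.diag().sum()) / (n * (n - 1))  # drop diagonal terms
    t11 = (k11.sum() - k11.diag().sum()) / (m * (m - 1))
    return t00 + t11 - 2.0 * k01.mean()

def fair_loss(pred, target, group, lam=1.0):
    # Prediction loss plus the squared-MMD unfairness regularizer
    mse = torch.nn.functional.mse_loss(pred, target)
    return mse + lam * mmd2_penalty(pred[group == 0], pred[group == 1])
```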

Regularized Variational Data Assimilation for Bias Treatment using the Wasserstein Metric [article]

Sagar K. Tamang, Ardeshir Ebtehaj, Dongmian Zou, Gilad Lerman
2020 arXiv   pre-print
This approach relies on the Wasserstein metric stemming from the theory of optimal mass transport to penalize the distance between the probability histograms of the analysis state and an a priori reference  ...  dataset, which is likely to be more uncertain but less biased than both model and observations.  ...  Acknowledgements The first and second authors acknowledge the grant from the National Aeronautics and Space Administration (NASA) Terrestrial Hydrology Program (THP, 80NSSC18K1528) and the New (Early Career  ... 
arXiv:2003.02421v1 fatcat:lrgp5vkjwvbknggtevvkhrrocu
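
For histogram-valued comparisons like the one described here, the 1-D Wasserstein penalty between two histograms on a common bin grid is a one-liner in scipy. A minimal sketch (the grid and histograms are made-up illustrations, not the paper's data):

```python
import numpy as np
from scipy.stats import wasserstein_distance

bins = np.linspace(-5.0, 5.0, 51)                # common bin centers
h_analysis = np.exp(-0.5 * (bins - 0.8) ** 2)    # biased analysis histogram
h_reference = np.exp(-0.5 * bins ** 2)           # a priori reference histogram
h_analysis /= h_analysis.sum()
h_reference /= h_reference.sum()

# W1 between histograms: support points weighted by bin masses
penalty = wasserstein_distance(bins, bins, u_weights=h_analysis, v_weights=h_reference)
print(penalty)  # ~0.8, the shift between the two modes
```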

Ensemble Riemannian data assimilation over the Wasserstein space

Sagar K. Tamang, Ardeshir Ebtehaj, Peter J. van Leeuwen, Dongmian Zou, Gilad Lerman
2021 Nonlinear Processes in Geophysics  
In this paper, we present an ensemble data assimilation paradigm over a Riemannian manifold equipped with the Wasserstein metric.  ...  Unlike the Euclidean distance used in classic data assimilation methodologies, the Wasserstein metric can capture the translation and difference between the shapes of square-integrable probability distributions  ...  More recently, Tamang et al. (2020) introduced a Wasserstein regularization in a variational setting to correct for geophysical biases under chaotic dynamics.  ... 
doi:10.5194/npg-28-295-2021 fatcat:g3h5ogsbq5agrcaeru2g7lhxfe

Improving GANs Using Optimal Transport [article]

Tim Salimans, Han Zhang, Alec Radford, Dimitris Metaxas
2018 arXiv   pre-print
We present Optimal Transport GAN (OT-GAN), a variant of generative adversarial nets minimizing a new metric measuring the distance between the generator distribution and the data distribution.  ...  distance function with unbiased mini-batch gradients.  ...  Here, we choose d to be the entropy-regularized Wasserstein distance, or Sinkhorn distance, as defined for mini-batches in Equation 5.  ... 
arXiv:1803.05573v1 fatcat:ns3q4iiqenevxngbivqif36qla
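
The Sinkhorn distance referenced here is the entropy-regularized OT cost, computable by alternating scalings of a Gibbs kernel. A minimal numpy sketch (uniform weights and a fixed iteration count are simplifying assumptions; the paper's Equation 5 defines the mini-batch version actually used):

```python
import numpy as np

def sinkhorn_distance(x, y, eps=0.1, n_iter=200):
    """Entropy-regularized OT cost between two uniform empirical measures."""
    # Pairwise squared-Euclidean cost and its Gibbs kernel
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iter):          # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # entropic optimal transport plan
    return (P * C).sum()             # transport cost under that plan
```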

Nonlinear Distributional Gradient Temporal-Difference Learning [article]

Chao Qu, Shie Mannor, Huan Xu
2019 arXiv   pre-print
We prove the asymptotic almost-sure convergence of distributional GTD2 and TDC to a local optimal solution for general smooth function approximators, which includes neural networks that have been widely  ...  We devise a distributional variant of gradient temporal-difference (TD) learning.  ...  However as pointed by them (see proposition 5 in their paper), in practice, it is hard to estimate the Wasserstein distance using samples and furthermore the gradient estimation w.r.t. the parameter of  ... 
arXiv:1805.07732v3 fatcat:ym2nmd2wxjf33hptlqzcnqxyru
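
The practical alternative running through this line of work is the Cramér distance, which in 1-D is the squared L² distance between CDFs and is straightforward to estimate for the categorical value distributions used in distributional RL. A minimal numpy sketch (grid-based approximation; names are illustrative):

```python
import numpy as np

def cramer_distance(p, q, z):
    """Squared Cramer distance between two categorical distributions
    p and q supported on the common, equally spaced grid z."""
    dz = z[1] - z[0]
    cdf_gap = np.cumsum(p) - np.cumsum(q)  # difference of CDFs on the grid
    return np.sum(cdf_gap ** 2) * dz
```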
Showing results 1–15 of 79.