2,341 Hits in 6.8 sec

Non-parametric Binary Regression in Metric Spaces with KL Loss [article]

Ariel Avital, Klim Efremenko, Aryeh Kontorovich, David Toplin, Bo Waggoner
2020 arXiv   pre-print
We propose a non-parametric variant of binary regression, where the hypothesis is regularized to be a Lipschitz function taking a metric space to [0,1] and the loss is logarithmic.  ...  We get around this challenge via an adaptive truncation approach, and also present a lower bound indicating that the truncation is, in some sense, necessary.  ...  Non-parametric binary regression has been employed in a number of works.  ... 
arXiv:2010.09886v1 fatcat:uobmz43yjbhtpm5dwapgb3dfsi
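The truncation idea mentioned in the abstract can be illustrated with a minimal sketch: clamping predictions away from 0 and 1 keeps the logarithmic loss bounded. The fixed threshold `t` here is a hypothetical simplification, not the paper's adaptive rule.

```python
import math

def truncated_log_loss(y, p, t=0.01):
    """Logarithmic (cross-entropy) loss with the prediction clamped to
    [t, 1 - t], so the loss stays bounded even when p hits 0 or 1."""
    p = min(max(p, t), 1.0 - t)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)
```

Without the clamp, a confident wrong prediction (p = 0, y = 1) would make the loss infinite; with t = 0.01 it is capped at log(100).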

Metric Gaussian Variational Inference [article]

Jakob Knollmüller, Torsten A. Enßlin
2020 arXiv   pre-print
We alternate between approximating the covariance with the inverse Fisher information metric evaluated at an intermediate mean estimate and optimizing the KL-divergence for the given covariance with respect  ...  With this method we achieve higher accuracy and in many cases a significant speedup compared to traditional methods.  ...  Here we discuss the problem of binary Gaussian process classification in two dimensions with non-parametric kernel estimation. The data consists of binary values with associated locations.  ... 
arXiv:1901.11033v3 fatcat:4xth43f4mzaanir4rwr5hufq2i

Distribution Calibration for Regression [article]

Hao Song, Tom Diethe, Meelis Kull, Peter Flach
2019 arXiv   pre-print
We are concerned with obtaining well-calibrated output distributions from regression models.  ...  We further propose a post-hoc approach to improving the predictions from previously trained regression models, using multi-output Gaussian Processes with a novel Beta link function.  ...  Isotonic calibration is a powerful non-parametric method based on isotonic regression along with a simple iterative algorithm called Pool Adjacent Violators (PAV), which finds the train-optimal regression  ... 
arXiv:1905.06023v1 fatcat:u3kqvmyinngf5dpbosqp3f7y3y
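The Pool Adjacent Violators (PAV) algorithm referenced in the snippet can be sketched in a few lines: scan the values left to right, and whenever a new value breaks monotonicity, merge it with the preceding block and replace both by their pooled mean.

```python
def pool_adjacent_violators(y):
    """PAV: returns the non-decreasing sequence minimizing squared
    error to y (uniform weights)."""
    # Each block stores [sum, count]; adjacent blocks whose means
    # violate monotonicity are merged.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last one's
        # (compared cross-multiplied to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

For example, `[1, 3, 2, 4]` pools the violating pair (3, 2) into two 2.5s, yielding `[1, 2.5, 2.5, 4]`.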

Projective Latent Interventions for Understanding and Fine-tuning Classifiers [article]

Andreas Hinterreiter and Marc Streit and Bernhard Kainz
2020 arXiv   pre-print
PLIs allow domain experts to control the latent decision space in an intuitive way in order to better match their expectations.  ...  The back-propagation is based on parametric approximations of t-distributed stochastic neighbourhood embeddings.  ...  In the actual training phase, we calculate low-dimensional pairwise probabilities q_ij for each input batch, and use the KL divergence KL(p_ij || q_ij) as a loss function.  ... 
arXiv:2006.12902v2 fatcat:cfxtxadjgvf57pjxclpb5x2yai
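The KL(p_ij || q_ij) loss quoted in the snippet is a KL divergence between two discrete pairwise-probability distributions; a minimal sketch (the flat-list representation of the pairwise probabilities is a simplification for illustration):

```python
import math

def kl_pairwise(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions given as flat
    lists of pairwise probabilities; zero entries of p contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

The divergence is 0 when the embedding probabilities q match p exactly, and grows as high-probability pairs in p are assigned low probability in q.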

Dynamic Model Selection for Prediction Under a Budget [article]

Feng Nan, Venkatesh Saligrama
2017 arXiv   pre-print
We pose an empirical loss minimization problem with cost constraints to jointly train gating and prediction models.  ...  Low-complexity gating and prediction models are then learnt to adaptively approximate the high-accuracy model in regions where low-cost models are capable of making highly accurate predictions  ...  Acknowledgments Feng Nan would like to thank Dr. Ofer Dekel for ideas and discussions on resource-constrained machine learning during an internship at Microsoft Research in summer 2016.  ... 
arXiv:1704.07505v1 fatcat:fd55lguvbrdc3jwg6jklznxzya

GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud [article]

Li Yi, Wang Zhao, He Wang, Minhyuk Sung, Leonidas Guibas
2018 arXiv   pre-print
Instead of treating object proposal as a direct bounding box regression problem, we take an analysis-by-synthesis strategy and generate proposals by reconstructing shapes from noisy observations in a scene  ...  The success of GSPN largely comes from its emphasis on geometric understanding during object proposal, which greatly reduces proposals with low objectness.  ...  Since we have parametrized q_\phi(z|x, c) and p_\theta(z|c) as N(\mu_z, \sigma_z^2) and N(\hat{\mu}_z, \hat{\sigma}_z^2) respectively through neural networks, the KL loss can be computed in closed form as L_{KL} = \log(\hat{\sigma}_z / \sigma_z) + (\sigma_z^2 + (\mu_z - \hat{\mu}_z)^2) / (2\hat{\sigma}_z^2) - 1/2.  ... 
arXiv:1812.03320v1 fatcat:3ybr53c73zbxtk2wi2utb7sziu
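The closed-form KL between two univariate Gaussians quoted in the snippet can be sketched directly; the argument names follow the assumed notation q = N(mu_q, sigma_q^2) and p = N(mu_p, sigma_p^2):

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) )."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)
```

As a sanity check, the divergence is 0 for identical Gaussians, and KL(N(1,1) || N(0,1)) = 0.5.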

Wasserstein regularization for sparse multi-task regression [article]

Hicham Janati and Marco Cuturi and Alexandre Gramfort
2019 arXiv   pre-print
We focus in this paper on high-dimensional regression problems where each regressor can be associated to a location in a physical space, or more generally a generic geometric space.  ...  In this paper, we propose a convex regularizer for multi-task regression that encodes a more flexible geometry.  ...  Our work is one of them in the context of sparse high-dimensional regression tasks where regressors can be associated to a geometric space.  ... 
arXiv:1805.07833v3 fatcat:yrgyudpnlrfufbgpcbwn2s4ki4

Distribution Matching in Variational Inference [article]

Mihaela Rosca, Balaji Lakshminarayanan, Shakir Mohamed
2019 arXiv   pre-print
In this paper, we expose the limitations of Variational Autoencoders (VAEs), which consistently fail to learn marginal distributions in both latent and visible spaces.  ...  With the increasingly widespread deployment of generative models, there is a mounting need for a deeper understanding of their behaviors and limitations.  ...  Leveraging binary classifiers to estimate KL divergences results in underestimated KL values, even when the discriminator is trained to optimality (see Figure 3 ).  ... 
arXiv:1802.06847v4 fatcat:j5mgvrxwsffxnpite5r33tunoe
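The observation quoted in the snippet rests on the density-ratio trick: the Bayes-optimal classifier between samples from p and q has logits log p(x)/q(x), so averaging those logits over samples from p estimates KL(p || q). A minimal Monte-Carlo sketch with a hypothetical setup where the optimal logit is known in closed form:

```python
import random

def classifier_kl_estimate(samples_p, log_ratio):
    """Monte-Carlo estimate of KL(p || q) = E_p[ log p(x)/q(x) ],
    given samples from p and the classifier's log-ratio (logit)."""
    return sum(log_ratio(x) for x in samples_p) / len(samples_p)

# Hypothetical setup: p = N(1, 1), q = N(0, 1).  The Bayes-optimal
# classifier's logit is log p(x)/q(x) = x - 0.5, and the true
# divergence is KL(p || q) = 0.5.
random.seed(0)
samples = [random.gauss(1.0, 1.0) for _ in range(100_000)]
est = classifier_kl_estimate(samples, lambda x: x - 0.5)
```

With the optimal logit the estimate concentrates near the true value 0.5; a sub-optimal (under-trained) classifier produces a smaller average logit, hence the underestimation the paper reports.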

Differentially Private Synthetic Mixed-Type Data Generation For Unsupervised Learning [article]

Uthaipon Tantipongpipat, Chris Waites, Digvijay Boob, Amaresh Ankit Siva, Rachel Cummings
2020 arXiv   pre-print
We implement this framework on both binary data (MIMIC-III) and mixed-type data (ADULT), and compare its performance with existing private algorithms on metrics in unsupervised settings.  ...  We also introduce a new quantitative metric able to detect diversity, or lack thereof, of synthetic data.  ...  For three continuous features in the ADULT dataset (capital gain, capital loss, and hours worked per week), we were not able to find a regression model with good fit (as measured by R 2 score) for the  ... 
arXiv:1912.03250v2 fatcat:lo6lugwudfgwjm6zze72bfrxcy

Federated Generalized Bayesian Learning via Distributed Stein Variational Gradient Descent [article]

Rahif Kassab, Osvaldo Simeone
2021 arXiv   pre-print
This paper introduces Distributed Stein Variational Gradient Descent (DSVGD), a non-parametric generalized Bayesian inference framework for federated learning.  ...  DSVGD is shown to compare favorably to benchmark frequentist and Bayesian federated learning strategies, while scheduling only a single device per iteration, in terms of accuracy and scalability with respect  ...  Log-likelihood for Bayesian logistic regression with non-iid data distributions (N = 6, L = L = 200).  ... 
arXiv:2009.06419v6 fatcat:2gn7h22tfjfc5pm5wu6eqopkjq
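The SVGD update that DSVGD builds on can be illustrated with a one-dimensional sketch using an RBF kernel; the bandwidth and step size below are arbitrary illustrative choices, not the paper's schedule.

```python
import math

def svgd_step(particles, grad_log_p, bandwidth=1.0, step=0.1):
    """One SVGD update: each particle moves along
    phi(x) = (1/n) * sum_j [ k(x_j, x) * grad_log_p(x_j) + d/dx_j k(x_j, x) ],
    i.e. a kernel-smoothed score term plus a repulsive kernel-gradient term."""
    n = len(particles)
    new = []
    for x in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-(xj - x) ** 2 / (2 * bandwidth ** 2))
            dk = k * (x - xj) / bandwidth ** 2  # d k(xj, x) / d xj
            phi += k * grad_log_p(xj) + dk
        new.append(x + step * phi / n)
    return new
```

For a standard-normal target (`grad_log_p = lambda x: -x`), repeated updates drive the particles toward the mode while the repulsive term keeps them spread out, approximating the posterior rather than collapsing to a point estimate.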

Minimax Rates for Conditional Density Estimation via Empirical Entropy [article]

Blair Bilodeau, Dylan J. Foster, Daniel M. Roy
2021 arXiv   pre-print
for regression.  ...  For joint density estimation, minimax rates have been characterized for general density classes in terms of uniform (metric) entropy, a well-studied notion of statistical capacity.  ...  DMR is supported in part by an NSERC Discovery Grant and an Ontario Early Researcher Award. This material is based also upon work supported by the United States Air Force under Contract No.  ... 
arXiv:2109.10461v2 fatcat:dej5c5h3jjfkxjtu7wswsm6kdq

Deep Modeling of Growth Trajectories for Longitudinal Prediction of Missing Infant Cortical Surfaces [article]

Peirong Liu, Zhengwang Wu, Gang Li, Pew-Thian Yap, Dinggang Shen
2020 arXiv   pre-print
Adopting a binary flag in the loss calculation to deal with missing data, we fully utilize all available cortical surfaces for training our deep learning model, without requiring a complete collection of longitudinal  ...  We will demonstrate with experimental results that our method is capable of capturing the nonlinearity of spatiotemporal cortical growth patterns and can predict cortical surfaces with improved accuracy  ...  [19], where they proposed a learning-based framework for predicting dynamic postnatal changes in cortical shape based on the cortical surfaces at birth, using the varifold metric for surface regression  ... 
arXiv:2009.02797v2 fatcat:xmjy3czkffelzf5ky3rt6anm5a
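The binary-flag masking described in the snippet amounts to zeroing out loss terms at missing time points; a minimal sketch with a squared-error loss (the paper's actual loss and data layout may differ):

```python
def masked_mse(pred, target, mask):
    """Mean squared error computed only where mask == 1, so missing
    observations contribute nothing to the loss or its gradient."""
    num = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    den = sum(mask)
    return num / den if den > 0 else 0.0
```

Because masked entries are multiplied by zero, incomplete longitudinal sequences can be batched together with complete ones without biasing the average.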

Adversarial Robustness via Fisher-Rao Regularization [article]

Marine Picot, Francisco Messina, Malik Boudiaf, Fabrice Labeau, Ismail Ben Ayed, Pablo Piantanida
2022 arXiv   pre-print
some interesting properties as well as connections with standard regularization metrics.  ...  Empirically, we evaluate the performance of various classifiers trained with the proposed loss on standard datasets, showing up to a simultaneous 1% improvement in both clean and robust performance  ...  ACKNOWLEDGMENT This work was supported by the Natural Sciences and Engineering Research Council of Canada, and McGill University in the framework of the NSERC/Hydro-Quebec Industrial Research Chair in  ... 
arXiv:2106.06685v2 fatcat:346ven472ncwtfygvmdupon4ae

Geometric Losses for Distributional Learning [article]

Arthur Mensch, Mathieu Blondel, Gabriel Peyré (DMA, CNRS)
2019 arXiv   pre-print
Unlike previous attempts to use optimal transport distances for learning, our loss results in unconstrained convex objective functions, supports infinite (or very large) class spaces, and naturally defines  ...  a metric or cost between classes.  ...  We set the KL weight to 1, and rescale the KL loss with a factor h × w, to make its gradient of the same order as the one computed with separated binary cross entropy.  ... 
arXiv:1905.06005v1 fatcat:bp4o56snqnfkfol5lywlfvp3fy

DeepCoder: Semi-parametric Variational Autoencoders for Automatic Facial Action Coding [article]

Dieu Linh Tran, Robert Walecki, Ognjen Rudovic, Stefanos Eleftheriadis, Bjørn Schuller, Maja Pantic
2017 arXiv   pre-print
By contrast, the non-parametric (probabilistic) approaches, such as Gaussian Processes (GPs), typically outperform their parametric counterparts, but cannot deal easily with large amounts of data.  ...  To this end, we propose a novel VAE semi-parametric modeling framework, named DeepCoder, which combines the modeling power of parametric (convolutional) and nonparametric (ordinal GPs) VAEs, for joint  ...  using non-parametric models.  ... 
arXiv:1704.02206v2 fatcat:fhqmrhkwz5evdglag26kx4dkgq
Showing results 1 — 15 out of 2,341 results