
On Maximum Entropy and Inference

Luigi Gresele, Matteo Marsili
2017 Entropy  
Maximum Entropy is a powerful concept that entails a sharp separation between relevant and irrelevant variables. It is typically invoked in inference, once an assumption is made on what the relevant variables are, in order to estimate a model from data that affords predictions on all other (dependent) variables. Conversely, maximum entropy can be invoked to retrieve the relevant variables (sufficient statistics) directly from the data, once a model is identified by Bayesian model selection. We explore this approach in the case of spin models with interactions of arbitrary order, and we discuss how relevant interactions can be inferred. In this perspective, the dimensionality of the inference problem is set not by the number of parameters in the model, but by the frequency distribution of the data. We illustrate the method by showing its ability to recover the correct model in a few prototype cases, and discuss its application to a real dataset.
doi:10.3390/e19120642 fatcat:5m2r4qu4tvhsndermujeqnt7xe

Embrace the Gap: VAEs Perform Independent Mechanism Analysis [article]

Patrik Reizinger, Luigi Gresele, Jack Brady, Julius von Kügelgen, Dominik Zietlow, Bernhard Schölkopf, Georg Martius, Wieland Brendel, Michel Besserve
2022 arXiv   pre-print
arXiv:2206.02416v1 fatcat:xzszp3m4dja3zcrcjl6hngo7iy

Privacy-Preserving Causal Inference via Inverse Probability Weighting [article]

Si Kai Lee, Luigi Gresele, Mijung Park, Krikamol Muandet
2019 arXiv   pre-print
The use of inverse probability weighting (IPW) methods to estimate the causal effect of treatments from observational studies is widespread in econometrics, medicine and the social sciences. Although these studies often involve sensitive information, thus far there has been no work on privacy-preserving IPW methods. We address this by providing a novel framework for privacy-preserving IPW (PP-IPW) methods. We include a theoretical analysis of the effects of our proposed privatisation procedure on the estimated average treatment effect, and evaluate our PP-IPW framework on synthetic, semi-synthetic and real datasets. The empirical results are consistent with our theoretical findings.
arXiv:1905.12592v2 fatcat:moiv7skaubh3vknbhxpsmmwa2y
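The estimand underlying the abstract above, before any privatisation (which is the paper's contribution), is the standard IPW estimator of the average treatment effect. The following sketch is illustrative only, with made-up variable names and known propensity scores, and does not implement the paper's PP-IPW mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic observational data: a confounder x drives both treatment and outcome.
x = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-x))        # e(x) = P(T=1 | x), assumed known here
t = rng.binomial(1, propensity)          # observed treatment assignment
y = 2.0 * t + x + rng.normal(size=n)     # true average treatment effect = 2

# Non-private IPW (Horvitz-Thompson) estimate of the ATE: reweight each
# observation by the inverse probability of the treatment it received.
ate_ipw = np.mean(t * y / propensity) - np.mean((1 - t) * y / (1 - propensity))

# The naive difference in means is confounded by x and overshoots the effect.
naive = y[t == 1].mean() - y[t == 0].mean()
```

In practice the propensity scores are themselves estimated (e.g. by logistic regression), which is part of the pipeline the paper privatises.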

Independent mechanism analysis, a new concept? [article]

Luigi Gresele, Julius von Kügelgen, Vincent Stimper, Bernhard Schölkopf, Michel Besserve
2022 arXiv   pre-print
Independent component analysis provides a principled framework for unsupervised representation learning, with solid theory on the identifiability of the latent code that generated the data, given only observations of mixtures thereof. Unfortunately, when the mixing is nonlinear, the model is provably nonidentifiable, since statistical independence alone does not sufficiently constrain the problem. Identifiability can be recovered in settings where additional, typically observed variables are included in the generative process. We investigate an alternative path and consider instead including assumptions reflecting the principle of independent causal mechanisms exploited in the field of causality. Specifically, our approach is motivated by thinking of each source as independently influencing the mixing process. This gives rise to a framework which we term independent mechanism analysis. We provide theoretical and empirical evidence that our approach circumvents a number of nonidentifiability issues arising in nonlinear blind source separation.
arXiv:2106.05200v3 fatcat:rqeuoxwpuzdvreaay7q46af73e

Learning explanations that are hard to vary [article]

Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, Bernhard Schölkopf
2020 arXiv   pre-print
In this paper, we investigate the principle that 'good explanations are hard to vary' in the context of deep learning. We show that averaging gradients across examples -- akin to a logical OR of patterns -- can favor memorization and 'patchwork' solutions that sew together different strategies, instead of identifying invariances. To inspect this, we first formalize a notion of consistency for minima of the loss surface, which measures to what extent a minimum appears only when examples are pooled. We then propose and experimentally validate a simple alternative algorithm based on a logical AND that focuses on invariances and prevents memorization in a set of real-world tasks. Finally, using a synthetic dataset with a clear distinction between invariant and spurious mechanisms, we dissect learning signals and compare this approach to well-established regularizers.
arXiv:2009.00329v3 fatcat:hexhmtq57zbmfnjjcqtxzt6evu
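The "logical AND" aggregation the abstract contrasts with gradient averaging can be sketched as a sign-agreement mask across environments, in the spirit of the paper's ANDMask. Function and variable names here are illustrative, not the paper's code:

```python
import numpy as np

def and_mask(env_grads, agreement=1.0):
    """Sign-agreement ('logical AND') gradient aggregation, sketched after the
    paper's ANDMask idea: a parameter component is updated only if its gradient
    sign agrees across at least a fraction `agreement` of environments;
    conflicting ('patchwork') components are zeroed out."""
    g = np.stack(env_grads)                  # shape: (n_envs, n_params)
    frac_agree = np.abs(np.sign(g).mean(axis=0))  # 1.0 -> all envs share a sign
    mask = frac_agree >= agreement
    return mask * g.mean(axis=0)             # masked version of the averaged (OR) gradient

# Two environments: component 0 is invariant (signs agree),
# component 1 is spurious (signs conflict and it gets zeroed).
g1 = np.array([1.0,  2.0])
g2 = np.array([0.5, -2.0])
masked = and_mask([g1, g2])
```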

On Pitfalls of Identifiability in Unsupervised Learning. A Note on: "Desiderata for Representation Learning: A Causal Perspective" [article]

Shubhangi Ghosh, Luigi Gresele, Julius von Kügelgen, Michel Besserve, Bernhard Schölkopf
2022 arXiv   pre-print
Model identifiability is a desirable property in the context of unsupervised representation learning. In its absence, different models may be observationally indistinguishable while yielding representations that are nontrivially related to one another, thus making the recovery of a ground truth generative model fundamentally impossible, as often shown through suitably constructed counterexamples. In this note, we discuss one such construction, illustrating a potential failure case of an identifiability result presented in "Desiderata for Representation Learning: A Causal Perspective" by Wang & Jordan (2021). The construction is based on the theory of nonlinear independent component analysis. We comment on implications of this and other counterexamples for identifiable representation learning.
arXiv:2202.06844v1 fatcat:n5mlyuzxtzdghefpxkqzrjwxc4

Orthogonal Structure Search for Efficient Causal Discovery from Observational Data [article]

Anant Raj and Luigi Gresele and Michel Besserve and Bernhard Schölkopf and Stefan Bauer
2020 arXiv   pre-print
The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work exploits stability of regression coefficients or invariance properties of models across different experimental conditions for reconstructing the full causal graph. These approaches generally do not scale well with the number of explanatory variables and are difficult to extend to nonlinear relationships. In contrast to existing work, we propose an approach which even works for observational data alone, while still offering theoretical guarantees including the case of partially nonlinear relationships. Our algorithm requires only one estimation for each variable, and in our experiments we apply our causal discovery algorithm even to large graphs, demonstrating significant improvements compared to well-established approaches.
arXiv:1903.02456v2 fatcat:3dulhzoqtbbkncjmewcdwjul24

Relative gradient optimization of the Jacobian term in unsupervised deep learning [article]

Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvärinen
2020 arXiv   pre-print
Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning. A popular approach for solving it is mapping the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals -- thus drawing a connection with the field of nonlinear independent component analysis. Deep density models have been widely used for this task, but their maximum-likelihood-based training requires estimating the log-determinant of the Jacobian and is computationally expensive, thus imposing a trade-off between computation and expressive power. In this work, we propose a new approach for exact training of such neural networks. Based on relative gradients, we exploit the matrix structure of neural network parameters to compute updates efficiently even in high-dimensional spaces; the computational cost of the training is quadratic in the input size, in contrast with the cubic scaling of naive approaches. This allows fast training with objective functions involving the log-determinant of the Jacobian, without imposing constraints on its structure, in stark contrast to autoregressive normalizing flows.
arXiv:2006.15090v2 fatcat:jl537lnhdjbhbk3nt5vpd47bru
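The cancellation that makes the relative-gradient approach cheap can be seen in a toy single-layer case. This is only a sketch of the underlying algebra under the assumption of one square linear layer; the paper extends it to full deep networks:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
W = rng.normal(size=(d, d))  # square weight matrix of one layer

# Change-of-variables likelihoods contain a log |det W| term.
# Its Euclidean gradient is d/dW log|det W| = W^{-T}: forming it costs O(d^3).
euclid_grad = np.linalg.inv(W).T

# The relative gradient right-multiplies the Euclidean gradient by W^T W.
# For the log-det term the inverse cancels exactly:
#     W^{-T} W^T W = W,
# so the update is available with no matrix inversion at all.
relative_grad = W  # equals euclid_grad @ W.T @ W, obtained for free
```

The same cancellation is what removes the cubic-cost log-determinant computation from each training step.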

Causal Inference Through the Structural Causal Marginal Problem [article]

Luigi Gresele, Julius von Kügelgen, Jonas M. Kübler, Elke Kirschbaum, Bernhard Schölkopf, Dominik Janzing
2022 arXiv   pre-print
We introduce an approach to counterfactual inference based on merging information from multiple datasets. We consider a causal reformulation of the statistical marginal problem: given a collection of marginal structural causal models (SCMs) over distinct but overlapping sets of variables, determine the set of joint SCMs that are counterfactually consistent with the marginal ones. We formalise this approach for categorical SCMs using the response function formulation and show that it reduces the space of allowed marginal and joint SCMs. Our work thus highlights a new mode of falsifiability through additional variables, in contrast to the statistical one via additional data.
arXiv:2202.01300v3 fatcat:ng4zw7zrlfa2bpl6kcfikqes7q

The Incomplete Rosetta Stone Problem: Identifiability Results for Multi-View Nonlinear ICA [article]

Luigi Gresele, Paul K. Rubenstein, Arash Mehrjou, Francesco Locatello, Bernhard Schölkopf
2019 arXiv   pre-print
We consider the problem of recovering a common latent source with independent components from multiple views. This applies to settings in which a variable is measured with multiple experimental modalities, and where the goal is to synthesize the disparate measurements into a single unified representation. We consider the case that the observed views are a nonlinear mixing of component-wise corruptions of the sources. When the views are considered separately, this reduces to nonlinear Independent Component Analysis (ICA), for which it is provably impossible to undo the mixing. We present novel identifiability proofs that this is possible when the multiple views are considered jointly, showing that the mixing can theoretically be undone using function approximators such as deep neural networks. In contrast to known identifiability results for nonlinear ICA, we prove that independent latent sources with arbitrary mixing can be recovered as long as multiple, sufficiently different noisy views are available.
arXiv:1905.06642v2 fatcat:mlis7vcwkndrplu5tzh2vxm7fa

Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [article]

Hugo Richard, Luigi Gresele, Aapo Hyvärinen, Bertrand Thirion, Alexandre Gramfort, Pierre Ablin
2020 arXiv   pre-print
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. However, the aggregation of data coming from multiple subjects is challenging, since it requires accounting for large variability in anatomy, functional topography and stimulus response across individuals. Data modeling is especially hard for ecologically relevant conditions such as movie watching, where the experimental setup does not imply well-defined cognitive operations. We propose a novel MultiView Independent Component Analysis (ICA) model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. Contrary to most group-ICA procedures, the likelihood of the model is available in closed form. We develop an alternate quasi-Newton method for maximizing the likelihood, which is robust and converges quickly. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects. Moreover, the sources recovered by our model exhibit lower between-session variability than other methods. On magnetoencephalography (MEG) data, our method yields more accurate source localization on phantom data. Applied to 200 subjects from the Cam-CAN dataset, it reveals a clear sequence of evoked activity in sensor and source space. The code is freely available at
arXiv:2006.06635v4 fatcat:nx4oktnnargetj3i2vdakbcgki

Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style [article]

Julius von Kügelgen, Yash Sharma, Luigi Gresele, Wieland Brendel, Bernhard Schölkopf, Michel Besserve, Francesco Locatello
2022 arXiv   pre-print
Self-supervised representation learning has shown remarkable success in a number of domains. A common practice is to perform data augmentation via hand-crafted transformations intended to leave the semantics of the data invariant. We seek to understand the empirical success of this approach from a theoretical perspective. We formulate the augmentation process as a latent variable model by postulating a partition of the latent representation into a content component, which is assumed invariant to augmentation, and a style component, which is allowed to change. Unlike prior work on disentanglement and independent component analysis, we allow for both nontrivial statistical and causal dependencies in the latent space. We study the identifiability of the latent representation based on pairs of views of the observations and prove sufficient conditions that allow us to identify the invariant content partition up to an invertible mapping in both generative and discriminative settings. We find that numerical simulations with dependent latent variables are consistent with our theory. Lastly, we introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which we use to study the effect of data augmentations performed in practice.
arXiv:2106.04619v4 fatcat:hv7lcnafzregngqaks2rrbfxpm

Next‐generation sequencing for the diagnosis of MYH9 ‐RD: predicting pathogenic variants

Loredana Bury, Karyn Megy, Jonathan C Stephens, Luigi Grassi, Daniel Greene, Nick Gleadall, Karina Althaus, David Allsup, Tadbir K Bariana, Mariana Bonduel, Nora V Butta, Peter Collins (+25 others)
2019 Human Mutation  
Some cases of somatic or germinal mosaicism have also been described (Gresele et al., 2013; Kunishima et al., 2005; Kunishima, Takaki, Ito, & Saito, 2009).
doi:10.1002/humu.23927 pmid:31562665 pmcid:PMC6972977 fatcat:3teojeyqrrghpdt5x2sptx3d7u

Page 383 of Acta Apostolicae Sedis Vol. 8, Issue 10 [page]

1916 Acta Apostolicae Sedis  
Luigi Bersani. ... Emidio Gresele, of the diocese of Ascoli Piceno. 14 September. Mons. Domenico Marena, of the diocese of Sant'Angelo de' Lombardi e Bisaccia. Supernumerary Secret Chamberlains of Sword and Cape of S.

Simpson's Paradox in COVID-19 Case Fatality Rates: A Mediation Analysis of Age-Related Causal Effects

Julius von Kügelgen, Luigi Gresele, Bernhard Schölkopf, Apollo - University of Cambridge Repository
We point out an instantiation of Simpson's paradox in COVID-19 case fatality rates (CFRs): comparing a large-scale study from China (February 17) with early reports from Italy (March 9), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographic between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how these can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and case fatality. We curate an age-stratified CFR dataset with 750 k cases and conduct a case study, investigating total, direct, and indirect (age-mediated) causal effects between different countries and at different points in time. This allows us to separate age-related effects from others unrelated to age and facilitates a more transparent comparison of CFRs across countries at different stages of the COVID-19 pandemic. Using longitudinal data from Italy, we discover a sign reversal of the direct causal effect in mid-March, which temporally aligns with the reported collapse of the healthcare system in parts of the country. Moreover, we find that direct and indirect effects across 132 pairs of countries are only weakly correlated, suggesting that a country's policy and case demographic may be largely unrelated. We point out limitations and extensions for future work, and finally discuss the role of causal reasoning in the broader context of using AI to combat the COVID-19 pandemic. Impact Statement: During a global pandemic, understanding the causal effects of risk factors such as age on COVID-19 fatality is an important scientific question. Since randomised controlled trials are typically infeasible or unethical in this context, causal investigations based on observational data, such as the one carried out in this article, will therefore be crucial in guiding our understanding of the available data. Ca [...]
doi:10.17863/cam.83153 fatcat:264mhs4jufgdxpe4jztvveyto4
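The reversal described in the abstract is easy to reproduce with toy numbers (illustrative only, not the paper's curated data): a country can have a lower CFR in every age group yet a higher CFR overall, purely because its cases skew older.

```python
# Hypothetical case and death counts for two countries, A and B.
cases  = {"A": {"young": 900, "old": 100},   # A's cases are mostly young
          "B": {"young": 100, "old": 900}}   # B's cases are mostly old
deaths = {"A": {"young": 4,   "old": 10},    # per-group CFRs: 0.44%, 10%
          "B": {"young": 0,   "old": 81}}    # per-group CFRs: 0.0%,  9%

def cfr(country, group=None):
    """Case fatality rate, per age group or aggregated over all groups."""
    if group is not None:
        return deaths[country][group] / cases[country][group]
    return sum(deaths[country].values()) / sum(cases[country].values())

# B is lower in every age group...
assert cfr("B", "young") < cfr("A", "young")
assert cfr("B", "old")   < cfr("A", "old")
# ...yet higher overall: the age mix (the mediator) reverses the comparison.
assert cfr("B") > cfr("A")
```

Mediation analysis, as used in the paper, decomposes such aggregate differences into a direct effect and an indirect effect flowing through the age distribution.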