AI Giving Back to Statistics? Discovery of the Coordinate System of Univariate Distributions by Beta Variational Autoencoder

Alex Glushkovsky
2020 arXiv pre-print
Distributions are fundamental statistical elements that play essential theoretical and practical roles. The article discusses experiences of training neural networks to classify univariate empirical distributions and to represent them on a two-dimensional latent space, forcing disentanglement, using cumulative distribution functions (CDFs) as inputs. The latent space representation has been performed using an unsupervised beta variational autoencoder (beta-VAE). It separates distributions of different shapes while overlapping similar ones, and it empirically realises relationships between distributions that are known theoretically. A synthetic experiment with generated univariate continuous and discrete (Bernoulli) distributions of varying sample sizes and parameters has been performed to support the study. The representation on the two-dimensional latent coordinate system can be seen as additional metadata of real-world data that disentangles important distribution characteristics, such as the shape of the CDF, the classification probabilities of the underlying theoretical distributions and their parameters, information entropy, and skewness. Entropy changes, providing an "arrow of time", determine dynamic trajectories along the representations of distributions in the latent space. In addition, unsupervised segmentation of the latent space after beta-VAE training, based on the weight of evidence (WOE) of the posterior versus the standard isotropic two-dimensional normal density, has been applied to detect the presence of assignable causes that distinguish exceptional CDF inputs.
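The abstract gives no implementation details. The following is a minimal sketch under stated assumptions, not the author's code. It uses only the Python standard library and illustrates three of the pieces named above:

- an empirical CDF vector used as network input,
- the beta-weighted KL objective against a standard isotropic 2-D normal prior, and
- a WOE score ln(q(z)/p(z)) for segmenting the latent plane.

All of the following are hypothetical choices: the min-max scaling, the 32-point grid, the squared-error reconstruction term, beta = 4, and estimating the posterior density q as the average of the per-input encoder Gaussians. The encoder and decoder networks are omitted, and `mu`, `logvar` and `x_hat` stand in for their outputs.

```python
import bisect
import math
import random


def cdf_input(sample, k=32):
    """Empirical CDF of `sample` on k evenly spaced points of [0, 1].

    Hypothetical featurisation: the min-max scaling and k=32 are assumptions.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(sample), max(sample)
    span = (hi - lo) or 1.0  # a constant sample would otherwise divide by zero
    scaled = sorted((v - lo) / span for v in sample)
    n = len(scaled)
    grid = [i / (k - 1) for i in range(k)]
    # F_n(t) = (number of scaled observations <= t) / n
    return [bisect.bisect_right(scaled, t) / n for t in grid]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Squared reconstruction error plus beta-weighted KL (beta > 1 favours disentanglement)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_to_standard_normal(mu, logvar)


def normal_pdf(z, mu, var):
    """Density of an axis-aligned Gaussian with means mu and variances var at point z."""
    dens = 1.0
    for zi, mi, vi in zip(z, mu, var):
        dens *= math.exp(-0.5 * (zi - mi) ** 2 / vi) / math.sqrt(2.0 * math.pi * vi)
    return dens


def woe(z, encodings):
    """ln(q(z) / p(z)) at latent point z.

    Assumption: q is the average of the encoder Gaussians given as (mu, logvar)
    pairs, and p is the standard isotropic 2-D normal. Far from every encoding
    q can underflow to 0.0, in which case math.log raises ValueError.
    """
    q = sum(normal_pdf(z, mu, [math.exp(lv) for lv in logvar])
            for mu, logvar in encodings) / len(encodings)
    p = normal_pdf(z, (0.0, 0.0), (1.0, 1.0))
    return math.log(q / p)


if __name__ == "__main__":
    random.seed(0)
    x = cdf_input([random.gauss(0.0, 1.0) for _ in range(200)])
    mu, logvar = [0.3, -0.1], [-1.0, -1.2]  # hypothetical encoder outputs
    print(beta_vae_loss(x, x, mu, logvar))  # perfect reconstruction: KL term only
    print(woe((0.3, -0.1), [(mu, logvar)]))  # positive: denser than the prior here
```

Latent points with clearly positive WOE lie where encoded distributions are denser than the prior predicts. Segmenting the plane by the sign or size of this score is one way to single out exceptional CDF inputs. The paper's actual segmentation rule may differ.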
arXiv:2004.02687v1