Probabilistic Decomposition in Machine Learning Problems [article]

Byoungwook Jang
2022
Decomposition models for understanding mean and covariance structures in high-dimensional data have attracted considerable attention in recent years. This thesis visits selected machine learning problems with applications in topic modeling, neuroimaging, and experimental designs, and tackles challenges in these applications by incorporating decomposable structures. The first part of the thesis examines statistical learning problems for applications with decomposable mean structures, namely topic modeling and multi-spectral imaging. The goal of topic modeling and multi-spectral unmixing is to decompose the spectrum of each document (or pixel) in the corpus (or the image of a scene) to find latent topics (or the spectra of materials present in multi-spectral images). In topic modeling applications, the number of latent variables is much smaller than the ambient dimension. This allows us to estimate the topic simplex with a geometric approach that minimizes the volume of the topic polytope. In our second application, we aim to trace neurons in multi-spectral images, called Brainbow images, which capture individual neurons in the brain and allow researchers to distinguish different neurons by their unique combinations of fluorescent colors. Brainbow images, however, pose an over-defined problem, as the number of unique neuron color combinations is greater than the number of spectral channels. Thus, we reformulate the neuron tracing problem as a hidden Markov model, with the underlying neuronal processes as latent variables, to decompose the observed Brainbow images into individual neurons. The second part of the thesis studies the decomposition of covariance models for tensor-variate data to introduce a scalable and interpretable structure. In tensor-variate analysis, the observed data often exhibit spatio-temporal structure, and it is desirable to simultaneously learn the partial correlations for each mode of the tensor data. However, estimating the unstructured covariance model for tensor-variate data scales qua [...]
doi:10.7302/5915 fatcat:cl22nxb4r5a75lt24t6oni45lq