Joint NMF for Identification of Shared Features in Datasets and a Dataset Distance Measure [article]

Hannah Friedman, Amani R. Maina-Kilaas, Julianna Schalkwyk, Hina Ahmed, Jamie Haddock
2022 arXiv pre-print
In this paper, we derive a new method for determining shared features of datasets by employing joint non-negative matrix factorization and analyzing the resulting factorizations. Our approach uses the joint factorization of two dataset matrices X_1,X_2 into non-negative matrices X_1 = AS_1, X_2 = AS_2 to derive a similarity measure that determines how well a shared basis for X_1, X_2 approximates each dataset. We also propose a dataset distance measure built upon this method and the learned
... orization. Our method successfully identifies differences in structure in both image and text datasets. Potential applications include classification, detecting plagiarism or other manipulation, and learning relationships between datasets.
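To make the joint factorization X_1 = AS_1, X_2 = AS_2 concrete, here is a minimal Python sketch, not the authors' implementation. It assumes both dataset matrices share the same number of rows (features) and solves for the shared basis A by factoring the column-stacked matrix [X_1 X_2] with standard Lee-Seung multiplicative updates, a solver chosen here only for illustration. The names joint_nmf and relative_error are hypothetical, and the per-dataset relative reconstruction error is merely a stand-in for how well the shared basis approximates each dataset; the abstract does not define the paper's actual similarity or distance measure.

```python
import numpy as np


def joint_nmf(X1, X2, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate X1 ~ A @ S1 and X2 ~ A @ S2 with one shared non-negative A.

    Assumption: X1 and X2 are non-negative and have the same number of rows.
    """
    rng = np.random.default_rng(seed)
    # A shared basis is equivalent to factoring the column-stacked matrix.
    X = np.hstack([X1, X2])
    m, n = X.shape
    A = rng.random((m, rank)) + eps
    S = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective;
        # eps guards against division by zero.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    n1 = X1.shape[1]
    # Split the coefficient matrix back into the two datasets' parts.
    return A, S[:, :n1], S[:, n1:]


def relative_error(X, A, S):
    """Relative Frobenius error of approximating X by A @ S."""
    return np.linalg.norm(X - A @ S) / np.linalg.norm(X)


# Illustrative usage on synthetic data (all values here are made up).
rng = np.random.default_rng(1)
basis = rng.random((20, 3))
X1 = basis @ rng.random((3, 15))
X2 = basis @ rng.random((3, 10))
A, S1, S2 = joint_nmf(X1, X2, rank=3)
print(relative_error(X1, A, S1), relative_error(X2, A, S2))
```

Comparing the two printed errors gives a rough sense of whether one dataset is served worse by the shared basis than the other, which is the kind of question the paper's similarity measure is designed to answer.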
arXiv:2207.05112v1 fatcat:lby6oztm2jhcdppxxmn6b77pge