Geometry of Similarity Comparisons [article]

Puoya Tabaghi, Jianhao Peng, Olgica Milenkovic, Ivan Dokmanić
2021 arXiv pre-print
Many data analysis problems can be cast as distance geometry problems in space forms: Euclidean, spherical, or hyperbolic spaces. Often, absolute distance measurements are unreliable or simply unavailable, and only proxies to absolute distances in the form of similarities are available. Hence we ask the following: Given only comparisons of similarities amongst a set of entities, what can be said about the geometry of the underlying space form? To study this question, we introduce the notions of the ordinal capacity of a target space form and ordinal spread of the similarity measurements. The latter is an indicator of complex patterns in the measurements, while the former quantifies the capacity of a space form to accommodate a set of measurements with a specific ordinal spread profile. We prove that the ordinal capacity of a space form is related to its dimension and the sign of its curvature. This leads to a lower bound on the Euclidean and spherical embedding dimension of what we term similarity graphs. More importantly, we show that the statistical behavior of the ordinal spread random variables defined on a similarity graph can be used to identify its underlying space form. We support our theoretical claims with experiments on weighted trees, single-cell RNA expression data, and spherical cartographic measurements.
arXiv:2006.09858v4 fatcat:ph7qemzarnagpi662sbawljxfq