Quasi-metrics, Similarities and Searches: aspects of geometry of protein datasets [article]

Aleksandar Stojmirovic
2008 arXiv   pre-print
A quasi-metric is a distance function that satisfies the triangle inequality but is not necessarily symmetric: it can be thought of as an asymmetric metric. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures on biological (nucleotide or protein) sequences and quasi-metrics. Chapter 2 presents basic concepts of the theory of quasi-metric spaces and introduces new examples of them: the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space. Chapter 4 develops the notion of a quasi-metric space with a Borel probability measure, or pq-space. The main result of this chapter indicates that 'a high dimensional quasi-metric space is close to being a metric space'. Chapter 5 investigates the geometric aspects of the theory of database similarity search in the context of quasi-metrics. The results about pq-spaces are used to produce novel theoretical bounds on the performance of indexing schemes. Finally, the thesis presents some biological applications. Chapter 6 introduces FSIndex, an indexing scheme that significantly accelerates similarity searches of short protein fragment datasets. Chapter 7 presents a prototype of PFMFind, a system for the discovery of short functional protein motifs that relies on FSIndex for similarity searches.
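As an illustration of the definition of a quasi-metric given above, the following minimal sketch checks the axioms on a small finite set. The function `quasi_distance` (the "upward cost" max(y - x, 0) on the real line) is a standard textbook example chosen here for illustration; it is not taken from the thesis, and the helper names `is_quasi_metric` and `is_symmetric` are hypothetical.

```python
from itertools import product


def quasi_distance(x: float, y: float) -> float:
    """Illustrative quasi-metric on the reals: moving up costs y - x, moving down is free."""
    return max(y - x, 0.0)


def is_quasi_metric(points, d) -> bool:
    """Check the quasi-metric axioms for d on a finite set of points."""
    for x in points:
        if d(x, x) != 0:  # each point is at distance zero from itself
            return False
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:  # distances are non-negative
            return False
        if x != y and d(x, y) == 0 and d(y, x) == 0:
            return False  # distinct points must be separated in at least one direction
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):  # triangle inequality
            return False
    return True


def is_symmetric(points, d) -> bool:
    """A quasi-metric that is also symmetric is an ordinary metric."""
    return all(d(x, y) == d(y, x) for x, y in product(points, repeat=2))


points = [0.0, 1.0, 2.5, 4.0]
print(is_quasi_metric(points, quasi_distance))  # axioms hold on this sample
print(is_symmetric(points, quasi_distance))     # but symmetry fails
```

On this sample the triangle inequality holds while symmetry fails, for instance `quasi_distance(1.0, 4.0)` is 3.0 whereas `quasi_distance(4.0, 1.0)` is 0.0.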
arXiv:0810.5407v1 fatcat:k2mb4lcdebepzid4wtjvf24t2i