Metrics for Clustering Comparison in Bioinformatics

Giovanni Rossi
2016 Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods  
Motivated by a problem in bioinformatics, this work analyses alternative metrics between partitions. From both theoretical and applied perspectives, a useful and interesting distance between any two partitions is HD, which counts the number of atoms finer than exactly one of the two partitions. While faithfully reproducing the traditional Hamming distance between subsets, HD is very sensible and can be computed through scalar products between Boolean vectors. It properly deals with complements and …atically resembles the entropy-based variation of information (VI) distance. Entire families of metrics (including HD and VI) are obtained as minimal paths in the weighted graph given by the Hasse diagram: submodular weighting functions yield path-based distances that visit the join of any two partitions, whereas supermodular ones yield distances that visit the meet. This yields an exact (rather than heuristic) approach to the consensus partition problem, a combinatorial optimization problem.
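As a rough illustration of the quantities named in the abstract, the following Python sketch (not the author's code) encodes a partition of {0, …, n−1} as a list of block labels. It assumes the standard reading of "atom" in the lattice of partitions ordered by refinement: the partition whose only non-singleton block is a pair {i, j}, which is finer than P exactly when i and j share a block of P. Under that reading HD is the Hamming distance between the two pair-indicator Boolean vectors a and b, computed here as |a| + |b| − 2⟨a, b⟩. VI is taken as 2H(R) − H(P) − H(Q) with base-2 entropies, where R is the common refinement whose blocks are the nonempty block intersections. All helper names (`pair_vector`, `hd`, `entropy`, `vi`, `common_refinement`, `pairs_within`, `path_length_via`) are hypothetical. The last function only checks that both distances equal the length of a path from P to R and on to Q, with edge weights given by differences of a weighting function. It does not verify the paper's sub/supermodularity conditions or its orientation of the Hasse diagram.

```python
from collections import Counter
from itertools import combinations
from math import log2


def pair_vector(labels):
    """Boolean vector indexed by unordered pairs (i, j), i < j.

    Entry is 1 when i and j share a block, i.e. when the atom {i, j}
    is finer than the partition (assumed reading of 'atom').
    """
    n = len(labels)
    return [int(labels[i] == labels[j]) for i, j in combinations(range(n), 2)]


def hd(p, q):
    """Hamming distance between pair vectors via a scalar product."""
    a, b = pair_vector(p), pair_vector(q)
    dot = sum(x * y for x, y in zip(a, b))
    return sum(a) + sum(b) - 2 * dot


def entropy(labels):
    """Base-2 Shannon entropy of the block-size distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def common_refinement(p, q):
    """Label each element by its (block in p, block in q) pair."""
    return list(zip(p, q))


def vi(p, q):
    """Variation of information: 2H(R) - H(P) - H(Q)."""
    return 2 * entropy(common_refinement(p, q)) - entropy(p) - entropy(q)


def pairs_within(labels):
    """Number of atoms finer than the partition (co-clustered pairs)."""
    return sum(c * (c - 1) // 2 for c in Counter(labels).values())


def path_length_via(p, q, r, w):
    """Length of the path P -> R -> Q when edges cost |w(X) - w(Y)|."""
    return abs(w(p) - w(r)) + abs(w(q) - w(r))
```

For example, `hd([0, 0, 1, 1], [0, 1, 0, 1])` returns 4 (the pairs {0, 1} and {2, 3} are co-clustered only in the first partition, {0, 2} and {1, 3} only in the second), and `vi` on the same two partitions returns 2.0. With R = `common_refinement(p, q)`, `path_length_via(p, q, R, pairs_within)` reproduces HD and `path_length_via(p, q, R, entropy)` reproduces VI, which is the telescoping behind describing both as path-based distances.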
doi:10.5220/0005707102990308 dblp:conf/icpram/Rossi16a fatcat:6oxbxnrkhjf4nivlewzbxn7ava