Consensus Genetic Maps as Median Orders from Inconsistent Sources
IEEE/ACM Transactions on Computational Biology & Bioinformatics
A genetic map is an ordering of genetic markers calculated from a population of known lineage. Traditionally, a map has been generated from a single population for each species, but researchers have recently created maps from multiple populations. In the face of these new data, we address the need to find a consensus map: a map that combines the information from multiple partial and possibly inconsistent input maps. We model each input map as a partial order and formulate the consensus
... blem as finding a median partial order. Finding the median of multiple total orders (preferences or rankings) is a well-studied problem in social choice. We find the median using the weighted symmetric difference distance, which generalizes both the symmetric difference distance and the Kemeny distance. Finding a median order using this distance is NP-hard. We show that, for our chosen weight assignment, a median order satisfies the positive responsiveness, extended Condorcet, and unanimity criteria. Our solution involves finding the maximum acyclic subgraph of a weighted directed graph. We present a method that dynamically switches between an exact branch and bound algorithm and a heuristic algorithm, and we show that, for real data from closely related organisms, an exact median can often be found. We present experimental results on seven populations of the crop plant Zea mays.
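The reduction to a maximum acyclic subgraph can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: each input map is given as a list of markers (a total order over a subset of markers), the weight of a directed edge (a, b) is simply the number of input maps that place a before b, and the search is a brute-force enumeration of linear orders rather than the branch and bound or heuristic algorithms described above. The function names `pair_weights` and `best_linear_order` are hypothetical. The sketch returns a linear order, whereas the median in the paper is a partial order.

```python
from itertools import permutations


def pair_weights(maps):
    """Count, for each ordered pair (a, b), how many input maps place a before b.

    Assumption: a plain vote count stands in for the paper's weight assignment.
    """
    weights = {}
    for order in maps:
        for i, a in enumerate(order):
            for b in order[i + 1:]:
                weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


def best_linear_order(maps):
    """Exhaustively find a linear order that keeps the most edge weight.

    The edges that agree with a linear order form an acyclic subgraph, so the
    best-scoring order yields a maximum-weight acyclic subgraph. The search is
    factorial in the number of markers and only practical for tiny inputs.
    """
    markers = sorted({m for order in maps for m in order})
    weights = pair_weights(maps)
    best_order, best_score = None, -1
    for perm in permutations(markers):
        pos = {m: i for i, m in enumerate(perm)}
        # Sum the weights of all edges that point forward in this order.
        score = sum(w for (a, b), w in weights.items() if pos[a] < pos[b])
        if score > best_score:
            best_order, best_score = list(perm), score
    return best_order, best_score


if __name__ == "__main__":
    # Three hypothetical input maps; the second disagrees on m2 versus m3.
    maps = [
        ["m1", "m2", "m3", "m4"],
        ["m1", "m3", "m2", "m4"],
        ["m2", "m3", "m4"],
    ]
    print(best_linear_order(maps))
```

In this toy input, two maps place m2 before m3 and one places m3 before m2, so the minority edge is the only one dropped from the acyclic subgraph.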