On the duplication distance of binary strings

Noga Alon, Jehoshua Bruck, Farzad Farnoud, Siddharth Jain
2016 IEEE International Symposium on Information Theory (ISIT)
We study the tandem duplication distance between binary sequences and their roots. This distance is motivated by genomic tandem duplication mutations and counts the smallest number of tandem duplication events that are required to take one sequence to another. We consider both exact and approximate tandem duplications, the latter leading to a combined duplication/Hamming distance. The paper focuses on the maximum value of the duplication distance to the root. For exact duplication, denoting the maximum distance to the root of a sequence of length n by f(n), we prove that f(n) = Θ(n). For the case of approximate duplication, where a β-fraction of symbols may be duplicated incorrectly, we show using the Plotkin bound that the maximum distance has a sharp transition from linear to logarithmic in n at β = 1/2.

Footnote 1: Note that using the term "distance" here is a slight abuse of notation, as the duplication distance does not satisfy the triangle inequality.
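To make the exact-duplication setting concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not the authors' method: the function names (`tandem_duplications`, `root`, `duplication_distance`) are hypothetical, the search is exponential, and it is practical only for very short strings. It assumes that a tandem duplication copies a substring and inserts the copy immediately after the original, and that the root is the square-free string left after repeatedly deleting one copy of a tandem repeat.

```python
from collections import deque


def tandem_duplications(s):
    """Yield every string obtained from s by one exact tandem duplication."""
    n = len(s)
    for i in range(n):
        for k in range(1, n - i + 1):
            # Copy the block s[i:i+k] and insert it right after itself.
            yield s[:i + k] + s[i:i + k] + s[i + k:]


def root(s):
    """Delete one copy of a tandem repeat at a time until none remain."""
    changed = True
    while changed:
        changed = False
        n = len(s)
        for k in range(1, n // 2 + 1):
            for i in range(n - 2 * k + 1):
                if s[i:i + k] == s[i + k:i + 2 * k]:
                    s = s[:i + k] + s[i + 2 * k:]
                    changed = True
                    break
            if changed:
                break
    return s


def duplication_distance(target):
    """Fewest exact tandem duplications taking root(target) to target (BFS)."""
    start = root(target)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for nxt in tandem_duplications(cur):
            # Duplications only lengthen a string, so prune anything too long.
            if len(nxt) <= len(target) and nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # Not expected: the target is always reachable from its root.


if __name__ == "__main__":
    for word in ["0110", "0000", "010"]:
        print(word, root(word), duplication_distance(word))
```

For example, `0110` has root `010` and is one duplication away from it (duplicate the middle `1`), while `0000` has root `0` and needs two duplications (`0` to `00` to `0000`). Maximizing this quantity over all strings of length n gives the f(n) discussed in the abstract.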
doi:10.1109/isit.2016.7541301 dblp:conf/isit/AlonBFJ16 fatcat:yrxo3dseqnfz5l2el5s5ey7bt4