Change-Point Detection in Binary Markov DNA Sequences by Cross-Entropy Method

Tatiana Polushina, Georgy Sofronov
2014 Proceedings of the 2014 Federated Conference on Computer Science and Information Systems  
A deoxyribonucleic acid (DNA) sequence can be represented as a sequence over a four-character alphabet. If a particular property of the DNA is studied, for example GC content, the sequence can be reduced to a binary one. In many cases, a segment whose probabilistic properties differ from those of its neighbours may play a structural role. DNA segmentation therefore receives special attention, and it is one of the most significant applications of change-point detection. Problems of this type also arise in a wide variety of areas, for example seismology, industry (e.g., fault detection), biomedical signal processing, financial mathematics, and speech and image processing. In this study, we have developed a Cross-Entropy algorithm for identifying change-points in binary sequences with first-order Markov dependence. We propose a statistical model for this problem and show the effectiveness of our algorithm on synthetic and real datasets.
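To make the setting concrete, the following is a minimal Python sketch of the general idea, not the authors' algorithm. The abstract does not specify the model details, so several choices here are assumptions: only a single change-point is searched for, candidate positions are drawn from a rounded normal distribution, each segment is scored by its maximised first-order Markov log-likelihood, and the transition across the segment boundary is ignored. All function names (`gc_binary`, `markov_loglik`, `ce_single_change_point`, `simulate_markov`) and parameter values are hypothetical.

```python
import numpy as np

def gc_binary(dna):
    """Map a DNA string to 0/1: 1 for G or C, 0 for A or T."""
    return np.array([1 if ch in "GC" else 0 for ch in dna.upper()], dtype=int)

def markov_loglik(x):
    """Maximised log-likelihood of a binary segment under a first-order
    Markov chain, conditioning on the first symbol of the segment."""
    if len(x) < 2:
        return 0.0
    counts = np.zeros((2, 2))
    np.add.at(counts, (x[:-1], x[1:]), 1)  # transition counts n_ij
    ll = 0.0
    for i in range(2):
        row = counts[i].sum()
        for j in range(2):
            if counts[i, j] > 0:
                ll += counts[i, j] * np.log(counts[i, j] / row)
    return ll

def score(x, c):
    """Score a change-point at index c (assumption: the transition
    x[c-1] -> x[c] across the boundary is simply dropped)."""
    return markov_loglik(x[:c]) + markov_loglik(x[c:])

def ce_single_change_point(x, n_samples=200, elite_frac=0.1, n_iter=30, seed=0):
    """Cross-Entropy search for one change-point (illustrative settings)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu, sigma = n / 2, n / 4  # initial sampling distribution (assumed)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iter):
        # Sample candidate positions and keep them inside the sequence.
        cand = np.clip(np.rint(rng.normal(mu, sigma, n_samples)), 2, n - 2).astype(int)
        scores = np.array([score(x, c) for c in cand])
        # Refit the sampling distribution to the best-scoring candidates.
        elite = cand[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(), elite.std()
        if sigma < 0.5:  # sampling distribution has collapsed
            break
    return int(round(mu))

def simulate_markov(n, p01, p10, rng, start=0):
    """Simulate a binary first-order Markov chain with P(0->1)=p01, P(1->0)=p10."""
    x = np.empty(n, dtype=int)
    x[0] = start
    for t in range(1, n):
        p_one = p01 if x[t - 1] == 0 else 1 - p10
        x[t] = rng.random() < p_one
    return x

# Synthetic example: a "sticky" regime followed by an "alternating" regime,
# with the true change-point at index 200.
rng = np.random.default_rng(1)
x = np.concatenate([simulate_markov(200, 0.1, 0.1, rng),
                    simulate_markov(400, 0.9, 0.9, rng)])
est = ce_single_change_point(x)
print("estimated change-point:", est)
```

For a real sequence, `gc_binary` would first reduce the four-letter string to the binary GC indicator before calling `ce_single_change_point`. Handling an unknown number of change-points requires sampling vectors of positions and a model-selection criterion, which this sketch does not attempt.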
doi:10.15439/2014f88 dblp:conf/fedcsis/PolushinaS14 fatcat:7uzd7sowyfb6jgkeams7bxc6eu